    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work presents valuable information about the specificity and promiscuity of toxic effector and immunity protein pairs. The evidence supporting the claims of the authors is currently incomplete, as there is concern about the methodology used to analyze protein interactions, which did not take potential differences in expression levels, protein folding, and/or transient interaction into account. Other methods to measure the strength of interactions and structural predictions would improve the study. The work will be of interest to microbiologists and biochemists working with toxin-antitoxin and effector-immunity proteins.

      We thank the reviewers for considering this manuscript. We agree that this manuscript provides a valuable and cross-discipline introduction to new EI pair protein families where we focus on the EI pair’s flexibility and impacts on community structure. As such, we believe we have provided a solid foundation for future studies to examine non-cognate interactions and their possible effects on microbial communities. This, by definition, leaves some areas “incomplete” and, therefore, open for further investigations. While the methods we show do consider potential differences in binding assays, we have more explicitly addressed how “expression, protein folding, and/or transient binding” may play into this expanded EI pair model. We have also tempered the discussion of the proposed model, while also clearly highlighting other published evidence of non-cognate binding interactions between effector and immunity proteins. We have responded to the reviewers’ public comments (italicized below). 

      In this revised manuscript, we have updated the main text, particularly the Discussion section, to include more careful language, explain past research better, and add new references to works showing non-cognate immunity proteins protecting against effectors in other systems. We have also updated the supplemental files with more analyses; the relevant procedures are in the Materials and Methods.

      Public Reviews:

      Note: Reviewer 1, who appeared to focus on a subset of the manuscript rather than the whole, based their comments on several inaccuracies, which we discuss below. We found the tone in this reviewer's comments to be, at times, inappropriate, e.g., using "harsh" and "simply too drastic" to imply that common structure-function analyses were outside of the field-standard methods. We also note that the reviewer took a somewhat atypical step in reviewing this manuscript by running and analyzing the potential protein-complex data in AlphaFold2 but did not discuss areas of low confidence within that model that may contradict their conclusions. We are concerned their approach muddled valid scientific criticisms with problematic conclusions.

      Reviewer #1 (Public Review):

      In this manuscript, Knecht, Sirias et al describe toxin-immunity pair from Proteus mirabilis. Their observations suggest that the immunity protein could protect against non-cognate effectors from the same family. They analyze these proteins by dissecting them into domains and constructing chimeras which leads them to the conclusion that the immunity can be promiscuous and that the binding of immunity is insufficient for protective activity.

      Strengths:

      The manuscript is well written and the data are very well presented and could be potentially interesting. The phylogenetic analysis is well done, and provides some general insights.

      Weaknesses:

      (1) Conclusions are mostly supported by harsh deletions and double hybrid assays. The later assays might show binding, but this method is not resolutive enough to report the binding strength. Proteins could still bind, but the binding might be weaker, transient, and out-competed by the target binding.

      Describing structure-function analyses as “harsh” is a bit unusual, as other research groups regularly use deletions and hybrid studies. Given the known caveats of deletions and domain substitutions, we included point-mutation analyses for both the effector and immunity proteins, as found on lines 105 - 113 and 255 - 261 in the current manuscript. These caveats are also why we coupled the in vitro binding analyses with in vivo protection experiments in two distinct experimental systems (E. coli and P. mirabilis). Based on this manuscript’s introductory analysis (where we define and characterize the genes, proteins, interactions, phylogenetics, and incidences in human microbiomes), the next apparent questions are beyond the scope of this study. Future approaches would include analyzing purified proteins from the effector (E) and immunity (I) protein families using structural and biophysical methods such as X-ray crystallography and circular dichroism spectroscopy, among others.

      Interestingly, most papers in the EI field do not measure EI protein affinity (Jana et al., 2019, Yadav et al., 2021). Notable exceptions are earlier colicin research (Wallis et al., 1995) and a new T6SS EI paper (Bosch et al., 2023) published as we first submitted this manuscript.

      (2) While the authors have modeled the structure of toxin and immunity, the toxin-immunity complex model is missing. Such a model allows alternative, more realistic interpretation of the presented data. Firstly, the immunity protein is predicted to bind contributing to the surface all over the sequence, except the last two alpha helices (very high confidence model, iPTM>0.8). The N terminus described by the authors contributes one of the toxin-binding surfaces, but this is not the sole binding site. Most importantly, other parts of the immunity protein are predicted to interact closer to the active site (D-E-K residues). Thus, based on the AlphaFold model, the predicted mechanism of immunization remains physically blocking the active site. However, removing the N terminal part, which contributes large interaction surface will directly impact the binding strength. Hence, the toxin-immunity co-folding model suggests that proper binding of immunity, contributed by different parts of the protein, is required to stabilize the toxin-immunity complex and to achieve complete neutralization. Alternative mechanisms of neutralization might not be necessary in this case and are difficult to imagine for a DNase.

      In response to the reviewer’s comment, we again reviewed the RdnE-RdnI AlphaFold2 complex predictions with the most updated version of ColabFold (1.5.2-patch with PDB100 and MMseqs2) and have included them at the end of these responses [1].

      However, the literature reports that computational predictions of E-I complexes often do not match experimental structural results (Hespanhol et al., 2022, Bosch et al., 2023). As such, we chose not to include the predicted cognate and non-cognate RdnE-RdnI complexes from ColabFold (which uses AlphaFold2) in the revised manuscript. (It is notable that reviewer 1 found the proposed expanded model and research so interesting as to directly input and examine the AI-predicted RdnE-RdnI protein interactions in AlphaFold2.)

      Discussion of the prevailing toxin-immunity complex model is in the introduction (lines 45-48) and Figure 5E. Further, there are various known mechanisms for neutralizing nucleases and other T6SS effectors, which we briefly state in the discussion (lines 359 - 361). More in-depth, these molecular mechanisms include active-site blocking (Benz et al., 2012), allosteric-site binding (Kleanthous et al., 1999 and Lu et al., 2014), enzymatic neutralization of the target (Ting et al., 2021), and structural disruption of both the active and binding sites (Bosch et al., 2023). Given this diversity of mechanisms, we did not presume to speculate on the as-of-yet unknown mechanism of RdnI protection. We have expanded discussion of these items in the revised manuscript.

      (3) Dissection of a toxin into two domains is also not justified from a structural point of view, it is probably based on initial sequence analyses. The N terminus (actually previously reported as Pone domain in ref 21) is actually not a separate domain, but an integral part of the protein that is encased from both sides by the C terminal part. These parts might indeed evolve faster since they are located further from the active site and the central core of the protein. I am happy to see that the chimeric toxins are active, but regarding the conservation and neutralization, I am not surprised, that the central core of the protein fold is highly conserved. However, "deletion 2" is quite irrelevant - it deletes the central core of the protein, which is simply too drastic to draw any conclusions from such a construct - it will not fold into anything similar to an original protein, if it will fold properly at all.

      The reviewer’s comment highlights why we turned to the chimera proteins to dissect the regions of RdnE (formerly IdrD-CT), as the deletions could result in misfolded proteins. (We initially examined RdnE in the years before the launch of AlphaFold2.) However, the reviewer is incorrect regarding the N-terminus of RdnE. The PoNe domain, while also a subfamily of the PD-(D/E)XK superfamily, forms a distinct clade of effectors from the PD-(D/E)XK domain in RdnE (formerly IdrD-CT) as seen in Hespanhol et al., 2022; this is true for other DNase effectors as well. Many studies analyzing effectors within the PD-(D/E)XK superfamily only focus on the PD-(D/E)XK domain, removing just this domain from the context of the whole protein (Hespanhol et al., 2022; Jana et al., 2019). Of note, in RdnE, this region alone (containing the DNA-binding domain) is insufficient for DNase activity (unlike in PoNe). We have clarified this distinction in the results section of the current manuscript, visible in Figure 2.

      (4) Regarding the "promiscuity" there is always a limit to how similar proteins are, hence when cross-neutralization is claimed authors should always provide sequence similarities. This similarity could also be further compared in terms of the predicted interaction surface between toxin and immunity.

      Reviewer 1 points out a fundamental property of protein-protein interactions that has been isolated away from the impacts of such interactions on bacterial community structure. We have provided the whole protein alignments in Figure 3 supplemental figure 3, the summary images in Figure 3D, and the protein phylogenetic trees in Figure 3C. We encourage others to consider the protein alignments, as percent amino acid sequence similarity is not necessarily a good gauge for protein function and interactions. These data are publicly available on the OSF website associated with this manuscript (https://osf.io/scb7z/), and we hope the community explores the data there.

      In consideration of the enthusiasm to deeply dive into the primary research data, we have included the pairwise sequence identities across the entire proteins here: Proteus RdnI vs. Rothia RdnI: 23.6%; Proteus RdnI vs. Prevotella RdnI: 16.3%; Proteus RdnI vs. Pseudomonas RdnI: 14.6%; Rothia RdnI vs. Prevotella RdnI: 22.4%; Rothia RdnI vs. Pseudomonas RdnI: 17.6%; Prevotella RdnI vs. Pseudomonas RdnI: 19.5%. (As stated in response to reviewer 1 comment 2, we did not find it appropriate to make inferences based on AlphaFold2-predicted protein complexes.)
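
      For readers unfamiliar with how a pairwise identity such as those above can be derived, a minimal sketch follows. It is an illustration only and not the authors’ pipeline: the function name, the toy sequences, and the choice of denominator (aligned columns in which neither sequence has a gap) are assumptions, and other conventions (for example, dividing by the full alignment length) give different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption (not from the source): gaps are written as '-' and the
    denominator is the number of columns where neither row has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy example with made-up sequences (not RdnI sequences).
toy_identity = percent_identity("MKT-LLA", "MRTALLG")
```

      Because the denominator convention changes the result, any comparison of identity values across studies should state which convention was used.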

      Overall, it looks more like a regular toxin-immunity couple, where some cross-reactions with homologues are possible, depending on how far the sequences have deviated. Nevertheless, taking all of the above into account, these results do not challenge toxin-immunity specificity dogma.

      In this manuscript, we did not intend to dismiss the E-I specificity model but rather point out its limitations and propose an important expansion of that model that accounts for cross-protection and survival against attacks from other genera. We agree that it is commonly considered that deviations in amino acid sequence over time could result in cross-binding and protection (see lines 364-368). However, the impacts of such cross-binding on community structure, bacterial survival, and strain evolution were rarely addressed in prior literature, with exceptions such as in Zhang et al., 2013 and Bosch et al., 2023 among others. One key insight we propose and show in this manuscript is that cross-binding can be a fitness benefit in mixed communities; therefore, it could be selected for evolutionarily (lines 378-380), even potentially in host microbiomes.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Knecht et al entitled "Non-cognate immunity proteins provide broader defenses against interbacterial effectors in microbial communities" aims at characterizing a new type VI secretion system (T6SS) effector immunity pair using genetic and biochemical studies primarily focused on Proteus mirabilis and metagenomic analysis of human-derived data focused on Rothia and Prevotella sequences. The authors provide evidence that RdnE and RdnI of Proteus constitute an E-I pair and that the effector likely degrades nucleic acids. Further, they provide evidence that expression of non-cognate immunity derived from diverse species can provide protection against RdnE intoxication. Overall, this general line of investigation is underdeveloped in the T6SS field and conceptually appropriate for a broad audience journal. The paper is well-written and, aside from a few cases, well-cited. As detailed below however, there are several aspects of this paper where the evidence provided is somewhat insufficient to support the claims. Further, there are now at least two examples in the literature of non-cognate immunity providing protection against intoxication, one of which is not cited here (Bosch et al PMID 37345922 - the other being Ting et al 2018). In general therefore I think that the motivating concept here in this paper of overturning the predominant model of interbacterial effector-immunity cognate interactions is oversold and should be dialed back.

      We agree that analyses focusing on flexible non-cognate interactions and protection are underdeveloped within the T6SS field and are not fully explored within a community structure. These ideas are rapidly growing in the field, as evidenced by the references provided by the reviewer. As stated earlier, we did not intend to overturn the prevailing model but rather have proposed an expanded model that accounts for protection against attacks from foreign genera.

      Strengths:

      One of the major strengths of this paper is the combination of diverse techniques including competition assays, biochemistry, and metagenomics surveys. The metagenomic analysis in particular has great potential for understanding T6SS biology in natural communities. Finally, it is clear that much new biology remains to be discovered in the realm of T6SS effectors and immunity.

      Weaknesses:

      The authors have not formally shown that RdnE is delivered by the T6SS. Is it the case that there are not available genetics tools for gene deletion for the BB2000 strain? If there are genetic tools available, standard assays to demonstrate T6SS-dependency would be to interrogate function via inactivation of the T6SS (e.g. by deleting tssC).

      Our research group showed that the T6SS secretes RdnE (previously IdrD) in Wenren et al., 2013 (cited in lines 71-73). We later confirmed T6SS-dependent secretion by LC-MS/MS (Saak et al., 2017).  

      For swarm cross-phyla competition assays (Figure 4), at what level compared to cognate immunity are the non-cognate immunity proteins being expressed? This is unclear from the methods and Figure 4 legend and should be elaborated upon. Presumably these non-cognate immunity proteins are being overexpressed. Expression level and effector-to-immunity protein stoichiometry likely matters for interpretation of function, both in vitro as well as in relevant settings in nature. It is important to assess if native expression levels of non-cognate cross-phyla immunity (e.g. Rothia and Prevotella) protect similarly as the endogenously produced cognate immunity. This experiment could be performed in several ways, for example by deleting the RdnE-I pair and complementing back the Rothia or Prevotella RdnI at the same chromosomal locus, then performing the swarm assay. Alternatively, if there are inducible expression systems available for Proteus, examination of protection under varying levels of immunity induction could be an alternate way to address this question. Western blot analysis comparing cognate to non-cognate immunity protein levels expressed in Proteus could also be important. If the authors were interested in deriving physical binding constants between E and various cognate and non-cognate I (e.g. through isothermal titration calorimetry) that would be a strong set of data to support the claims made. The co-IP data presented in supplemental Figure 6 are nice but are from E. coli cells overexpressing each protein and do not fully address the question of in vivo (in Proteus) native expression.

      P. mirabilis strain ATCC29906 does not encode the rdnE and rdnI genes on the chromosome (NCBI BioSample: SAMN00001486) (line 151). Production of the RdnI proteins, including the cognate Proteus RdnI, comes from equivalent transgenic expression vectors. Specifically, the rdnI genes were expressed under the flaA promoter in P. mirabilis strain ATCC29906 (Table 1) for the swarm competition assays found in Figure 2C and Figure 4. This promoter results in constitutive expression in swarming cells (Belas et al., 1991; Jansen et al., 2003). In the revised manuscript, figure 4 Supplement Figure 2 shows the relative RdnI protein levels in these strains; we also clarified the expression constructs in the text (see reviewer 3, comment 1).

      Lines 321-324, the authors infer differences between E and I in terms of read recruitment (greater abundance of I) to indicate the presence of orphan immunity genes in metagenomic samples (Figure 5A-D). It seems equally or perhaps more likely that there is substantial sequence divergence in E compared to the reference sequence. In fact, metagenomes analyzed were required only to have "half of the bases on reference E-I sequence receiving coverage". Variation in coverage again could reflect divergent sequence dipping below 90% identity cutoff. I recommend performing metagenomic assemblies on these samples to assess and curate the E-I sequences present in each sample and then recalculating coverage based on the exact inferred sequences from each sample.

      This comment raises the challenges with metagenomic analyses. It was difficult to balance specificity to a particular species’ DNA sequence with the prevalence of any homologous sequence in the sample. Given the distinction in binding interactions among the four examined species, we opted to prioritize specificity, accepting that we were losing access to some rdnE and rdnI sequences in that decision. We chose a 90% identity cutoff, which, through several in silico controls, ensured that each sequence we identified was the rdnE or rdnI gene from that specific species. For the Version of Record, we have included analysis with a 70% cutoff in the supplemental information to try to account for sequence divergence by lowering the identity cutoffs as suggested. The data from the 70% identity cutoff was consistent with the original data from the 90% identity cutoff.
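
      The trade-off described above, between a strict identity cutoff and the fraction of a reference gene that receives read coverage, can be sketched as follows. This is a hypothetical illustration and not the authors’ code (which is on the OSF site): the alignment tuple layout, the function name, and the toy reads are all assumptions.

```python
from typing import Iterable, Tuple

# Hypothetical record for one aligned read:
# (reference start, reference end (exclusive), percent identity to reference)
Alignment = Tuple[int, int, float]


def coverage_breadth(alignments: Iterable[Alignment],
                     ref_length: int,
                     min_identity: float) -> float:
    """Fraction of reference bases covered by reads passing the identity cutoff."""
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")

    covered = [False] * ref_length
    for start, end, identity in alignments:
        if identity < min_identity:
            continue  # read is too divergent to count at this cutoff
        for pos in range(max(start, 0), min(end, ref_length)):
            covered[pos] = True
    return sum(covered) / ref_length


# Toy reads against a 100-base reference (made-up values).
toy_reads = [(0, 40, 95.0), (30, 60, 92.0), (60, 90, 75.0)]
breadth_at_90 = coverage_breadth(toy_reads, 100, 90.0)  # strict cutoff
breadth_at_70 = coverage_breadth(toy_reads, 100, 70.0)  # relaxed cutoff
```

      In this toy case the relaxed cutoff recruits the divergent third read and raises the breadth of coverage, which is the effect the reviewer raised; whether the conclusions change in real data is an empirical question that the authors report checking with the 70% analysis.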

      A description of gene-level read recruitment in the methods section relating to metagenomic analysis is lacking and should be provided.

      Noted. We included the raw code and sequences on the OSF website associated with this manuscript https://osf.io/scb7z/.

      Reviewer #3 (Public Review):

      Summary:

      The authors discovered that the RdnE effector possesses DNase activity, and in competition, P. mirabilis having RdnE outcompetes the null strain. Additionally, they presented evidence that the RdnI immunity protein binds to RdnE, suppressing its toxicity. Interestingly, the authors demonstrated that the RdnI homolog from a different phylum (i.e., Actinomycetota) provides cross-species protection against RdnE injected from P. mirabilis, despite the limited identity between the immunity sequences. Finally, using metagenomic data from human-associated microbiomes, the authors provided bioinformatic evidence that the rdnE/rdnI gene pair is widespread and present in individual microbiomes. Overall, the discovery of broad protection by non-cognate immunity is intriguing, although not necessarily surprising in retrospect, considering the prolonged period during which Earth was a microbial battlefield/paradise.

      Strengths:

      The authors presented a strong rationale in the manuscript and characterized the molecular mechanism of the RdnE effector both in vitro and in the heterologous expression model. The utilization of the bacterial two-hybrid system, along with the competition assays, to study the protective action of RdnI immunity is informative. Furthermore, the authors conducted bioinformatic analyses throughout the manuscript, examining the primary sequence, predicted structural, and metagenomic levels, which significantly underscore the significance and importance of the EI pair.

      Weaknesses:

      (1) The interaction between RdnI and RdnE appears to be complex and requires further investigation. The manuscript's data does not conclusively explain how RdnI provides a "promiscuous" immunity function, particularly concerning the RdnI mutant/chimera derivatives. The lack of protection observed in these cases might be attributed to other factors, such as a decrease in protein expression levels or misfolding of the proteins. Additionally, the transient nature of the binding interaction could be insufficient to offer effective defenses.

      Yes, we agree with the reviewer and hope that grant reviewers share this colleague’s enthusiasm for understanding the detailed molecular mechanisms of RdnE-RdnI binding across genera. In the revised manuscript, we have continued to emphasize such caveats, as the next frontier is clearly understanding the molecular mechanisms for RdnI cognate or non-cognate protection. In the revised manuscript, figure 4 Supplement Figure 2 shows the RdnI protein levels; we also clarified the expression constructs in the text (see reviewer 2, comment 2).

      (2) The results from the mixed population competition lack quantitative analysis. The swarm competition assays only yield binary outcomes (Yes or No), limiting the ability to obtain more detailed insights from the data.

      The mixed swarm assay is needed when studying T6SS effectors that are primarily secreted during Proteus’ swarming activity (Saak et al. 2017, Zepeda-Rivera et al. 2018). This limitation is one reason we utilize in vitro, in vivo, and bioinformatic analyses. Though the swarm competition assay yields a binary outcome, we are confident that the observed RdnI protection is due to interaction with a trans-cell RdnE via an active T6SS. By contrast, many manuscripts report co-expression of the EI pair (Yadav et al., 2021, Hespanhol et al., 2022) rather than secreted effectors, as we have achieved in this manuscript.

      (3) The discovery of cross-species protection is solely evident in the heterologous expression-competition model. It remains uncertain whether this is an isolated occurrence or a common characteristic of RdnI immunity proteins across various scenarios. Further investigations are necessary to determine the generality of this behavior.

      We agree, which is why we submitted this paper as a launching point for further investigations into the generality of non-cognate interactions and their potential impact on community structure.

      Comments from Reviewing Editor:

      - In addition to the references provided by reviewer#2, the first manuscript to show non-cognate binding of immunity proteins was Russell et al 2012 (PMID: 22607806).
      - IdrD was shown to form a subfamily of effectors in this manuscript by Hespanhol et al 2022 PMID: 36226828 that analyzed several T6SS effectors belonging to PDDExK, and it should be cited.

      We appreciate that the reviewer and eLife staff pointed out missed citations. We have incorporated these studies and cited them in the revised manuscript.

      [1] The Proteus RdnE in complex with either the Prevotella or Pseudomonas RdnI showed low confidence at the interface (pLDDT ~50-70%); this AI-predicted complex might support the lack of binding seen in the bacterial two-hybrid assay. On the other hand, the Proteus and Rothia RdnI N-terminal regions show higher confidence at the interface with RdnE. Despite this, the C-terminus of the Proteus RdnI shows especially low confidence (pLDDT ~50%) where it might interact near RdnE’s active site (as suggested by reviewer 1). Given this low confidence and the already stated inaccuracies of AI-generated complexes, we would rather wait for crystallization data to inform potential protection mechanisms of RdnI.

      Author response image 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this fundamental study, the authors use innovative fine-scale motion capture technologies to study visual vigilance with high-acuity vision, to estimate the visual fixation of free-feeding pigeons. The authors present convincing evidence for use of the fovea to inspect predator cues, the behavioral state influencing the latency for fovea use, and the use of the fovea decreasing the latency to escape of both the focal individual and other flock members. The work will be of broad interest to behavioral ecologists.

      We thank the editor for their interest and feedback on the manuscript. Below, we address the reviewer’s comments.

      Reviewer #1 (Public Review):

      Summary:

      The authors were using an innovative technic to study the visual vigilance based on high-acuity vision, the fovea. Combining motion-capture features and visual space around the head, the authors were able to estimate the visual fixation of free-feeding pigeon at any moment. Simulating predator attacks on screens, they showed that 1) pigeons used their fovea to inspect predators cues, 2) the behavioural state (feeding or head-up) influenced the latency to use the fovea and 3) the use of the fovea decrease the latency to escape of both the individual that foveate the predators cues but also the other flock members.

      Strengths:

      The paper is very interesting, and combines innovative technic well adapted to study the importance of high-acuity vision for spotting a predator, but also of improving the behavioural response (escaping). The results are strong and the models used are well-adapted. This paper is a major contribution to our understanding of the use of visual adaptation in a foraging context when at risk. This is also a major contribution to the understanding of individual interaction in a flock.

      Weaknesses:

      I have identified only two weaknesses:

      (1) The authors often mixed the methods and the results, Which reduces the readability and fluidity of the manuscript. I would recommend the authors to re-structure the manuscript.

      (2) In some parts, the authors stated that they reconstructed the visual field of the pigeon, which is not true. They identified the foveal positions, but not the visual fields, which involve different sectors (binocular, monocular or blind). Similarly, they sometimes mix-up the area centralis and the fovea, which are two different visual adaptations.

      Thank you for your positive feedback. We addressed these comments by restructuring the methods and result sections as suggested, and by checking the terminology and specific vocabulary used throughout the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      First, I would like to say that I really enjoyed the manuscript. This is a great contribution to the field.

      Thank you for the positive feedback, we highly appreciate it.

      Then, I have some comments that I hope, would help the authors to improve the manuscript.

      Major comments :

      I would recommend the authors to restructure the methods and the results section. In many parts, the models used are presented in the results section, while this should be presented in the methods section.

      Thank you for the suggestion. We have now ensured that the model descriptions are presented in the statistics section of the methods.

      To me, the introduction is too long (more than 5 pages). It would be beneficial to reduce it considerably. Furthermore, in the introduction, it misses some information about the visual abilities of your species ((visual acuity, visual field, temporal resolution, contrast sensitivity....).

      We agree that the introduction was very long and have shortened it by removing the “Methodological issues” section and reducing the “Experimental rationales” section to a minimum. We also added the missing information on the visual abilities of pigeons to the “Experimental rationales” section (see L135-150). Please note, however, that we refer to the temporal resolution of pigeon vision in the method section, where we relate it to the resolution of the monitors used.

      Minor comments :

      Lines 37-39: This needs a reference.

      A reference has been added (McFarland, 1977)

      Lines 39-41: But see some papers published recently on Harris's hawks.

      Thank you for the references, we added the citation as well as a few more papers (Kane et al., 2015; Kano et al., 2018; Miñano et al., 2023; Yorzinski & Platt, 2014).

      Lines 41-43: This sentence needs a reference as well.

      References have been added (Cresswell, 1994; M. H. R. Evans et al., 2018; Inglis & Lazarus, 1981).

      Lines 56-103: In this paragraph, head down and head up also depends from the retinal map of the birds! Some birds have visual streak that allow them to see a potential threats while foraging. Please add more information about the importance of photoreceptors distribution.

      Thank you for pointing out this issue. We rewrote the sentence at L65-69 as follows to include the importance of retinal structures:

      “In several species, especially those with a broad visual field and specific retinal structures such as the visual streaks, individuals can simultaneously engage in foraging activities while remaining vigilant (Fernández-Juricic, 2012), likely using peripheral vision to detect approaching threats (Bednekoff & Lima, 2005; Cresswell et al., 2003; Kaby & Lind, 2003; Lima & Bednekoff, 1999).”

      Lines 76-79: you wrote : ".... favor alternative hypotheses based on their findings". Which findings? You need to explain.

      We rewrote this part as follows (L80-81).

      “other studies found evidence for the risk dilution (Beauchamp & Ruxton, 2008) and the edge effect (Inglis & Lazarus, 1981) in their study systems.”

      Lines 109-110: It would be good to have a representation of what is an area and a fovea, and how it is placed in the eye, what type of fovea exists and how it is related to visual field. Where does it project?

      We now give a better description of the pigeon’s visual field in the experimental rationales section, which we hope will help the reader understand the key features of pigeon vision (see L135-150). Specifically, we now say in L137-138:

      “they have one fovea centrally located in the retina of each eye, with an acuity of 12.6 c/deg (Hodos et al., 1985). Their fovea projects laterally at ~75° into the horizon in their visual field.”

      Lines 109-113: You might need to see some new papers here about the fovea. See for instance Bringmann 2019.

      Thank you for the suggestion; we now give a more precise definition of the fovea and refer to Bringmann’s paper for more details (L113-114):

      “a pit-like area in the retina with high concentration of cone cells where visual acuity is highest, and is responsible for sharp, detailed, and color vision.”

      Lines 113-120: Please explain how the visual field is related to fovea? Where is the fovea project in the visual fields?

      Similarly to the question above, we now give a more precise description of the pigeon’s visual field (see L135-150).

      Line 131-134: For a non-expert, you would need to explain what is micro, meso and macro scale?

      These sentences were removed when shortening the introduction, and we no longer refer to micro, meso and macro scales.

      Lines 134-136: Please explain in one sentence the technique here.

      We now explain in one sentence how motion capture enables the tracking of head and body orientation (L130-132):

      “Motion capture cameras track with high accuracy the 3D position of markers, which, when attached to the pigeon’s head and body, enables to reconstruct the rotations of the head and body in all directions.”

      Line 140: You presented here for the first time the word "foveation". Has this term been used before? If so, please add a reference. If not, please explain what you mean by foveation precisely.

      Thank you for noticing this omission. We now provide the following definition, “directing visual focus to the fovea to achieve the clearest vision”, where the term foveation is first mentioned (L149-150).

      Lines 146-148: Please explain why this proves that it is appropriate to not record eyes movements, and is this true for every behaviours?

      We acknowledge that small eye movements might occur and reduce the accuracy of the method. This error is accounted for by using a ±10 degree range around the foveas. The lines the reviewer referred to were removed when shortening the introduction, but we added an explanation in the paragraph describing pigeon vision to make it clearer (L147-150):

      “Yet, it should be noted that their eye movement was not tracked in our system, although it is typically confined within a 5 degrees range (Wohlschläger et al., 1993). We thus considered this estimation error of the foveation (directing visual focus to the fovea to achieve the clearest vision) in our analysis, as a part of the error margin (see Methods).”
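
The error margin described above can be illustrated with a minimal sketch. It assumes a head-centred coordinate frame (x forward, y left, z up), foveal axes at ±75° azimuth on the horizon, and a 10° tolerance. The frame convention and all function names are hypothetical and are not taken from the authors' analysis code.

```python
import math

FOVEA_AZIMUTH_DEG = 75.0  # lateral projection of each fovea, as stated in the response
MARGIN_DEG = 10.0         # tolerance that absorbs untracked eye movements


def fovea_axes():
    """Unit vectors of the left and right foveal axes in the head frame.

    Assumed (hypothetical) frame: x forward, y left, z up, elevation 0.
    """
    a = math.radians(FOVEA_AZIMUTH_DEG)
    return {
        "left": (math.cos(a), math.sin(a), 0.0),
        "right": (math.cos(a), -math.sin(a), 0.0),
    }


def angle_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    norm_u = math.sqrt(sum(c * c for c in u))
    norm_v = math.sqrt(sum(c * c for c in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("vectors must be non-zero")
    cosine = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def foveated_by(object_in_head_frame):
    """Return 'left' or 'right' if the object lies within the margin of a fovea, else None."""
    for side, axis in fovea_axes().items():
        if angle_deg(object_in_head_frame, axis) <= MARGIN_DEG:
            return side
    return None
```

An object direction that is already expressed in the head frame can be passed directly, for example `foveated_by((0.2, 1.0, 0.0))`. Converting world coordinates into the head frame is a separate step that this sketch does not cover.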

      Lines 161-163: What is the frontal and binocular field for? You would need to explain the different fields of view and what they are supposed to be for.

      Furthermore, does the visual field of pigeon have been studied? If so, you would need to add more information about it.

      This information is now given in the new paragraph describing the pigeon’s vision in the “Experimental rationales” section (see L135-150).

      Figure 1: It is not clear here which panels correspond to a, b or c. Please use some boxes to clarify it.

      Thank you for the comment; we have now made the figure’s sub-panels clearer.

      Lines 193-194: You wrote "... such as foveas (also known as the area centralis). No, this is not the same.

      (1) In some species, you have two foveas, one placed centrally in the retina, one place temporally. So the fovea is not the area centralis.

      (2) Second, some species do have an area centralis but without a fovea.

      Thank you for pointing out the inaccuracy. In this case, we were referring specifically to the pigeon’s fovea, which is sometimes referred to as “area centralis”, but we have now changed the sentence as follows to avoid any confusion (L174-175):

      “The initial two hypotheses (Hypotheses 1 and 2) aim to examine whether foveation correlates with predator detection.”

      Lines 192-212: I did not understand the logic of the hypotheses numbers? Why do you have 2.1 but not 3.1 for instance? And if you have two hypotheses for the within a global one (for instance, 2.1 and 2.2), what is the main hypothesis 2? You should explain more here because we get lost here and in the result section as well.

      We recognize this section might have appeared confusing to the reader. In short, we had four main hypotheses: 1) the fovea is used to evaluate predator cues, 2) the latency to foveate is related to vigilance behaviors. These first 2 hypotheses aimed to determine if the latency to foveate on the predator cue could be related to the detection. 3) foveation is related to the escape response of the pigeons and 4) there is a collective influence in the escape response. We further divided some of the hypotheses into 2 sub-hypotheses whenever 2 different tests were used to answer the same question. We have modified this section to be clearer.

      Lines 224-229: Where are the figures and statistics for these results?

      These results are presented in Table S1. We apologize for forgetting to add this reference and have now added it (L211).

      Lines 229-231: This should be in the method section.

      This model explanation (as well as all others mentioned hereafter) has been moved to the Methods section as suggested.

      Lines 248-252: This should be in the method section. Furthermore, you should better explain the model selection.

      Please see earlier comment. Additionally, we now explain more clearly how the model was built.

      Figure 2: It is not clear on the figure which letters correspond to which panels. Please improve the readability of the figure.

      It was modified accordingly.

      Lines 274-278: This should be in the method section.

      Please see earlier comment.

      Line 281: The "Fig.3" should be mentioned in the previous sentence.

      It was modified accordingly.

      Figure 3: Please explain why the latency to foveate had negative values in Fig.2 but not here, and not in Fig. 4 as well. This again highlights that we missed a number of information in the methods about the transformation of the data and the model selection.

      The variable presented in Fig 2d is not the latency to foveate but the “Normalized frequency at which the object was observed within foveal regions” (hypothesis 1). It represents the amount of time the object was lying within one of the foveal regions of the individual (“how long the pigeons foveated on it”), further normalized to unit sum to make all objects comparable. This variable was indeed logit-transformed (hence the negative values) to improve residual fit in the model, but this information (as well as other transformations) is always stated in the axis labels of the graphs. Additionally, we have now improved the statistical analysis section to make the model used for each hypothesis test clearer. Please let us know if you have suggestions for further improving the presentation.
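
To make the origin of the negative values concrete, here is a minimal sketch of the two steps named in the reply: normalization to unit sum, followed by a logit transform. The object names, the example durations, and the small clipping constant `eps` are assumptions made for illustration and are not values from the study.

```python
import math


def normalize_to_unit_sum(durations):
    """Scale per-object foveation durations so that they sum to 1."""
    total = sum(durations.values())
    if total <= 0:
        raise ValueError("total duration must be positive")
    return {name: value / total for name, value in durations.items()}


def logit(p, eps=1e-6):
    """Log-odds of a proportion; eps (an assumption) keeps p away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


# Hypothetical durations in seconds for three objects
durations = {"monitor": 2.0, "conspecific": 1.0, "predator": 1.0}
proportions = normalize_to_unit_sum(durations)
logits = {name: logit(p) for name, p in proportions.items()}
```

Any proportion below 0.5 maps to a negative logit, which is why a normalized frequency can appear with negative values on a logit-scaled axis.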

      Lines 297-301: This should be in the method section.

      Please see earlier comment.

      Lines 301-305: Fig. 3 b and c only referred to the two first factors. Please add more figures for the other factors. This could be in supp. Mat.

      We added the 3 graphs for the proportion of time foveating on the monitor, the saccade rate and the proportion of time foveating on conspecifics in the supplementary material (Fig S6).

      Lines 306-309: This should be in methods, and you should have explained in methods how you performed your model selection.....

      We prefer leaving this paragraph in the result section, as it was intended to give the reader extra information on the predictive power of the different variables (by comparing the effectiveness of the models including one variable at a time, all the rest being equal) and not on the model selection per se. However, we now explain our goal better in the statistics section regarding this analysis (L635-636):

      “We further tested the relative predictive power of the different test variables by comparing the resulting models’ efficiency using AIC scores.”
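
The comparison described here can be sketched with the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The model names and log-likelihood values below are hypothetical placeholders, not results from the manuscript.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(models):
    """Rank models by AIC.

    models maps a name to (log_likelihood, n_params).
    Returns a list of (name, aic, delta_aic) sorted from best to worst.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return sorted(
        ((name, score, score - best) for name, score in scores.items()),
        key=lambda item: item[1],
    )


# Hypothetical single-predictor models, all else being equal
candidate_models = {
    "head_up_proportion": (-100.0, 3),
    "saccade_rate": (-98.0, 3),
    "foveation_on_monitor": (-99.5, 4),
}
ranking = rank_by_aic(candidate_models)
```

Because every candidate differs only in the test variable it includes, the difference in AIC relative to the best model gives a simple relative measure of each variable's predictive power.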

      Lines 317-319: This should be in the method section.

      Please see earlier comment.

      Lines 320-322: This should be in the method section.

      Please see earlier comment.

      Lines 332-334: This should be in the method section.

      Please see earlier comment.

      Lines 334-336: Then, if this is not significant, you cannot say that.

      Thank you for noticing the inaccuracy; we have now rephrased it as (L298-299):

      “Earlier foveation of the first pigeon was not significantly related to an earlier escape responses among the other flock members, although there was a trend (χ2(1) = 3.66, p = 0.0559).”
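
As a quick check of how such a statistic maps to a p-value, the sketch below computes the upper-tail probability of a chi-square distribution with one degree of freedom, using the identity P(X > x) = erfc(√(x/2)). It assumes only that the reported statistic is referred to a chi-square distribution with 1 df; the function name is hypothetical.

```python
import math


def chi2_sf_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if stat < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(stat / 2.0))


p_value = chi2_sf_df1(3.66)  # falls just above the conventional 0.05 threshold
```

A statistic of about 3.84 corresponds to p = 0.05 at one degree of freedom, so 3.66 sits slightly on the non-significant side, consistent with describing the result as a trend.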

      Line 336: Please explain why you did different models. We missed a lot of information in the method about your strategy for statistics.?

      We have now added considerably more information on the models in the statistics section, in response to this comment and the previous ones. We hope the explanations of the analyses are now clearer to the reader.

      Lines 339-349: This should be in the method section.

      Please see earlier comment.

      Results section: As you may have understood, there are too many sentences that should be moved into the method section. Furthermore, I would recommend modifying the headings so that they are more biologically speaking. Similarly to what you have done in the discussion section.

      Thank you for the comments. We agree with most of them, and have modified the manuscript accordingly. Additionally, we now use the same headings in the results section as the ones used in the discussion to make the text easier to follow.

      Lines 500-501: What were the body weight of the pigeon? At which weight of their full weight they were?

      This information is now added (492 ± 41 g; mean ± SD). We did not control the amount of food during our experiments and only ensured 24 h without food by feeding the pigeons after the experiment was completed. This information was added as follows (L454-456):

      “On experimental days, they were fed only after the experiments was completed; this ensures 24-hour no feeding at the time of the experiment, although we did not control the amount of the food over the course of the experimental periods.”

      Line 522-523: Those screens are very good for pigeons.

      Thank you for the positive comment; we indeed tried to match bird vision as closely as possible.

      Lines 527-528: At which frequency was produced the moving stimulus? Your screen can display up to 144Hz, which is very good. But can your laptop do it? If not, it is important to mention it as pigeons may have a temporal resolution of vision up to 149Hz.

      Our laptop indeed supports a 144 Hz display. In addition, we now mention the temporal resolution of pigeon vision (L480-482):

      “We specifically chose a monitor with high temporal resolution to match the pigeon’s Critical Flicker Fusion Frequency (threshold at which a flickering light is perceived by the eye as steady) that reaches up to 143Hz (Dodt & Wirth, 1954).”

      Lines 555-572: Did you use a control shape in your experiment? Indeed, they may escape because of a moving pattern but not a predator shape?

      We did not use a control shape, as the aim of the experiment was not to directly test the effect of the shape itself. We designed the predator cue to resemble an approaching predator to ensure a response from the pigeons, but it might be that other shapes would have worked as well.

      Lines 588-589: Please explain why the coordinate system of the pigeon's head is considered as the visual field?

      From what I have understood, you did not reconstruct the visual fields, but only the position of the fovea. This should be noted like this as visual field involves more than a sphere around the head (binocular and monocular sectors, blind sectors, vertical extension....).

      Thank you for noticing the inaccuracy; we indeed did not consider other sectors of the visual field and therefore rephrased it as (L551): “the location of the objects and conspecifics from the pigeon’s perspective”.

      Lines 601-604: How much does it represent?

      As this was estimated by visual inspection, we do not have the exact percentage of data loss that was caused by grooming. However, because of the number of cameras in the SMART BARN motion capture system, it is reliable in detecting markers inside the space in “ideal” conditions (without occlusion). For example, a similar set-up found marker track loss of <1% using a model bird (Itahara & Kano, 2022).

      Itahara, A., & Kano, F. (2022). “Corvid Tracking Studio”: A custom-built motion capture system to track head movements of corvids. Japanese Journal of Animal Psychology, 72(1), 1–16. https://doi.org/10.2502/janip.72.1.1

      Lines 610-612: You would need to cite Wood 1917 and Hodos et al. 1991 who described the presence of a fovea in this species.

      We added both citations to the manuscript.

      Line 611: Again, the fovea is not equal to area centralis.

      Thank you, we changed it as well.

      Lines 625-626: you wrote "... in a few instances....". Please explain more. How many? What proportion?

      This happened in 9 observations out of 120. We now specify it in the text as well (L587-589):

      “in a few instances (9 out of 120 observations), pigeons foveated on the model predator after the looming stimulus had disappeared, but these cases were excluded from our analysis.”

      Lines 640-653: We missed a lot of information in the section "statistical analysis". If you moved most of the sentence from the results that describe the methods in the method section, that would be much better. Furthermore, you would need to explain more what statistics you used, which model selection, what type of data transformation....

      We agree this section lacked information, and we moved the information from the Results to the statistics section.

      Supplementary materials: boxplots from Fig. S1 and S2 are too small and impossible to read. Please improve the readability.

      We have now enlarged these plots to make them more readable.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript "Engineering of PAClight1P78A: A High-Performance Class-B1 GPCR-Based Sensor for PACAP1-38" by Cola et al. presents the development of a novel genetically encoded sensor, PAClight1P78A, based on the human PAC1 receptor. The authors provide a thorough in vitro and in vivo characterization of this sensor, demonstrating its potential utility across various applications in life sciences, including drug development and basic research.

      The diverse methods to validate PAClight1P78A demonstrate a comprehensive approach to sensor engineering by combining biochemical characterization with in vivo studies in rodent brains and zebrafish. This establishes the sensor's biophysical properties (e.g., sensitivity, specificity, kinetics, and spectral properties) and demonstrates its functionality in physiologically relevant settings. Importantly, the inclusion of control sensors and the testing of potential intracellular downstream effects such as G-protein activation underscore a careful consideration of specificity and biological impact.

      Strengths:

      The fundamental development of PAClight1P78A addresses a significant gap in sensors for Class-B1 GPCRs. The iterative design process -starting from PAClight0.1 to the final PAClight1P78A variant - demonstrates compelling optimization. The innovative engineering results in a sensor with a high apparent dynamic range and excellent ligand selectivity, representing a significant advancement in the field. The rigorous in vitro characterization, including dynamic range, ligand specificity, and activation kinetics, provides a critical understanding of the sensor's utility. Including in vivo experiments in mice and zebrafish larvae demonstrates the sensor's applicability in complex biological systems.

      Weaknesses:

      The manuscript shows that the sensor fundamentally works in vivo, albeit in a limited capacity. The titration curves show sensitivity in the nmol range at which endogenous detection might be possible. However, perhaps the sensor is not sensitive enough or there are not any known robust paradigms for PACAP release. A more detailed discussion of the sensors's limitations, particularly regarding in vivo applications and the potential for detecting endogenous PACAP release, would be helpful.

      We thank the reviewer for carefully analyzing our in vivo data and highlighting the limitation of our results regarding the sensor’s applicability in detecting endogenous PACAP. We added several sections discussing future possibilities for optimization in the discussion (see paragraphs 2-4). We agree that a more specific discussion of the limitations of our study is an important addition to help design future experiments.

      There are several experiments with an n=1 and other low single-digit numbers. I assume that refers to biological replicates such as mice or culture wells, but it is not well defined. n=1 in experimental contexts, particularly in Figure 1, raises significant concerns about the exact dynamic range of the sensor, data reproducibility, and the robustness of conclusions drawn from these experiments. Also, ROI for cell cultures, like in Figure 1, is not well defined. The methods mentioned ROIs were manually selected, which appears very selective, and the values in Figure 1c become unnecessarily questionable. The lack of definition for "ROI" is confusing. Do ROIs refer to cells, specific locations on the cell membrane, or groups of cells? It would be best if the authors could use unbiased methods for image analysis that include the majority of responsive areas or an explanation of why certain ROIs are included or excluded.

      We thank the reviewer for the helpful suggestions. We have increased the number of replicates to n=3 for both HEK293T and neuron data depicted in Fig.1c. Furthermore, we have added Fig.1c’, which contains the quantification of the maximum responses obtained in the dataset shown in Fig.1c and also depicts the single values for each replicate. To clarify the definition of an ROI in our manuscript, we have detailed the process of ROI selection in the Methods section “Cell culture, imaging and quantification”. Additionally, we increased mouse numbers for in vivo PACAP infusions in mice (see Figure 4g).

      Reviewer #2 (Public Review):

      Summary:

      The PAClight1 sensor was developed using an approach successful for the development of other fluorescence-based GPCR sensors, which is the complete replacement of the third intracellular loop of the receptor with a circularly-permuted green fluorescent protein. When expressed in HEK cells, this sensor showed good expression and a weak but measurable response to the extracellular presence of PACAP1-38 (a ΔF/Fo of 43%). Additional mutation near the site of insertion of the linearized GFP, at the C-terminus of the receptor, and within the second intracellular loop produced a final optimized sensor with ΔF/Fo of >1000%. Finally, screening of mutational libraries that also included alterations in the extracellular ligand-binding domain of the receptor yielded a molecule, PAClight1P78A, that exhibited a high ligand-dependent fluorescence response combined with a high differential sensitivity to PACAP (EC50 30 nM based on cytometric sorting of stably transfected HEK293 cells) compared to its congener VIP, (with which PACAP shares two highly related receptors, VPAC1 and VPAC2) as well as several unrelated neuropeptides, and significantly slowed activation kinetics by PACAP in the presence of a 10-fold molar excess of the PAC1 antagonist PACAP6-38. A structurally highly similar control construct, PAClight1P78Actl, showed correspondingly similar basal expression in HEK293 cells, but no PACAP-dependent enhancement in fluorescent properties.

      PAClight1P78A was expressed in neurons of the mouse cortex via AAV9.hSyn-mediated gene transduction. Slices taken from PAClight1P78A-transfected cortex, but not slices taken from PAClight1P78Actl-transfected cortex exhibited prompt and persistent elevation of ΔF/Fo after 2 minutes of perfusion with PACAP1-38 which persisted for up to 14 minutes and was statistically significant after perfusion with 3000, but not 300 or 30 nM, of peptide. Likewise, microinfusion of 200 nL of 300 uM PACAP1-38 into the cortex of optical fiber-implanted freely moving mice elicited a ΔF/Fo (%) of greater than 15, and significantly higher than that elicited by application of similar concentrations of VIP, CRF, or enkephalin, or vehicle alone. In vivo experiments were carried out in zebrafish larvae by the introduction of PAClight1P78A into single-cell stage Danio rerio embryos using a Tol2 transposase-based plasmid with a UAS promoter via injection (of plasmid and transposase mRNA), and sorting of post-fertilization embryos using a marker for transgenesis carried in the UAS:PAClight1P78A construct. Expression of PAClight1P78A was directed to cells in the olfactory bulb which express the fish paralog of the human PAC1 receptor by using the Tg(GnRH3:gal4ff) line, and fluorescent signals were elicited by intracerebroventricular administration of PACAP1-38 at a single concentration (1 mM), which were specific to PACAP and to the presence of PAClight1P78A per se, as controlled by parallel experiments in which PAClight1P78Actl instead of PAClight1P78A was contained in the transgenic plasmid.

      Major strengths and weaknesses of the methods and results

      The report represents a rigorous demonstration of the elicitation of fluorescent signals upon pharmacological exposure to PACAP in nervous system tissue expressing PAClight1P78A in both mammals (mice) and fish (zebrafish larvae). Figure 4d shows a change in GFP fluorescence activation by PACAP occurring several seconds after the cessation of PACAP perfusion over a two-minute period, and its persistence for several minutes following. One wonders if one is apprehending the graphical presentation of the data incorrectly, or if the activation of fluorescence efficiency by ligand presentation is irreversible in this context, in which case the utility of the probe as a real-time indicator, in vivo, of released peptide might be diminished.

      We thank the reviewer for their careful consideration of our manuscript and agree that the activation of PAClight persisting for several minutes at micromolar concentrations could be a potential limitation for in vivo applications. We added a possible explanation for the persisting sensor activation in response to artificial application of PACAP38 in paragraph 3 of the discussion. We agree that this addition eases the interpretation of PAClight signals detected in vivo. 

      Appraisal of achievement of aims, and data support of conclusions:

      Small cavils with controls are omitted for clarity; the larger issue of appraisal of results based on the scope of the designed experiments is discussed in the section below. An interesting question related to the time dependence of the PACAP-elicited activation of PAClight1P78A is its onset and reversibility, and additional data related to this would be welcome.

      We agree that the reversibility of the sensor’s fluorescence is indeed an important feature, especially for detecting endogenous PACAP release. Our data indicate that the sensor’s fluorescence is reversible when detecting small to medium doses of PACAP38 (see Figure 4d – Application of 30-300 nM) that are presumably closer to physiological concentrations than the non-reversible concentration of 3000 nM. Please see also our new discussion on peptide concentrations in paragraph 4 of our discussion. For future experiments, it is indeed advisable to adjust the interval of repeated applications to the decay of the response at the respective concentration. Considering the long-lasting downstream effects of endogenous signaling, longer intervals between ligand applications are generally preferred to match more closely the physiological range in which endogenous PAC1 is most likely effective.

      Discussion of the impact of the work, and utility of the methods and data:

      Increasingly, neurotransmitter function may be observed in vivo, rather than by inferring in vivo function from in vitro, in cellular, or ex vivo experimentation. This very valuable report discloses the invention of a genetically encoded sensor for the class B1 GPCR PAC1. PAC1 is the major receptor for the neuropeptide PACAP, which in turn is a major neurotransmitter involved in brain response to psychogenic stress, or threat, in vertebrates as diverse as mammals and fishes. If this sensor possesses the sensitivity to detect endogenously released PACAP in vivo it will indeed be an impactful tool for understanding PACAP neurotransmission (and indeed PACAP action in general, in immune and endocrine compartments as well) in future experiments.

      However, the sensor has not yet been used to detect endogenously released PACAP. Until this has been done, one cannot answer the question as to whether the levels of exogenously perfused/administered PACAP used here merely to calibrate the sensor's sensitivity are indeed unphysiologically high. If endogenous PACAP levels don't get that high, then the sensor will not be useful for its intended purpose. The authors should address this issue and allude to what kind of experiments would need to be done in order to detect endogenous PACAP release in living tissue in intact animals. The authors could comment upon the success of other GPCR sensors that have been used to observe endogenous ligand release, and where along the pathway to becoming a truly useful reagent this particular sensor is.

      We thank the reviewer for highlighting that we had not made clear that detecting endogenous PACAP release was outside the intended scope of this paper. We therefore expanded our discussion to encompass the intended purpose of detecting artificially infused or applied PAC1 agonists, such as conducting fundamental tests of drug specificity and developing new pharmacological ligands to selectively target PAC1. This includes a more detailed discussion of our in vivo findings and clearer phrasing that stresses the potential application for applied drugs and not endogenous PACAP (see last paragraph in the discussion).

      We also agree that little is known about endogenous concentrations of PACAP in the brain. However, we have supplemented our discussion with several references estimating lower concentrations of PACAP and other peptides in vivo, suggesting average PACAP levels below the detection threshold of the sensor. Importantly, within certain brain regions and in closer proximity to release sites, significantly higher concentrations might be reached. Additionally, our data indicate that the concentrations observed under our current conditions do not saturate the sensor in vivo.  

      We therefore acknowledge the reviewer’s comment on the sensor’s potential limitations under our current experimental conditions. Hence, we expanded our discussion and suggest the use of higher resolution imaging to potentially reveal loci of high PACAP concentrations, which should be validated by future studies (see also our added discussion in paragraph 4). 

      Reviewer #3 (Public Review):

      Summary:

      The manuscript introduces PAClight1P78A, a novel genetically encoded sensor designed to facilitate the study of class-B1 G protein-coupled receptors (GPCRs), focusing on the human PAC1 receptor. Addressing the significant challenge of investigating these clinically relevant drug targets, the sensor demonstrates a high dynamic range, excellent ligand selectivity, and rapid activation kinetics. It is validated across a variety of experimental contexts including in vitro, ex vivo, and in vivo models in mice and zebrafish, showcasing its utility for high-throughput screening, basic research, and drug development efforts related to GPCR dynamics and pharmacology.

      Strengths:

      The innovative design of PAClight1P78A successfully bridges a crucial gap in GPCR research by enabling realtime monitoring of receptor activation with high specificity and sensitivity. The extensive validation across multiple models emphasizes the sensor's reliability and versatility, promising significant contributions to both the scientific understanding of GPCR mechanisms and the development of novel therapeutics. Furthermore, by providing the research community with detailed methodologies and access to the necessary viral vectors and plasmids, the authors ensure the sensor's broad applicability and ease of adoption for a wide range of studies focused on GPCR biology and drug targeting.

      Weaknesses

      To further strengthen the manuscript and validate the efficacy of PAClight1P78A as a selective PACAP sensor, it is crucial to demonstrate the sensor's ability to detect endogenous PACAP release in vivo under physiological conditions. While the current data from artificial PACAP application in mouse brain slices and microinfusion in behaving mice provide foundational insights into the sensor's functionality, these approaches predominantly simulate conditions with potentially higher concentrations of PACAP than naturally occurring levels.

      We thank the reviewer for their valuable comments and agree that the use of PAClight for detecting endogenous PACAP will be of great interest to the scientific community and should be a goal for future research. Considering the time, equipment and additional animal licenses necessary, we are convinced that these questions would go beyond the scope of the current paper and might rather be addressed in a follow-up publication. We therefore rephrased the discussion and added more details to further clarify the intended purpose of the current study. Additionally, we added a paragraph in the discussion suggesting experiments needed to validate PAClight for putative future in vivo applications.

      Although the sensor's specificity for the PAC1 receptor and its primary ligand is a pivotal achievement, exploring its potential application to other GPCRs within the class-B1 family or broader categories could enhance the manuscript's impact, suggesting ways to adapt this technology for a wider array of receptor studies. Additionally, while the sensor's performance is convincingly demonstrated in short-term experiments, insights into its long-term stability and reusability in more prolonged or repeated measures scenarios would be valuable for researchers interested in chronic studies or longitudinal behavioral analyses. Addressing these aspects could broaden the understanding of the sensor's practical utility over extended research timelines.

      We extend our gratitude to the reviewer for diligently assessing our results. 

      Indeed, the very high level of sensitivity that we could achieve in PAClight leads us to think that a grafting-based approach, such as the one we’ve recently described for class-A GPCR-based sensors (PMID: 37474807), could also work for the direct generation of multiple class-B1 sensors based on the optimized fluorescent protein module present in PAClight. Unfortunately, considering the amount of work that testing this hypothesis would entail, we are not able to perform these experiments in the context of this revision, and would rather pursue them as a future project. Nevertheless, we have expanded the discussion of the manuscript with a paragraph on these considerations.

      While we lack comprehensive data on the long-term stability of the sensor, our preliminary findings from optimizing the photometry recordings indicate consistent baseline expression of PAClight and PAClight ctrl over several weeks. Conducting experiments to systematically assess stability would require several months, which is currently impractical due to limitations in tools and licenses for repeated in vivo infusions. Hence, we intend to include these experiments in potential follow-up studies.

      Furthermore, the current in vivo experiments involving microinfusion of PACAP near sensor-expressing areas in behaving mice are based on a relatively small sample size (n=2), which might limit the generalizability of the findings. Increasing the number of subjects in these experimental groups would enhance the statistical power of the results and provide a more robust assessment of the sensor's in vivo functionality. Expanding the sample size will not only validate the findings but also address potential variability within the population, thereby reinforcing the conclusions drawn from these crucial experiments.

      We agree with the reviewer that a sample size of N=2 is not sufficient for in vivo recordings. We therefore increased the sample size and now present recordings with 5 PAClight1P78A and 4 PAClight-control mice. Of note, the new data validate our previous findings and conclusions and give a better idea of the in vivo variability, which we now address in much more detail in the discussion (see paragraph 2). 

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      The lower potency of maxadilan activation might reflect broader implications for ligand-receptor dynamics. Perhaps the authors could discuss the maxadilan binding from a structural perspective, including AlphaFold models. Also, discussing how these findings might influence sensor application in diverse biological contexts would be insightful. Clear definitions and consistent use of these terms are crucial for ensuring that readers understand the methods and results.

      We would like to thank the reviewer for the comments. As part of this work, we did not obtain a dose-response curve for the maxadilan peptide, and only reported the maximal response of the sensor to a high concentration of the peptide (10 µM). Thus, our findings inform us about the maximal efficacy of the peptide rather than its potency towards the PAC1R. Furthermore, we would like to point out that, because no structural details are available for any GPCR-based sensor published to date, we cannot draw any molecularly accurate conclusion about why a different ligand (in this case the sandfly maxadilan) induces a lower maximal efficacy of the response than the endogenous cognate ligand of the receptor. We do not believe that AlphaFold models can accurately replace structural information in this regard, especially because the amino acid linker regions between the GPCR and the fluorescent protein, which are a critical determinant of allosteric chromophore modulation by ligand-induced conformational changes, typically obtain the lowest confidence score in all AlphaFold-predicted structural models of GPCR-based sensors. Finally, we would like to refer the reviewer to a recent publication (PMID: 32047270) that resolved the structures of each of these peptides bound to the PAC1 receptor-Gs protein complex and provides accurate molecular details on the different modalities of receptor binding and activation by PACAP1-38 versus maxadilan.

      Reviewer #2 (Recommendations For The Authors):

      The authors are congratulated on the meticulous achievement of their aim, i.e. a fluorescence-based sensor for the detection of PACAP with in vivo utility. Whether or not this sensor will have the requisite sensitivity to detect the release of endogenous PACAP within various regions of the nervous system, in response to specific environmental stimuli or changes in brain or physiological state, remains to be determined.

      We thank the reviewer for the very positive evaluation of our manuscript and for the suggested additions that will improve the strength of our arguments.

      We agree that the in vivo detection of endogenous PACAP will be an important objective for future studies. Due to time, resource and animal license constraints, we are not able to address this objective in our current study, but we now detail possible future experiments in the discussion section. Please also see our earlier answer regarding the suggested discussion points.

      Reviewer #3 (Recommendations For The Authors):

      To comprehensively assess the sensor's sensitivity and specificity to endogenous PACAP, I recommend conducting additional in vivo experiments where PAClight1P78A is expressed in neurons that endogenously express the Pac1r receptor (using Adcyap1r1-Cre mouse line). These experiments should involve applying sensory or emotional stimuli known to evoke PACAP release or activating upstream PACAP-expressing neurons. Such studies would offer valuable data on the sensor's performance under natural physiological conditions and its potential utility for exploring PACAP's roles in vivo.

      We express our gratitude to the reviewer for providing detailed methodological approaches to examine endogenous PACAP release. These suggestions will prove invaluable for future investigations and are important additions to a follow-up publication. As mentioned earlier, we have incorporated some of these approaches into our discussion. Additionally, we have underscored the existing limitations in detecting endogenous PACAP in vivo and emphasized the relevance of PAClight for drug development purposes.

    1. Author response:

      eLife assessment

      This useful study describes an antibody-free method to map G-quadruplexes (G4s) in vertebrate cells. While the method might have potential, the current analysis is primarily descriptive and does not add substantial new insights beyond existing data (e.g., PMID:34792172). While the datasets provided might constitute a good starting point for future functional studies, additional data and analyses would be needed to fully support the major conclusions and, at the same time, clarify the advantage of this method over other methods. Specifically, the strength of the evidence for DHX9 interfering with the ability of mESCs to differentiate by regulating directly the stability of either G4s or R-loops is still incomplete.

      We thank the editors for their helpful comments.

      Given that antibody-based methods have been reported to leave open the possibility of recognizing partially folded G4s and promoting their folding, we have employed the peroxidase activity of the G4-hemin complex to develop a new method for capturing endogenous G4s that significantly reduces the risk of capturing partially folded G4s. We will be happy to clarify the advantage of our method.

      In Fig. 7, we applied the Dhx9 CUT&Tag assay to identify the G4s and R-loops directly bound by Dhx9 and further characterized the Dhx9-bound G4s and R-loops that change in the absence of Dhx9. Dhx9 is a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Furthermore, we showed that depletion of Dhx9 significantly altered the levels of G4s or R-loops around the TSS or gene bodies of several key regulators of mESC and embryonic development, such as Nanog, Lin28a, Bmp4, Wnt8a, Gata2, and Lef1, and also their RNA levels (Fig. 7I). We consider this evidence sufficient to support the transcriptional regulation of mESC cell fate by Dhx9 through direct modulation of the G4s or R-loops within key regulators of mESCs.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Non-B DNA structures such as G4s and R-loops have the potential to impact genome stability, gene transcription, and cell differentiation. This study investigates the distribution of G4s and R-loops in human and mouse cells using some interesting technical modifications of existing Tn5-based approaches. This work confirms that the helicase DHX9 could regulate the formation and/or stability of both structures in mouse embryonic stem cells (mESCs). It also provides evidence that the lack of DHX9 in mESCs interferes with their ability to differentiate.

      Strengths:

      HepG4-seq, the new antibody-free strategy to map G4s based on the ability of Hemin to act as a peroxidase when complexed to G4s, is interesting. This study also provides more evidence that the distribution pattern of G4s and R-loops might vary substantially from one cell type to another.

      We appreciate your valuable points.

      Weaknesses:

      This study is essentially descriptive and does not provide conclusive evidence that lack of DHX9 does interfere with the ability of mESCs to differentiate by regulating directly the stability of either G4 or R-loops. In the end, it does not substantially improve our understanding of DHX9's mode of action.

      In this study, we aimed to report new methods for capturing endogenous G4s and R-loops in living cells. Dhx9 has been reported to directly unwind R-loops and G4s or promote R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). To identify the G4s and R-loops directly bound by Dhx9, we performed the Dhx9 CUT&Tag assay and analyzed the co-localization of Dhx9-binding sites with G4s or R-loops. We found that 47,857 co-localized G4s and R-loops are directly bound by Dhx9 in wild-type mESCs and that 4,060 of them display significantly different signals in the absence of Dhx9, suggesting that redundant regulators exist as well. We showed that depletion of Dhx9 significantly altered the RNA levels of several key regulators of mESC and embryonic development, such as Nanog, Lin28a, Bmp4, Wnt8a, Gata2, and Lef1, which coincides with significantly different levels of G4s or R-loops around the TSS or gene bodies of these genes (Fig. 7). The comprehensive molecular mechanism of Dhx9 action is indeed not the focus of this study. We will work on it in future studies. Thank you for the comments.

      There is no in-depth comparison of the newly generated data with existing datasets and no rigorous control was presented to test the specificity of the hemin-G4 interaction (a lot of the hemin-dependent signal seems to occur in the cytoplasm, which is unexpected).

      The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In Fig. 1A, we compared the hemin-G4-induced biotinylation levels under different conditions. Cells treated with hemin and Bio-An exhibited a robust fluorescence signal, while the absence of either hemin or Bio-An almost completely abolished the biotinylation signals, suggesting a specific and active biotinylation activity. To identify the specific signals, we included a non-label control and used it to call confident HepG4 peaks in all HepG4-seq assays.

      The hemin-RNA G4 complex has also been reported to have peroxidase-mimicking activity and to trigger self-biotinylation signals similar to those of DNA G4s (PMID: 32329781, 31257395, 27422869). Therefore, it is not surprising to observe hemin-dependent signals in the cytoplasm generated by cytoplasmic RNA G4s.

      In the revised version, we will include a careful comparison between our data and previous datasets.

      The authors talk about co-occurrence between G4 and R-loops but their data does not actually demonstrate co-occurrence in time. If the same loci could form alternatively either R-loops or G4 and if DHX9 was somehow involved in determining the balance between G4s and R-loops, the authors would probably obtain the same distribution pattern. To manipulate R-loop levels in vivo and test how this affects HepG4-seq signals would have been helpful.

      Single-molecule fluorescence studies have shown the existence of a positive feedback mechanism of G4 and R-loop formation during transcription (PMID: 32810236, 32636376), suggesting that G4s and R-loops could co-localize on the same molecule. Dhx9 is a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). Although depletion of Dhx9 resulted in 6,171 Dhx9-bound co-localized G4s and R-loops with significantly altered levels of G4s or R-loops, only 276 of them (~4.5%) harbored both altered G4s and altered R-loops, suggesting that interacting G4s and R-loops are rare in living cells. At present, the genome-wide co-occurrence of two factors is mainly determined by bioinformatic intersection analysis. We agree that a heterogeneous distribution between cells can produce false-positive co-occurrence patterns. We will carefully discuss this point in the revised version. At the same time, we will make efforts to develop a new method to map G4s and R-loops co-localized on the same molecule in a future study.
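
      As an illustration of the intersection analysis mentioned above, the following is a minimal sketch rather than the pipeline used in the study. The peak coordinates, the half-open interval convention, and the helper names `overlaps` and `colocalized` are assumptions made for the example; only the counts 276 and 6,171 come from the response. As noted above, such population-level overlap does not show that both structures sit on the same molecule.

```python
from typing import List, Tuple

# Hypothetical peak representation: (chromosome, start, end), half-open.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two peaks share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def colocalized(g4_peaks: List[Interval], rloop_peaks: List[Interval]) -> List[Interval]:
    """Return the G4 peaks that overlap at least one R-loop peak."""
    return [g for g in g4_peaks if any(overlaps(g, r) for r in rloop_peaks)]


# Toy peak sets (invented coordinates, for illustration only).
g4 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
rloops = [("chr1", 150, 300), ("chr2", 400, 500)]
shared = colocalized(g4, rloops)

# Fraction of altered co-localized sites in which both structures changed,
# using the counts quoted in the response (276 of 6,171).
fraction_both_altered = 276 / 6171
print(shared, f"{fraction_both_altered:.1%}")
```

      The quadratic scan is adequate for a toy example; genome-scale peak sets would normally be handled with a sorted sweep or an interval tree.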

      This study relies exclusively on Tn5-based mapping strategies. This is a problem as global changes in DNA accessibility might strongly skew the results. It is unclear at this stage whether the lack of DHX9, BLM, or WRN has an impact on DNA accessibility, which might underlie the differences that were observed. Moreover, Tn5 cleaves DNA at a nearby accessible site, which might be at an unknown distance away from the site of interest. The spatial accuracy of Tn5-based methods is therefore debatable, which is a problem when trying to demonstrate spatial co-occurrence. Alternative mapping methods would have been helpful.

      In this study, we used the recombinant streptavidin monomer and anti-GP41 nanobody fusion protein (mSA-scFv) to specifically recognize hemin-G4-induced biotinylated G4s and then recruit the recombinant GP41-tagged Tn5 protein to these G4 sites. Similarly, the recombinant V5-tagged N-terminal hybrid-binding domain (HBD) of RNase H1 specifically recognizes R-loops and recruits the recombinant protein G-Tn5 (pG-Tn5) with the help of an anti-V5 antibody. Therefore, the spatial distance of Tn5 from the target sites is well controlled and very short, and the recruitment of Tn5 is specifically determined by the presence of G4s in HepG4-seq and R-loops in HBD-seq.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Liu et al. explore the interplay between G-quadruplexes (G4s) and R-loops. The authors developed novel techniques, HepG4-seq and HBD-seq, to capture and map these nucleic acid structures genome-wide in human HEK293 cells and mouse embryonic stem cells (mESCs). They identified dynamic, cell-type-specific distributions of co-localized G4s and R-loops, which predominantly localize at active promoters and enhancers of transcriptionally active genes. Furthermore, they assessed the role of helicase Dhx9 in regulating these structures and their impact on gene expression and cellular functions.

      The manuscript provides a detailed catalogue of the genome-wide distribution of G4s and R-loops. However, the conceptual advance and the physiological relevance of the findings are not obvious. Overall, the impact of the work on the field is limited to the utility of the presented methods and datasets.

      Strengths:

      (1) The development and optimization of HepG4-seq and HBD-seq offer novel methods to map native G4s and R-loops.

      (2) The study provides extensive data on the distribution of G4s and R-loops, highlighting their co-localization in human and mouse cells.

      (3) The study consolidates the role of Dhx9 in modulating these structures and explores its impact on mESC self-renewal and differentiation.

      We appreciate your valuable points.

      Weaknesses:

      (1) The specificity of the biotinylation process and potential off-target effects are not addressed. The authors should provide more data to validate the specificity of the G4-hemin.

      The specificity of hemin-G4-induced peroxidase activity and self-biotinylation has been well demonstrated in previous studies (PMID: 19618960, 22106035, 28973477, 32329781). In Fig. 1A, we compared the hemin-G4-induced biotinylation levels under different conditions. Cells treated with hemin and Bio-An exhibited a robust fluorescence signal, while the absence of either hemin or Bio-An almost completely abolished the biotinylation signals, suggesting a specific and active biotinylation activity.

      (2) Other methods exploring a catalytic dead RNAseH or the HBD to pull down R-loops have been described before. The superior quality of the presented methods in comparison to existing ones is not established. A clear comparison with other methods (BG4 CUT&Tag-seq, DRIP-seq, R-CHIP, etc) should be provided.

      Thank you for the suggestions. We will include the comparisons in the revised version.

      (3) Although the study demonstrates Dhx9's role in regulating co-localized G4s and R-loops, additional functional experiments (e.g., rescue experiments) are needed to confirm these findings.

      Dhx9 has been demonstrated in previous studies to be a versatile helicase capable of directly resolving R-loops and G4s or promoting R-loop formation (PMID: 21561811, 30341290, 29742442, 32541651, 35905379, 34316718). We believe that the current new dataset and previous studies are enough to support the capability of Dhx9 in regulating co-localized G4s and R-loops.

      (4) The manuscript would benefit from a more detailed discussion of the broader implications of co-localized G4s and R-loops.

      Thank you for the suggestions. We will include a more detailed discussion in the revised version.

      (5) The manuscript lacks appropriate statistical analyses to support the major conclusions.

      We apologize for this point. Although we applied careful statistical analyses in this study, the lack of some statistical details makes certain conclusions hard to follow. We will carefully add details of all statistical analyses.

      (6) The discussion could be expanded to address potential limitations and alternative explanations for the results.

      Thank you for the suggestions. We will include a more detailed discussion about this point in the revised version.

      Reviewer #3 (Public Review):

      Summary:

      The authors developed and optimized the methods for detecting G4s and R-loops independent of BG4 and S9.6 antibody, and mapped genomic native G4s and R-loops by HepG4-seq and HBD-seq, revealing that co-localized G4s and R-loops participate in regulating transcription and affecting the self-renewal and differentiation capabilities of mESCs.

      Strengths:

      By utilizing the peroxidase activity of G4-hemin complex and combining proximity labeling technology, the authors developed HepG4-seq (high throughput sequencing of hemin-induced proximal labelled G4s), which can detect the dynamics of G4s in vivo. Meanwhile, the "GST-His6-2xHBD"-mediated CUT&Tag protocol (Wang et al., 2021) was optimized by replacing fusion protein and tag, the optimized HBD-seq avoids the generation of GST fusion protein aggregates and can reflect the genome-wide distribution of R-loops in vivo.

      The authors employed HepG4-seq and HBD-seq to establish comprehensive maps of native co-localized G4s and R-loops in human HEK293 cells and mouse embryonic stem cells (mESCs). The data indicate that co-localized G4s and R-loops are dynamically altered in a cell type-dependent manner and are largely localized at active promoters and enhancers of transcriptionally active genes.

      Combined with Dhx9 ChIP-seq and co-localized G4s and R-loops data in wild-type and dhx9KO mESCs, the authors confirm that the helicase Dhx9 is a direct and major regulator that regulates the formation and resolution of co-localized G4s and R-loops.

      Depletion of Dhx9 impaired the self-renewal and differentiation capacities of mESCs by altering the transcription of co-localized G4s and R-loops-associated genes.

      In conclusion, the authors provide an approach to studying the interplay between G4s and R-loops, shedding light on the important roles of co-localized G4s and R-loops in development and disease by regulating the transcription of related genes.

      We appreciate your valuable points.

      Weaknesses:

      As we know, there are at least two structure data of S9.6 antibody very recently, and the questions about the specificity of the S9.6 antibody on RNA:DNA hybrids should be finished. The authors referred to (Hartono et al., 2018; Konig et al., 2017; Phillips et al., 2013) need to be updated, and the authors' bias against S9.6 antibodies needs also to be changed. However, as the authors had questioned the specificity of the S9.6 antibody, they should compare it in parallel with the data they have and the data generated by the widely used S9.6 antibody.

      Thank you for the updated information about the structural data on the S9.6 antibody. We respectfully disagree regarding the specificity of the S9.6 antibody for RNA:DNA hybrids. The structural studies of S9.6 (PMID: 35347133, 35550870) used only one RNA:DNA hybrid to show that S9.6 is more specific for an RNA:DNA hybrid than for dsRNA and dsDNA. However, König et al. reported that the binding affinities of S9.6 for RNA:DNA hybrids exhibit an obvious sequence-dependent bias, ranging from no detectable binding to the nanomolar range (PMID: 28594954). We will include the comparison between S9.6-derived data and our HBD-seq data in the revised version.

      Although HepG4-seq is an effective G4s detection technique, and the authors have also verified its reliability to some extent, given the strong link between ROS homeostasis and G4s formation, and hemin's affinity for different types of G4s, whether HepG4-seq reflects the dynamics of G4s in vivo more accurately than existing detection techniques still needs to be more carefully corroborated.

      Thank you for pointing out this issue. In the in vitro hemin-G4-induced self-biotinylation assay, parallel G4s exhibit higher peroxidase activities than anti-parallel G4s. Thus, the dynamics of G4 conformation could affect the HepG4-seq signals (PMID: 32329781). In the future, it may be necessary to combine HepG4-seq and BG4-seq to interpret endogenous G4s carefully. We will carefully discuss this point in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Due to the significant difference between the infection timeline of mild (1 day post symptom onset) and severe (10 days post symptom onset) cohort at enrollment, an informative analysis to consider is to compare timepoint 2 from the mild cohort to timepoint 1 from the severe cohort.

      As the reviewer suggested, we completed the analysis comparing timepoint 2 from the mild cohort with timepoint 1 from the severe cohort, which is now included as Figure 4-figure supplement 5. The new text explaining this analysis is on pages 13-14, lines 346-355. We also included a paragraph in the discussion on page 22, lines 595-604. We chose to show this comparison to reinforce the main observation that Natural Killer cytotoxicity pathways are enriched in all analyses of this work.

      (2) Alternatively, as this information is available, the authors may group the samples based on the individual's infection timeline as opposed to the recruitment timeline.

      Patients in both groups were enrolled at the peak of their symptoms, and we grouped the patients according to this criterion to obtain more significant results. Since these infections occurred naturally, we have no accurate information on when each patient was infected. Moreover, if the samples were grouped by individual infection timeline, the analysis would be statistically too weak to draw conclusions about the course of COVID-19, as disease progression would not be coordinated. Our grouping approach gave us a good confidence range despite the small population evaluated.

      (3) The authors selected three co-regulated network modules based on the size of module membership genes, selecting the three modules containing the largest gene membership. Small co-regulated networks can also offer important biological insights into specific molecular machinery associated with disease outcomes.

      Figure 5 was updated to include two more networks (besides blue), for the brown and turquoise modules (5E and 5F). This new information allowed us to examine in greater depth the three largest modules, which gave the most significant results owing to the number of genes they include (blue: 704, brown: 508, and turquoise: 712). The new text describing this analysis is on page 15, lines 388-396. The remaining 7 modules were also analyzed, and the Gene Ontology/pathway enrichments were included in 2 new supplemental figures (Figure 5 - figure supplement 1 and 2). The new text describing this analysis is on page 15, lines 397-401.

      (4) An alternative selection criterion that can inform biological associations between module genes and disease severity is the strength of the correlation coefficients. It seems from Figure 5B, that yellow, turquoise, and green modules have a moderate positive correlation with severe patients, while brown, blue, and gray modules show a slight positive correlation with mild outpatients. A recommendation for the authors is to consider revising Figure 5C to include the enrichment of these additional modules and include these modules in the interpretation of the results.

      The correlations between the cohorts and the blue, brown and turquoise modules are clearly associated with either severe or mild patients. However, for several smaller modules, the correlations are heterogeneous across patients within a cohort, making it hard to draw a clear conclusion about the severity groups. The 7 remaining modules were therefore analyzed as indicated in our response to point 3, and the results give an idea of the different transcriptional programs present in different patients at different stages of disease. However, the small number of genes in some modules yields weak GO and pathway enrichment results, which makes them difficult to interpret. The text describing this figure is on page 15, lines 397-401. In addition, the network analyses for the brown and turquoise modules were included in Figure 5 as 5E-F, and the text detailing these panels is on page 15, lines 388-396.

      (5) In Figures 3E and 3F, the authors present enrichment analyses of differentially expressed genes from day 28. However, earlier in the results (lines 226-228), the authors reported no differentially expressed genes observed between the mild and severe participant cohort at this time point. Can the authors clarify which comparison was performed to obtain the list of differentially expressed genes used in the enrichment analyses in Figures 3E and 3F?

      The discrepancy stems from the different criteria used in each comparison. The DEG list from the pairwise comparison differs from that of the longitudinal comparison mentioned afterwards because, for the latter analysis, we selected only the genes with different trajectories throughout the study (Figure 3). To clarify this point, we included a new paragraph on page 11, lines 278-285.

      Original:

      “We detected 828 genes that exhibited temporal and quantitative expression level differences during the progression of disease. We discovered additional biological processes and KEGG pathways that were differentially enriched during the COVID-19 progression in mild and severe patients (Figure 3) using the Enrichr platform (G. Chen et al., 2020)”

      Changed to:

      “To do so, we first identified genes that were differentially expressed between severity groups, and second, we chose only those that also showed changes in their trajectories across sampling times. In doing so, we found 828 genes that exhibited temporal differences in expression level during disease progression. Then using the Enrichr platform (G. Chen et al., 2020), we discovered additional biological processes and KEGG pathways that were differentially enriched during the COVID-19 progression in mild and severe patients (Figure 3).”
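
      The two-step selection described in the revised text can be sketched as follows. This is an illustrative outline only, not the analysis code of the study: the gene names, the adjusted p-values, the 0.05 cutoff, and the function name `two_step_selection` are all hypothetical.

```python
from typing import Dict, List

ALPHA = 0.05  # hypothetical significance cutoff, not taken from the study


def two_step_selection(
    group_padj: Dict[str, float],
    trajectory_padj: Dict[str, float],
    alpha: float = ALPHA,
) -> List[str]:
    """Keep genes that differ between severity groups (step 1) and that
    also change their trajectory across sampling times (step 2)."""
    step1 = {gene for gene, p in group_padj.items() if p < alpha}
    # Genes missing from the trajectory test are treated as not significant.
    return sorted(g for g in step1 if trajectory_padj.get(g, 1.0) < alpha)


# Placeholder adjusted p-values for three invented genes.
group_padj = {"GENE_A": 0.001, "GENE_B": 0.20, "GENE_C": 0.01}
trajectory_padj = {"GENE_A": 0.03, "GENE_B": 0.002, "GENE_C": 0.40}

selected = two_step_selection(group_padj, trajectory_padj)
print(selected)
```

      Only genes passing both filters are carried forward to enrichment, which is why this list can differ from a pairwise list computed at a single time point.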

      (6) Additionally, the authors refer to specific enriched genes in Figure 3 (lines 298-302), but Figure 3 only displays the enriched terms. Can the authors include the results from the enrichment analysis that include gene membership for each enriched term in the supplement?

      Indeed, the initial version contained no figure or table with the gene list for this analysis. We have now included Supplementary Tables 1 and 2, which detail each pathway along with its gene list.

      (7) In line 104, can the authors clarify the parameters used to define well-matched samples?

      Based on the observations made by the reviewers, we decided to change the wording to make the message of this paper clearer. The update was included on page 5, line  as follows:

      Original:

      “Here, we designed a longitudinal investigation using well-matched samples to study how changes in gene expression in distinct immune effector cells changed during the earliest time points after diagnosis and during progression of clinical disease”,

      Changed to:

      “Here, we designed a longitudinal comparison between mild and severe patients, choosing the appropriate samples according to the clinical progression and the unbiased gene expression profile”

      (8) In lines 113-116, can the authors clarify how their approach mitigates noise/potential biases and very briefly, describe what the nature of noise/biases could be?

      The main goal of this paragraph is to show that, although several pathways reached statistical significance in our analyses, we focused on NK cell cytotoxicity because this molecular pathway showed bridges to other relevant immune responses; thus, the pathways were chosen in response to its intricate transcriptional program rather than out of a biased interest. The text was edited and included on page 6, lines 111-131, as follows:

      Original:

      “We used a pairwise comparison of gene expression, gene set enrichment, and weight-correlated gene network analyses to detect differential expression of genes involved with the cytotoxic signaling pathway of Natural Killer (NK) cells in mild verses severe progression of disease. We promoted a broad and integrated point of view throughout the transcriptomic analysis of functional pathways to mitigate noise and potential biases (Bastard et al., 2020; Delorey et al., 2021; Schultze & Aschenbrenner, 2021; S. Zhang et al., 2022). We found close connectivity between NK signaling pathway genes and those of cytokine-cytokine receptor signaling pathways, along with Th1/Th2 cell differentiation genes, as part of the transcriptional circuit executed preferentially among mildly ill patients. Our results detected transcriptional circuits engaging multiple regulatory checkpoints. These findings indicated that the innate NK signaling pathway (cell cytotoxic activity) is beneficial, perhaps a critically-necessary activity needed to effectively eradicate coronavirus. We interpreted that an adaptive immune response that included early cell-mediated immunity was important for reducing disease severity in mild patients. This balance between humoral- and cell-mediated immunity appeared to be less robust in patients presenting with severe COVID-19. These results detected components of the immune response that were significantly associated with the differences in symptom severity observed between mild and severely ill COVID-19 patients.”

      Changed to:

      “Briefly, to gain more insights into our findings and complement their functional context, we used a pairwise comparison of gene expression, gene set enrichment, and weight-correlated gene network analyses. By doing so, we identified pathways of genes involved with the NK cell cytotoxicity enriched in mild patients when compared to severe. Besides focusing on a particular molecular pathway, we investigated the interactions to better comprehend the underlying phenomena of a successful immune response, contributing to an integrated point of view throughout the transcriptomic analyses of functional pathways to mitigate potential biases attributed to focusing the study on a single pathway. In this regard, we revealed that the NK signaling pathway was intricately related to other transcriptional circuits, such as those governing Th1/Th2 cell differentiation and cytokine-cytokine receptor signaling pathways. These interactions highlight the importance of these pathways as bridges between the innate and adaptive immune responses throughout the disease, implying that the innate NK signaling pathway (cell cytotoxic activity) is beneficial, and possibly a critical activity required to effectively eradicate coronavirus. We also concluded that an adaptive immune response including early cell-mediated immunity was significant in lowering disease severity. The link between the primary innate NK cell activity and the transcriptional priming of adaptive Th1 and Th2 cell responses appears to be more robust in mild patients than in severe.”

      (9) In line 120, can the authors clarify which regulatory checkpoints were being referred to?

      The concept of “checkpoint” was changed to “bridges” (line 124) because it offers a clearer idea of the molecular interactions displayed across the different enriched pathways described in our study. In this sense, the bridges show the connection between the innate immune response by NK cells and the adaptive immune response by Th1/Th2 cells.

      (10) In lines 125-126, can the authors refer to specific results to support this observation?

      Lines 111 to 129 summarize the results of the analysis that support the aforementioned phrase. However, the original sentence referred to was modified for better comprehension on page 6, lines 129-131, as follows:

      Original:

      “This balance between humoral- and cell-mediated immunity appeared to be less robust in patients presenting with severe COVID-19”

      Changed to:

      “The link between the primary innate NK cell activity and the transcriptional priming of adaptive Th1 and Th2 cell responses appears to be more robust in mild patients than in severe.”

      (11) In lines 184-185, can the authors clarify what the term "mixed" specifically refers to?

      The original text was modified for better comprehension on page 8, lines 177-179 as follows:

      Original:

      “Interestingly, on day-28, when the majority of patients had recovered, samples from severely ill patients were still mixed compared to those with mild symptoms.”

      Changed to:

      “Interestingly, on day-28, when the majority of patients had recovered, samples from severely ill patients were pooled together with those mild patients who had already recovered”.

      (12) In line 286, can the authors clarify how quantitative expression level differences are distinct from temporal expression level differences?

      Despite the differences in enrollment time between the mild and severe cohorts, enrollment took place precisely during the peak of COVID-19 symptoms, as illustrated in Figure 1B. In support of this criterion, the longitudinal analysis outlined in Figure 3 took into account the changes in gene expression trajectories across all sampling times. This point is significant because its results exposed several transcriptional programs that were dynamically executed along disease progression, independently of the pairwise comparisons carried out previously.

      (13) In Figure 1C, there seemed to be two data points associated with "M1 0 days" and "M4 28 days" with distinct PC projections. Could these samples be mislabeled?

      The figure was revised and completed. The hexagon symbol for day 28 was changed to a star symbol. The “M1 0 days” and “M4 28 days” samples were labeled correctly. See the revised Figure 1C.

      (14) In Figure 1D caption: could authors clarify if the ranking of 100 genes was based on the log2FC or adjusted p-values?

      The criteria considered were fold change ≥ 2 and FDR ≤ 0.05, as described in the methodology on page 23, lines 657-660.
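
      As a minimal sketch of how thresholds of this kind are typically applied, the snippet below filters a small, made-up gene table. The function name, the gene records, and the choice to count changes in either direction (|log2 fold change| ≥ 1) are assumptions for illustration only; the response states only the two cut-offs and does not describe the pipeline.

```python
import math


def passes_deg_thresholds(fold_change, fdr, min_fold_change=2.0, max_fdr=0.05):
    """Return True when a gene meets both cut-offs.

    Assumptions (not stated in the response): `fold_change` is a linear,
    positive ratio, and up- and down-regulation both count, so the test is
    |log2(fold_change)| >= log2(min_fold_change).
    """
    magnitude = abs(math.log2(fold_change))
    return magnitude >= math.log2(min_fold_change) and fdr <= max_fdr


# Hypothetical records, for illustration only.
genes = [
    {"gene": "geneA", "fold_change": 4.0, "fdr": 0.01},   # up, significant
    {"gene": "geneB", "fold_change": 0.25, "fdr": 0.04},  # down, significant
    {"gene": "geneC", "fold_change": 1.5, "fdr": 0.001},  # change too small
    {"gene": "geneD", "fold_change": 3.0, "fdr": 0.20},   # FDR too high
]

selected = [
    g["gene"] for g in genes if passes_deg_thresholds(g["fold_change"], g["fdr"])
]
print(selected)
```

      If only up-regulated genes were intended, the absolute value would be dropped and the comparison made on the raw fold change instead.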

      (15) In Figure 4D, can the authors include the expression z score for the healthy participants?

      We could include this information, but we consider that it would not aid understanding of this figure, because the current presentation puts the focus on the differential trajectories between mild and severe patients. In addition, DEGs from the mild and severe cohorts in this analysis, and in every other analysis in this work, were obtained relative to healthy donors.

      (16) Related to this, can the authors clarify if the expression z scores were computed using the mean and standard deviations of all samples within the study or relative to a specific participant cohort?

      The z-scores were calculated from the mild and severe patients, using the mean and standard deviation of each group. A new paragraph was included in Materials and Methods on page 24, lines 662-664.
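
      One possible reading of this description is sketched below: each value is standardized against the mean and standard deviation of its own cohort. The function name, the example values, and the use of the sample standard deviation are assumptions; the response does not specify them.

```python
from statistics import mean, stdev


def zscores_within_group(expression_by_group):
    """Standardize each value against its own group's mean and SD.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    scores = {}
    for group, values in expression_by_group.items():
        mu = mean(values)
        sigma = stdev(values)
        if sigma == 0:
            raise ValueError(f"group {group!r} has zero variance")
        scores[group] = [(v - mu) / sigma for v in values]
    return scores


# Hypothetical expression values for one gene, for illustration only.
example = {"mild": [1.0, 2.0, 3.0], "severe": [10.0, 20.0, 30.0]}
z = zscores_within_group(example)
print(z)
```

      Under this reading, values are comparable within a cohort but not directly between cohorts, because each cohort is centered on its own mean.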

      (17) In Figure 5B, can the authors include column annotations for participants and sampling time points?

      Figure 5B was updated and completed with the suggested information.

      (18) In Figure 1 - Figure Supplement 2, can the authors include the volcano plot from the pairwise comparison for day 28 showing no differentially expressed genes between mild and severe participants as reported in the results (lines 226-228)?

      The third volcano plot for day 28 was included in the updated figure 1 supplement 2.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is generally very well-constructed and well-written. However, the following are the major concerns mostly regarding the study design and participant selection.

      (1) The authors have used enrolment day as D0 which is not reflective of the immune response timeline. Especially when the designated 'D0' for the severe group is 10.0 + 1.8 days post symptom (DPS) onset while the 'D0' for the mild group is 1.2 + 1.3 DPS. In the context of an acute infection discussed herewith, this difference is critical.

      As tempting as it is to conduct longitudinal studies on COVID-19, the authors might do better focusing on specific acute time points (within 10 days post-symptom onset) and convalescent time points (beyond 28 days post-symptom). A better comparison would be D0 severe with D7 mild (aligning the DPS to be between 7-10 days in both groups).

      Despite the differences in enrolment time between the mild and severe cohorts, enrolment took place precisely during the peak of COVID-19 symptoms, as illustrated in Figure 1B. In support of this criterion, the longitudinal analysis outlined in Figure 3 took into account the changes in gene expression trajectories across all sampling times. This point is significant because its results exposed several transcriptional programs that were dynamically executed along disease progression, independently of the pairwise comparisons carried out previously. We nevertheless agree with the reviewer's observation because, as we mention in the article, it is difficult to properly compare disease progression between naturally infected patients. To better support our findings, we therefore complemented them with a pairwise comparison between day-7 samples from mild patients and day-0 samples from severely ill patients, and found GO terms and enriched pathways related to NK cell function in the mild cohort, as seen in Figure 4-figure supplement 5. This result reinforced the main findings of the different analyses carried out in this work, highlighting the relevance of the innate immune response of Natural Killer cells, which correlated with a mild progression of disease. The new paragraph describing this analysis was included on pages 13-14, lines 346-355. We also included a paragraph in the discussion on page 22, lines 595-604.

      (2) Though there are four participants within each group, one of the participants with severe infection (S1) only has the D0 time point which probably undermines the statistical significance of the results.

      This is an accurate observation, as the statistical weight will allow the deeper alterations to be evaluated while the more subtle ones will most likely be excluded from this study. In our analyses, we focused on variations with high statistical significance, which led to the discovery of a distinct Natural Killer response between mild and severe cohorts.

      (3) The authors should also account for any medications administered to the severe group in the ICU before enrolment in the study -immune-dampening drugs or steroids which may alter neutrophil recruitment or other immune functions.

      Only one severe patient received medication both prior to and during the COVID-19 disease. Even though several medications were administered to this patient, their effects have not been found to increase the neutrophil response.

      (4) What was the viral load status at the different time points analyzed - how does this relate to the immune and clinical findings?

      In this recruitment the viral load status was not measured.

      (5) Was any complete blood count or basic immune phenotyping conducted on these samples? Important to know the various cell frequencies in the PBMC mix sent for sequencing to account for contamination of lymphocytes with RBCs/monocytes/neutrophils as well as any lymphopenia.

      This measurement was not done for these samples. However, our PBMC purification protocol has been tested before and showed only small quantities of red blood cell contamination. Furthermore, none of the Gene Ontology or enriched-pathway analyses returned terms related to red blood cell genes that could introduce noise into the interpretation of our results.

      (6) The neutrophil/lymphocyte ratio is already skewed during SARS-CoV-2 infection - which could be the reason for higher readings in severe participants? - speculate?

      Indeed, the ratios of several cell types change during SARS-CoV-2 infection. However, despite this noise in the proportion of immune cells, several functions in our study are more strongly represented in less abundant cell types, such as Natural Killer cells. The co-expression module analysis supports the notion that, even though the cell types are present in different proportions, a transcriptional program is being executed differentially in the cohorts.

      (7) CD247/ZAP70 also influences the CD16-mediated NK cell ADCC activity which the authors can add to the innate-adaptive bridging section.

      CD16a is more highly expressed in NK cells. The circuit involving CD247/ZAP70 and CD16 could explain the cytotoxicity of these cells and how they contribute to establishing a response against SARS-CoV-2 infection. In our study, CD16a (FcgammaRIIIa) expression was similar in the mild and severe cohorts. Because our methodology only captures transcriptional changes, genes that did not change were excluded from our discussion. However, our group's research focuses on this node or bridge between innate and adaptive immune responses, with a particular emphasis on Fc-antibody functions, which remain a topic of interest for future research.

      (8) Some of the figures lacked clarity making it difficult to review. (Eg. Fig 4 A, Fig 4 - supplement 1 A&B, Fig 5).

      Figure 4A was redesigned, and Figure 4-figure supplement 1 is now presented on a full page for better resolution.

      Specific Comments:

      (1) Consider changing "covid-19" in the title of the manuscript to "COVID-19"

      The journal platform probably changed the capitalization. The original title is in capital letters, in line with this observation. In the clinical table, “COVID-19” was changed to capital letters.

      (2) Page 2: Line 24 - Consider revising this line. Not sure what the authors mean by 'early compromise'

      The paragraph was revised and rewritten.

      Original:

      “Mild COVID-19 patients presented an early compromise with NK cell function, whereas severe patients do so with neutrophil function”

      Changed to:

      “Mild COVID-19 patients displayed an early transcriptional commitment with NK cell function, whereas severe patients do so with neutrophil function”

      (3) Page 4: Lines 57 & 58 - Verify the reference. The paper referenced was published in 2016 and is in regard to SARS-CoV, MERS-CoV, and enterovirus D68.

      Indeed, this reference was appropriate for drawing parallels with other respiratory viruses. Given the emphasis on SARS-CoV-2, the paragraph has been strengthened with two additional references: Shen 2023 and Wauters 2022.

      (4) Page 10: Lines 229 - 234 - Consider referring to the appropriate figure (i.e., Figure Supplement 2 A or B). The figure associated with D28 DEGs (Volcano plot) is missing in the supplementary. Erroneously referred here as Figure 1C which is a PCA plot?

      The original text was changed because the figure referenced was correct but misunderstood. The final sentence is on page 9, lines 220-223.

      (5) Page 10: Line 224 - Change the sentence to " We found upregulated.." instead of " We found regulated..".

      The text was edited in accordance with this recommendation, which is currently found in line 232.

      (6) Page 13: Line 326 - Figure 4A referenced here is not clear - unable to review.

      Figure 4A was updated for a better resolution and included in the manuscript.

      (7) Page 15: Line 398 - Consider rewording "after diagnosis" since the days here are "after enrolment".

      This recommendation was considered and the text was rewritten on page 15, lines 404-406:

      Original:

      “We systematically analyzed transcriptomic features of PBMCs from COVID-19 patients with mild and severe symptoms at three sequential time-points (D0, D7, and D28) after diagnosis”

      Changed to:

      “We systematically analyzed transcriptomic features of PBMCs from COVID-19 patients with mild and severe symptoms at three sequential time-points (D0, D7, and D28) during the peak of the symptoms”

      (8) Page 17: Move text from the next page to eliminate blank space.

      Resolved

      (9) Page 32: Figure 1C - Consider changing the symbol for D28 since it looks very similar to the D0 symbol. Use the colors consistently instead of different shades for each group.

      The hexagon symbol for D28 was changed to a star symbol in Figure 1C. In this figure, each color indicates one of the three groups, and transparency was used to differentiate the symbols when they are close together.

      (10) Page 36: Figure 4A - Unable to review.

      This figure was resized for better resolution.

      (11) Page 42-49: Consider relabeling and renumbering the Supplementary figures for consistency and reference the modified numbers in the appropriate location in the main text.

      The supplementary figures were relabeled for consistency and better understanding.

      (12) Pages 44 & 48: Unable to review the figures.

      The figures indicated were resized for better resolution.

      Examples of consistency review:

      (1) Use of D0,D7 / D-0, D-7 throughout the manuscript

      The selected format for the final version of the manuscript is D0, D7, and D28.

      (2) Reporting the source of reagents consistently (Name, Place, Country, Catalog number)

      The source reagents were reformatted for consistency in lines 626-628-632-642.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Editors’ recommendations for the authors

      The reviewers recommend the following: 

      (a) Digging deeper into the discussion of the density-dependent dispersal. 

      (b) Clarifying the microfluidic setup.  

      (c) Clarifying the description and interpretation of the transcriptomic evidence. 

      (d) Toning down carbon cycle connections (some reviewers felt the evidence did not fully support the claims). 

      We would like to thank the editors for their thoughtful evaluation of our manuscript and their clear suggestions. We have revised the manuscript in the light of these comments, as we outline below and address in detail in the point-by-point response to the reviewers’ comments that follows. 

      (a) We have expanded the discussion of density-dependent dispersal and revised Figure 2C to improve clarity. 

      (b) We have also added further information concerning the microfluidic setup in the results section and provide an illustration of the setup in a new figure panel, Figure 1A.

      (c) Addressing the reviewers’ comments on the transcriptomic analysis, we have added more information in the description and interpretation of the results. 

      (d) We have rephrased the text describing the role of degradation-dispersal cycles for carbon cycling to highlight it as the motivation of this study and emphasize the link to literature on foraging, without creating expectations of direct measurements of global carbon cycling.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      The authors attempt to understand how cells forage for spatially heterogeneous complex polysaccharides. They aimed to quantify the foraging behavior and interrogate its genetic basis. The results show that cells aggregate near complex polysaccharides, and disperse when simpler byproducts are added. Dispersing cells tend to move towards the polysaccharide. The authors also use transcriptomics to attempt to understand which genes support each of these behaviors - with motility and transporter-related genes being highly expressed during dispersal, as expected. 

      Strengths: 

      The paper is well written and builds on previous studies by some of the authors showing similar behavior by a different species of bacteria (Caulobacter) on another polysaccharide (xylan). The conceptual model presented at the end encapsulates the findings and provides an interesting hypothesis. I also find the observation of chemotaxis towards the polysaccharide in the experimental conditions interesting. 

      Weaknesses: 

      Much of the genetic analysis, as it stands, is quite speculative and descriptive. I found myself confused about many of the genes (e.g., quorum sensing) that pop up enriched during dispersal quite in contrast to my expectations. While the authors do mention some of this in the text as worth following up on, I think the analysis as it stands adds little insight into the behaviors studied. However, I acknowledge that it might have the potential to generate hypotheses and thus aid future studies. Further, I found the connections to the carbon cycle and marine environments in the abstract weak --- the microfluidics setup by the authors is nice, but it provides limited insight into naturalistic environments where the spatial distribution and dimensionality of resources are expected to be qualitatively different. 

      We thank the reviewer for their suggestions to improve our manuscript. We agree that the original manuscript would have benefitted from more detailed interpretation of the observed changes in gene expression. We have revised the manuscript to elaborate on the interpretation of the changes in expression of quorum sensing genes (see response to reviewer 1, comment 3), motility genes (see response to reviewer 1, comment 6), alginate lyase genes (see response to reviewer 1, comment 7 and reviewer 2, comment 2), and ribosomal and transporter genes (see response to reviewer 2, comment 2).

      In general, we think that the gene expression study not only supports the phenotypic observations that we made in the microfluidic device, such as the increased swimming motility when exposed to digested alginate medium, but also adds further insights. We studied the transcriptomes in well-mixed batch cultures because gene expression dynamics could not be measured in our microfluidic setup to support the phenotypic observations about differential motility and chemotaxis. The transcriptomic data clearly show that even in well-mixed environments, growth on digested alginate instead of alginate is sufficient to increase the expression of motility and chemotaxis genes. In addition, the finding that expression of alginate lyases and metabolic genes is increased during growth on digested alginate was revealed through the analysis of transcriptomes, something which would not have been possible in the microfluidic setup. We agree with the reviewer that our analyses implicate further, perhaps unexpected, mechanisms like quorum sensing in the cellular response to breakdown products, and that this represents an interesting avenue for further studies.

      Finally, we also agree with the reviewer that it would be good to be more explicit in the text that our microfluidic system cannot fully capture the complex dynamics of natural environments. Our approach does, however, allow the characterization of cellular behaviors at spatial and temporal scales that are relevant to the interactions of bacteria, and thus provides a better understanding of colonization and dispersal of marine bacteria in a manner that is not possible through in situ experiments. We have edited our manuscript to highlight this and modified our statements regarding carbon cycling to emphasize the role of degradation-dispersal cycles in the remineralization of polysaccharides (see response to reviewer 1, comment 2).

      Reviewer #2 (Public Review):

      Summary: 

      The paper sets out to understand the mechanisms underlying the colonization and degradation of marine particles using a natural Vibrio isolate as a model. The data are measurements of motility and gene expression using microfluidic devices and RNA sequencing. The results reveal that degradation products of alginate do stimulate motility but not chemotaxis. The evidence for these claims is strong. The story of how particle degradation occurs through colonization and dispersal has modest support in the data. A quantitative description of these dynamics awaits future studies. 

      Strengths: 

      The microfluidic and transcriptional measurements are the central strengths of the paper as they allow the delineation of phenotypes at the cellular and molecular levels in the presence of polymer and byproducts of polymer degradation. 

      Weaknesses: 

      The explanation of the microfluidics measurements is somewhat confusing but I think this could be easily remedied. The quantitative interpretation of the dispersal data could also be improved and I'm not clear if the data support the claim made. 

      We thank the reviewer for their comments and helpful suggestions. We have revised the manuscript with these suggestions in mind and believe that the manuscript is improved by a more detailed explanation of the microfluidic setup. We have added more information in the text (detailed in response to reviewer 2, comments 1 and 2) and have added a depiction of the microfluidic setup (Fig. 1A). We have also modified the presentation and discussion of the dispersal data (Fig. 2C), as described in detail below in response to reviewer 2, comment 4, and argue that they clearly show density-dependent dispersal. We believe that this modification of how the results are presented provides a more convincing case for our main conclusion, namely that the presence of degradation products controls bacterial dispersal in a density-dependent manner.  

      Reviewer #3 (Public Review):

      Summary: 

      In this manuscript, Stubbusch and coauthors examine the foraging behavior of a marine species consuming an abundant marine polysaccharide. Laboratory experiments in a microfluidic setup are complemented with transcriptomic analyses aiming at assessing the genetic bases of the observed behavior. Bacterial cells consuming the polysaccharide form cohesive aggregates, while they start dispersing away when the byproduct of the digestion of the polysaccharide starts accumulating. Dispersing cells tend to be attracted by the polysaccharide. Expression data show that motility genes are enriched during the dispersal phase, as expected. Counterintuitively, in the same phase, genes for transporters and digestion of polysaccharides are also highly expressed. 

      Strengths: 

      The manuscript is very well written and easy to follow. The topic is interesting and timely. The genetic analyses provide a new, albeit complex, angle to the study of foraging behaviors in bacteria, adding to previous studies conducted on other species. 

      Weaknesses: 

      I find this paper very descriptive and speculative. The results of the genetic analyses are quite counterintuitive; therefore, I understand the difficulty of connecting them to the observations coming from experiments in the microfluidic device. However, they could be better placed in the literature of foraging - dispersal cycles, beyond bacteria. In addition, the interpretation of the results is sometimes confusing. 

      We thank the reviewer for their suggestions to improve the manuscript. We have edited the manuscript to interpret the results of this study more clearly, in particular with regard to the fact that breakdown products of alginate cause cell dispersal (see response to reviewer 2, comment 1), gene expression changes of ribosomal proteins and transporters (see response to reviewer 2, comment 2), as well as genes relating to alginate catabolism (see response to reviewer 2, comment 3).

      To provide more context for the interpretation of our results we now also embed our findings in more detail in the previous work on foraging strategies and dispersal tradeoffs.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should clarify in more detail what they mean by density dependence in Figure 2. Usually density dependence refers to a per capita dependence, but here it seems that the per capita rate of dispersal might be roughly independent of density (Figure 2c; if you double the number of cells it doubles the number of cells leaving). Rather it seems the dispersal is such that the density of remaining cells falls below a threshold (~300 cells). 

      We thank the reviewer for raising this important point. To analyze the data more explicitly in terms of per capita dependence, and so make the density dependence of dispersal from the microfluidic chambers clearer, we have modified Figure 2C and edited the text.

      In the modified Figure 2C, we computed the fraction of dispersed cells for each chamber (i.e., the change in cell number divided by the cell number at the time of the nutrient switch). This quantity directly reveals the per-capita dependence, as mentioned by reviewer 1, and is now represented on the y-axis of Figure 2C instead of the absolute change in cell number.

      These data demonstrate that the fraction of dispersed cells increases with increasing numbers of cells present in the chamber at the time of switching, with more highly populated chambers showing a higher fraction of dispersed cells. These findings indicate that there is a strong density dependence in the dispersal process.

      As pointed out by reviewer 1, another interesting aspect of the data is the transition at low cell number. The fraction of dispersed cells is negative in the case of the chamber with approximately 70 cells, consistent with no dispersal at this low density, and a moderate density increase as a function of continued growth.  
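
      The per-chamber quantity described above can be sketched as follows. This is an illustration only: the function name and the cell counts are hypothetical, and the sign convention (positive when cells leave, negative when the population grows) is inferred from the statement that the fraction is negative for the chamber that kept growing.

```python
def dispersed_fraction(cells_at_switch, cells_after):
    """Fraction of cells lost from a chamber after the nutrient switch.

    Positive values mean net dispersal; negative values mean the chamber
    population increased (growth outweighed any dispersal).
    """
    if cells_at_switch <= 0:
        raise ValueError("cells_at_switch must be positive")
    return (cells_at_switch - cells_after) / cells_at_switch


# Hypothetical chambers: (cells at the switch, cells some time after it).
chambers = {"sparse": (70, 77), "dense": (600, 300)}
fractions = {
    name: dispersed_fraction(before, after)
    for name, (before, after) in chambers.items()
}
print(fractions)
```

      Normalizing by the initial count is what separates a per-capita effect from a change that merely scales with chamber size: if every cell had the same chance of leaving, the fraction would be flat across chambers.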

      In addition to the new analysis presented in Figure 2C, we have modified the paragraph that discusses this result as follows (line 208):

      “We indeed found that the nutrient switch caused a few or no cells to disperse from small cell groups (Fig. 2B), whereas a large fraction of cells from large cell groups dispersed (Fig. 2C). In fact, the fraction of cells that dispersed upon imposition of the nutrient switch showed a strong positive relationship with the number of cells present, meaning that cells in chambers with many cells were more likely to disperse than cells in chambers with fewer cells (Fig. 2C).”

      (2) The authors should tone down their claims about the carbon cycle in the abstract. I do not believe the results as they stand could be used to understand degradation-dispersal cycles in marine environments relevant to the carbon cycle, since these behaviors have been studied in microfluidic environments which in my understanding are quite different. As such, statements such as "degradation-dispersal cycles are an integral part in the global carbon cycle, we know little about how cells alternate between degradation and motility" and "Overall, our findings reveal the cellular mechanisms underlying bacterial degradation-dispersal cycles that drive remineralization in natural environments" are overstated in the abstract. 

      We appreciate the reviewer’s comments regarding the connections of our work with the carbon cycle. We have now rephrased these statements in our manuscript to describe a potential connection between our work and the marine carbon cycle. The colonization of polysaccharide particles by bacteria and their subsequent degradation have been widely acknowledged to play a significant role in controlling the carbon flow in marine ecosystems (Fenchel, 2002; Preheim et al., 2011; Yawata et al., 2014, 2020). We still refer to carbon flow in the revised manuscript, though cautiously, as microbial remineralization of biomass, which is recognized as an important factor in the marine biological carbon pump (e.g., Chisholm, 2000; Jiao et al., 2024). As stated in the previous version of the manuscript, the main motivation of our work was to study the growth behaviors of marine heterotrophic bacteria during polysaccharide degradation, especially to understand when bacteria depart already colonized and degraded particles and find novel patches to grow and degrade, a process that is poorly understood. Therefore, it is conceivable that degradation-dispersal cycles do play a role in the flow of carbon in marine ecosystems. However, we acknowledge that the carbon cycle is influenced by a multitude of biological and chemical processes, and the bacterial degradation-dispersal cycle might not be the sole mechanism at play.

      We also appreciate the reviewer’s comments highlighting that the complexity of natural environments is not fully captured in our microfluidics system. However, our microfluidics setup does allow us to quantify responses and behaviors of microbial groups at high spatial and temporal resolution, especially in the context of environmental fluctuations. Microbes in nature interact at small spatial scales and have to respond to changes in the environment, and the microfluidics setup enables the quantification of these responses. Moreover, dispersal of the bacterium V. cyclitrophicus that we use in our study has been previously observed even during growth on particulate alginate (Alcolombri et al., 2021), but the cues and regulation controlling dispersal behaviors have been unclear. Microfluidic experiments have now allowed us to study this process in a highly quantitative manner, and the results align well with observations from experiments in more nature-like settings. These quantitative experiments on bacterial strains isolated from marine particles are expected to constrain quantitative models of carbon degradation in the ocean (Nguyen et al., 2022).

      We have now adjusted our statements throughout our manuscript to reflect the knowledge gaps in understanding the triggers of degradation-dispersal cycles and their links with carbon flow in marine ecosystems. In particular, the revised manuscript contains the following statements (line 47 and line 60):

      “Even though many studies indicate that these degradation-dispersal cycles contribute to the carbon flow in marine systems, we know little about how cells alternate between polysaccharide degradation and motility, and which environmental factors trigger this behavioral switch.”

      “Overall, our findings reveal cellular mechanisms that might also underlie bacterial degradation-dispersal cycles, which influence the remineralization of biomass in marine environments.”

      (3) The authors should clarify why they think quorum-sensing genes are increased in expression on digested alginate. The authors currently mention that QS could be used to trigger dispersal, but given the timescales of dispersal in Figure 2 (~half an hour), I find it hard to believe that these genes are expressed and have the suggested effect on those timescales. As such I would have expected the other way round - for QS genes to be expressed highly during alginate growth, so that density could be sensed and responded to. Please clarify. 

      We have now clarified this point in the revised manuscript. While the triggering of dispersal by quorum-sensing genes may indeed appear counterintuitive, and the response is rapid (we see dispersal of cells within 30-40 minutes), both observations are in line with previous studies in another model organism, Vibrio cholerae. The dispersal time is similar to the dispersal time of V. cholerae cells from biofilms, as described by Singh and colleagues (Figure 1E of Ref. Singh et al., 2017). In that case, induction of the quorum sensing dispersal regulator HapR was observed during biofilm dispersal within one hour after the switch of conditions (Fig. 2, middle panel of Ref. Singh et al., 2017). Even though the specific quorum sensing signaling molecules are probably different in our strain (there is no annotated homolog of the hapR gene in V. cyclitrophicus), we observed that the full set of quorum sensing genes was enriched in cells growing on digested alginate (as reported in line 314 and Fig. 4A).

      We have added this information in the manuscript (line 317): 

      “The set of quorum sensing genes was also positively enriched in cells growing on digested alginate (Fig. 4A and S4F, Table S13). This role in dispersal is in agreement with a previous study that showed induction of the quorum sensing master regulator in V. cholerae cells during dispersal from biofilms on a similar time scale as here (less than an hour)28.”

      Reviewer #2 (Recommendations For The Authors):

      (1) Around line 144 - I don't really understand how you flow alginate through the microfluidic platform. It seems if the particles are transiently going through the microfluidic chamber then the flow rate and hence residence time of the alginate particles will matter a lot by controlling the time the cells have to colonize and excrete enzymes for alginate breakdown. Or perhaps the alginate is not particulate but is instead a large but soluble polymer? I think maybe a schematic of the microfluidic device would help -- there is an implicit assumption that we are familiar with the Dal Co et al device, but I don't recall its details and maybe a graphic added to Figure 1 would help. 

      a. In reviewing the Dal Co paper I see that cells are trapped and the medium flows through channels and the plane where the cells are held. I am still a little confused about the size of the polymeric alginate -- large scale (>1um) particles or very small polymers? 

      We have now provided a detailed description of our microfluidic experimental system. At the start of the experiments, cells are in fact not trapped within the microfluidic device, but grow and can move freely within a chamber designed with dimensions (sub-micron heights) so that growth occurs only as a monolayer. Cells were exposed to nutrients, either alginate or alginate digestion products, both in soluble form (not particles). These compounds were flowed into the device through a main channel, but entered the flow-free growth chambers by diffusion. To make these aspects of our experiments clearer, we have added further information on this in the Materials & Methods section (line 556), in the abstract (line 51), and in the results (line 123).

      To make our microfluidic setup clearer, we have followed this advice and added a schematic as Figure 1A and have added more information on the setup to the main text (line 153):

      “In brief, the microfluidic chips are made of an inert polymer (polydimethylsiloxane) bound to a glass coverslip. The PDMS layer contains flow channels through which the culture medium is pumped continuously. Each channel is connected to several growth chambers that are laterally positioned. The dimensions of these growth chambers (height: 0.85 µm, length: 60 µm, width: 90-120 µm) allow cells to freely move and grow as monolayers. The culture medium, containing either alginate or digested alginate in their soluble form, is constantly pumped through the flow channel and enters the growth chambers primarily through diffusion15,16,4,17,8. Therefore, the number of cells and their positioning within microfluidic chambers is determined by the cellular growth rate as well as by cell movement4. This setup combined with time-lapse microscopy allowed us to follow the development of cell communities over time.”

      (2) What makes this confusing is the difference between Figure 1C and Figure S2A -- the authors state that the difference in Figure 1C is due to dispersal, but is there flow through the microfluidic device? So what role does that flow through the device have in dispersal? Is the adhesion of the cell groups driven at all by a physical interaction with high molecular weight polymers in the microfluidic devices or is this purely a biological effect? Could this also be explained by different real concentrations of nutrients in the two cases? 

      We realize from this comment that the role of flow of the medium in the microfluidic setup was not clearly addressed in our manuscript. In fact, cells were not exposed to flow, and nutrients were provided to the growth chambers by diffusion. We have added a clearer explanation of this point on line 158:

      “The culture medium, containing either alginate or digested alginate in their soluble form, is constantly pumped through the flow channel and enters the growth chambers primarily through diffusion15,16,4,17,8. Therefore, the number of cells and their positioning within microfluidic chambers is determined by the cellular growth rate as well as by cell movement4.”

      One purely physical effect that we anticipated is that a high viscosity of the medium could immobilize cells. To address this point, we measured the viscosity of both alginate and digested alginate and concluded that the increase in viscosity is not strong enough to immobilize cells. We added a statement in the text (line 170):

      “To test the role of increased viscosity of polymeric alginate in causing the increased aggregation of cells, we measured the viscosity of 0.1% (w/v) alginate or digested alginate dissolved in TR media. For alginate, the viscosity was 1.03±0.01 mPa·s (mean and standard deviation of three technical replicates) whereas the viscosity of digested alginate in TR media was found to be 0.74±0.01 mPa·s. Both these values are relatively close to the viscosity of water at this temperature (0.89 mPa·s18) and, while they may affect swimming behavior19, they are insufficient to physically restrain cell movement20.”

      as well as a section in the Materials and Methods (line 594):

      “Viscosity of the alginate and digested alginate solution

      We measured the viscosity of alginate solutions using shear rheology measurements. We used a 40 mm cone-plate geometry (4° cone) in a Netzsch Kinexus Pro+ rheometer. 1200 µL of sample was placed on the bottom plate, the gap was set at 150 µm and the sample trimmed. We used a solvent trap to avoid sample evaporation during measurement. The temperature was set to 25°C using a Peltier element. We measured the dynamic viscosity over a range of shear rates of 0.1 – 100 s⁻¹. We report the viscosity of each solution as the average viscosity measured over the shear rates 10 – 100 s⁻¹, where the shear-dependence of the viscosity was low.

      We measured the viscosity of 0.1% (w/v) alginate dissolved in TR media, which was 1.03 +/- 0.01 mPa·s (reporting the mean and standard deviation of three technical replicates). The viscosity of 0.1% digested alginate in TR media was found to be 0.74 +/- 0.01 mPa·s. This means that the viscosity of alginate in our microfluidic experiments is 36% higher than of digested alginate, but the viscosities are close to those expected of water (0.89 mPa·s at 25 degrees Celsius according to Berstad and colleagues18).”
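      The averaging step described in this Methods text can be illustrated with a minimal sketch. It assumes the rheometer output is available as paired lists of shear rates and viscosities; the function name `mean_viscosity` and the example sweep are hypothetical and are not measured data.

```python
from statistics import mean


def mean_viscosity(shear_rates, viscosities, lo=10.0, hi=100.0):
    """Average the viscosity over shear rates inside [lo, hi] (in s^-1)."""
    window = [eta for rate, eta in zip(shear_rates, viscosities) if lo <= rate <= hi]
    if not window:
        raise ValueError("no measurements inside the shear-rate window")
    return mean(window)


# Hypothetical sweep for illustration only (NOT measured data):
# shear rate in s^-1, dynamic viscosity in mPa·s.
example_rates = [0.1, 1.0, 10.0, 50.0, 100.0]
example_viscosities = [1.40, 1.10, 1.04, 1.03, 1.02]

# Only the three points at 10, 50 and 100 s^-1 enter the average.
example_mean = mean_viscosity(example_rates, example_viscosities)
```

      Restricting the average to the 10 – 100 s⁻¹ window mirrors the stated choice of the range where shear-dependence was low; the window bounds are parameters so that a different range could be tested.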

      While our microfluidic setup allows us to track the position and movement of cells in a spatially structured setting, these observations do not allow us to distinguish directly whether the differences in dispersal are a result of purely physical effects of polymers on cells or are a result of them triggering a biological response in cells that causes them to become sessile. It is known that bacterial appendages like pili interact with polysaccharide residues (Li et al., 2003). Therefore, it is quite plausible that cross-linking by polysaccharides can contribute to growth behaviors on alginate. However, our analysis of gene expression demonstrates that flagellum-driven motility is decreased in the presence of alginate compared to digested alginate, alongside other major changes in gene expression. In addition, our measures of dispersal show that dispersal of cells when exposed to digested alginate is density dependent. Both observations suggest that the patterns in dispersal are governed by decision-making processes by cells resulting in changes in cell motility, rather than being a product of purely physical interactions with the polymer.

      The finding that the viscosities of both alginate and digested alginate are similar to that of water suggests that diffusion of nutrients in the growth chambers should be similar. Therefore, we think that differences in the real concentrations of nutrients are likely not contributing to the observed differences in behavior.

      (3) Why is Figure S1 arbitrary units? Does this have to do with the calibration of LC-MS? It would be better, it seems, to know the concentrations in real units of the monomer at least. 

      We agree with the reviewer that it would have been better to have absolute concentrations for these compounds. However, to calibrate the mass spectrometer signals (ion counts) to absolute concentrations for the different alginate compounds, we would need an analytical standard of known concentration. We are not aware of such a standard and thus report only relative concentrations. We agree that the y-axis label of Figure S1 should not contain ‘arbitrary’ units, as it shows a ratio (of measurements in the same arbitrary units). We have edited the labels of Figure S1 accordingly and the figure legend in line 26 of the Supplemental Material (“Relative concentrations…”).

      (4) Line 188 - density-dependent dispersal. The claim here is that "cells in chambers with many cells were more likely to disperse than cells in chambers with less cells." (my emphasis). Looking at the data in Figure 2C it appears that about 40% of the cells disperse irrespective of the density, before the switch to digested alginate. So it would seem that there is not a higher likelihood of dispersal at higher cell densities. For the very highest cell density, it does appear that this fraction is larger, but I'd be concerned about making this claim from what I understand to be a single experiment. To support the claim made should the authors plot Change in Cell number/Starting Cell number on the y-axis of Fig. 2C to show that the fraction is increasing? It would seem some additional data at higher starting cell densities would help support this claim more strongly. 

      We thank the reviewer for this comment, which is in line with a remark made by reviewer 1 in their comment 1. In response to these two comments (and as described above), we have edited Figure 2C and now plot the change in cell number relative to the starting cell number on the y-axis to directly show the density dependence. We observe a positive (approximately linear) relationship between the fraction of dispersed cells and the number of cells present in the chamber at the time of switching. This indicates that there is a density dependence in the dispersal process, with highly populated chambers showing a higher fraction of dispersed cells.

      In addition to the change in Figure 2C, we have modified the paragraph around line 208: “We indeed found that the nutrient switch caused a few or no cells to disperse from small cell groups (Fig. 2B), whereas a large fraction of cells from large cell groups dispersed (Fig. 2C). In fact, the fraction of cells that dispersed upon imposition of the nutrient switch showed a strong positive relationship with the number of cells present, meaning that cells in chambers with many cells were more likely to disperse than cells in chambers with fewer cells (Fig. 2C).”

      The highest cell number at the start of the switch that we include is about 800 cells. The maximum number of cells that can fit into a chamber is ca. 1000 cells. Thus, 800 resident cells are close to the maximal density.
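      The normalization now used on the y-axis of Figure 2C (change in cell number relative to starting cell number) can be sketched as follows. This is a minimal illustration under our own assumptions: the function names and the chamber counts are hypothetical placeholders, not the experimental data, and the plain least-squares slope only stands in for whatever statistical analysis is used in the manuscript.

```python
def fraction_dispersed(cells_at_switch, cells_after):
    """Fraction of cells lost from a chamber, relative to the count at the switch.

    The value is negative if the chamber gained cells (e.g., through growth).
    """
    if cells_at_switch <= 0:
        raise ValueError("cells_at_switch must be positive")
    return (cells_at_switch - cells_after) / cells_at_switch


def least_squares_slope(xs, ys):
    """Slope of an ordinary least-squares line fitted through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical chambers for illustration only (NOT experimental data):
# (cells at the nutrient switch, cells remaining afterwards).
chambers = [(50, 48), (200, 160), (400, 240), (800, 320)]

starting_counts = [before for before, _ in chambers]
fractions = [fraction_dispersed(before, after) for before, after in chambers]

# A positive slope means the dispersed fraction rises with starting density.
slope = least_squares_slope(starting_counts, fractions)
```

      Plotting the fraction rather than the absolute change separates density dependence from the trivial effect that larger groups have more cells available to leave.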

      (5) A comment -- I find the result of significant chemotaxis towards alginate but not the monomers of alginate to be quite surprising. The ecological relevance of this (line 219) seems like an important result that is worth expanding on a bit at least in the discussion. For now, my question is whether the authors know of any mechanism by which chemotaxis receptors could respond to alginate but not the monomer. How can a receptor distinguish between the two? 

      We agree that this result is surprising, given that oligomers can be more easily transported into the periplasm where sensing takes place, and they also provide a more easily accessible nutrient source. Indeed, in the case of the insoluble polymer chitin it has been shown that chemotaxis towards chitin is mediated by chitin oligomers (Bassler et al., 1991), which was suggested as a general motif to locate polysaccharide nutrient sources (Keegstra et al., 2022). However, a recent study has changed this perspective by showing widespread chemotaxis of marine bacteria towards the glucose-based marine polysaccharide laminarin, but not towards laminarin oligomers or glucose (Clerc et al., 2023). Together with our results on chemotaxis towards alginate (but not significantly toward alginate oligomers), this suggests that chemotaxis towards soluble polysaccharides can be mediated by direct sensing of the polysaccharide molecules.

      As recommended, we expanded the discussion of the ecological relevance and also added more information on possible mechanisms of selective sensing of alginate and its breakdown products (around line 479):

      “Direct chemotaxis towards polysaccharides may facilitate the search for new polysaccharide sources after dispersal. We found that the presence of degradation products not only induces cell dispersal but also increases the expression of chemotaxis genes. Interestingly, we found that V. cyclitrophicus ZF270 cells show chemotaxis towards polymeric alginate but not digested alginate. This contrasts with previous findings for bacterial strains degrading the insoluble marine polysaccharide chitin, where chemotaxis was strongest towards chitin oligomers53, suggesting that oligomers may act as an environmental cue for polysaccharide nutrient sources55. However, recent work has shown that certain marine bacteria are attracted to the marine polysaccharide laminarin, and not laminarin oligomers56. Together with our results, this indicates that chemotaxis towards soluble polysaccharides may be mediated by the polysaccharide molecules themselves. The mechanism of this behavior is yet to be identified, but could be mediated by polysaccharide-binding proteins as have been found in Sphingomonas sp. A1 facilitating chemotaxis towards pectin57. Direct polysaccharide sensing adds complexity to chemosensing as polysaccharides cannot freely diffuse into the periplasm, which can lead to a trade-off between chemosensing and uptake58. Furthermore, most polysaccharides are not immediately metabolically accessible as they require degradation. But direct polysaccharide sensing can also provide certain benefits compared to using oligomers as sensory cues. First, it could enable bacterial strains to preferably navigate to polysaccharide nutrient sources that are relatively uncolonized and hence show little degradation activity. Second, strong chemotaxis towards degradation products could hinder a timely dispersal process as the dispersal then requires cells to travel against a strong attractant gradient formed by the degradation products.
      Overall, this strategy allows cells to alternate between degradation and dispersal to acquire carbon and energy in a heterogeneous world with nutrient hotspots44,59–61.”

      (6) Comment on lines 287-8 -- that the "positive enrichment of the gene set containing bacterial motility proteins matched the increase in motile cells that we observe in Fig 3E." I'm confused about what is meant by the word "matched" here. Is the implication that there is some quantitative correspondence between increased motility in Figure 3 and the change in expression in Figure 4? Or is the statement a qualitative one -- that motility genes are upregulated in the presence of digested alginate? Table S12 didn't help me answer this question. 

      We thank the reviewer for their helpful comment. Our original statement was a qualitative one - observing that gene expression enrichment in genes associated with bacterial motility aligned with our expectations based on the previous observation of an increase in motile cells. We have now changed the wording to highlight the qualitative nature of this statement (line 315):

      “The positive enrichment of the gene set containing bacterial motility proteins aligned with our expectations based on the increase in motile cells that we observed in Figure 3E (Fig. 4A, Table S12).”

      (7) Line 326 - what is the explanation for the production of public enzymes in the presence of digest? How does this square with the previous narrative about cells growing on alginate digest expressing motility genes and chemotaxing towards alginate? It seems like the story is a bit tenuous here in the sense that digested alginates stimulate both motility - which is hypothesized to drive the discovery of new alginate particles - and lyase enzymes which are used to degrade alginate. So do the high motility cells that are chemotaxing towards alginate also express lyases en route? I'm of the opinion that constructing narratives like these in the absence of a more quantitative understanding of the colonization and degradation dynamics of alginate particles presents a major challenge and may be asking more of the data than the data can provide. 

      a. I noted later that this is addressed later around lines 393 in the Discussion section.

      Indeed, the notion that the presence of breakdown products triggers motility and also increases the expression of alginate lyases and other metabolic genes for alginate catabolism seems counterintuitive. We have now expanded our discussion of these results to contextualize these findings (around line 443):

      “One reason for this observation may be that cells primarily rely on intracellular monosaccharide levels to trigger the upregulation of genes associated with polysaccharide degradation and catabolism, as has previously been observed for E. coli across various carbon sources50,51. In fact, the majority of carbon sources are sensed by prokaryotes through one‑component sensors inside the cell50. In the one‑component internal sensing scheme, the enzymes and transporters for the use of various carbon sources are expressed at basal levels, which leads to an increase in pathway intermediates upon nutrient availability. The pathway intermediates are sensed by an internal sensor, usually a transcription factor, and lead to the upregulation of transporter and enzyme expression50,51. This results in a positive feedback loop, which enables small changes in substrate abundance to trigger large transcriptional responses50,52. Thus, the presence of alginate breakdown products may likely result in increased expression of all components of the alginate degradation pathway, including the expression of degrading enzymes. As the gene expression analysis was performed on well-mixed cultures in culture medium containing alginate breakdown products, we therefore expect a strong stimulation of alginate catabolism. In a natural scenario, where cells disperse from a polysaccharide hotspot before its exhaustion, the expression of alginate catabolism genes may likely decrease again once the local concentration of breakdown products decreases. However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and continued production of alginate lyases may thus help cells to prepare for likely future environments. Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients.”

      (8) I like Figure 6, and I think this hypothesis is a good result from this paper, but I think it would be important to emphasize this as a proposal that needs further quantitative analysis to be supported. 

      We have now edited the manuscript to make this point more clear. While both degradation and dispersal are well-appreciated parts of microbial ecology, the transitions and underlying mechanisms are unclear. We have edited the discussion to improve the clarity (line 419): 

      “This cycle of biomass degradation and dispersal has long been discussed in the context of foraging e.g., 44,45,13,46,47, but the cellular mechanisms that drive the cell dispersal remain unclear.”

      Also, we have updated Figure 6 to indicate more clearly which new findings this work proposes (now bold font) and which previous findings, made in different bacterial taxa and with different carbon sources, align with our work (now light font). We edited the figure legend accordingly (line 503):

      “By integrating our results with previous studies on cooperative growth on the same system, as well as results on dispersal cycles in other systems, we highlight where the specific results of this work add to this framework (bold font).”

      Minor comments 

      (1) Is there any growth on the enzyme used for alginate digestion? E.g. is the enzyme used to digest the alginate at sufficiently high concentrations that cells could utilize it for a carbon/nitrogen source? 

      We thank the reviewer for raising this point. We added the following paragraph as Supplemental Text to address it (line 179):

      “Protein amount of the alginate lyases added to create digested alginate

      Based on the following calculation, we conclude that the amount of protein added to the growth medium by the addition of alginate lyases is so small that we consider it negligible. In our experiment we used 1 unit/ml of alginate lyases in a 4.5 ml solution to digest the alginate. As the commercially purchased alginate lyases are 10,000 units/g, our 4.5 ml solution contains 0.45 mg of alginate lyase protein. The digested alginate solution was diluted 45x when added to culture medium. This means that we added 0.18 µg alginate lyase protein to 1 ml of culture medium.

      As a comparison, for 1 ml of alginate medium, 1000 µg of alginate is added or for 1 ml of Lysogeny broth (LB) culture medium, 3,500 µg of LB are added. Thus, the amount of alginate lyase protein that we added is ca. 5,000 - 20,000 times smaller than the amount of alginate or LB that one would add to support cell growth. Therefore, we expect the growth that the digestion of the added alginate lyases would allow to be negligible.”
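      Parts of this calculation can be reproduced with a short sketch. It covers only the unit-to-mass conversion and the final fold comparison; the per-millilitre amount of 0.18 µg is taken as stated in the Supplemental Text rather than re-derived, because the intermediate dilution steps are not spelled out here. All variable names are hypothetical.

```python
# Unit-to-mass conversion for the alginate lyase used in the digestion.
UNITS_PER_ML = 1.0           # enzyme activity used for the digestion
DIGEST_VOLUME_ML = 4.5       # volume of the digestion reaction
UNITS_PER_GRAM = 10_000.0    # specific activity of the purchased enzyme

total_units = UNITS_PER_ML * DIGEST_VOLUME_ML
lyase_mg = total_units / UNITS_PER_GRAM * 1000.0  # grams converted to milligrams

# Fold comparison against typical carbon-source amounts per ml of medium.
# Assumption: the stated 0.18 µg per ml is used as given, not recomputed.
LYASE_UG_PER_ML_MEDIUM = 0.18
ALGINATE_UG_PER_ML = 1000.0
LB_UG_PER_ML = 3500.0

alginate_fold = ALGINATE_UG_PER_ML / LYASE_UG_PER_ML_MEDIUM
lb_fold = LB_UG_PER_ML / LYASE_UG_PER_ML_MEDIUM

print(f"lyase in digest: {lyase_mg:.2f} mg")
print(f"alginate is ~{alginate_fold:.0f}x the lyase amount per ml")
print(f"LB is ~{lb_fold:.0f}x the lyase amount per ml")
```

      The two fold values bracket the "ca. 5,000 - 20,000 times" range quoted above.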

      (2) The lines in Figure 2B are very hard to see. 

      We have addressed this comment by using thicker lines in Figure 2B.

      (3) The black background and images in Figure 3A and B are hard to see as well. 

      We have now replaced Figure 3A and B, now using a white background.

      (4) Typo at the beginning of line 251? 

      Unfortunately we failed to find the typo referred to. We are happy to address it if it still exists in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) I think there is not enough experimental evidence to conclude that the underlying cause of increased motility is the accumulation of digested alginate products. To conclusively show that this is the cause and not just some signal linked to cell density, perhaps the experiment should be repeated with a different carbon source. 

      We thank the reviewer for their comment, which made us realize that we did not make the nature of the dispersal cue clear. The gene expression data was obtained from batch cultures and measured at the same approximate bacterial densities in batch, which indeed shows that the digested alginate is a sufficient signal for an increase in motility gene expression. This agrees very well with our observation that cells growing on digested alginate in microfluidic chambers have an increased fraction of motile cells in comparison with cells exposed to alginate (Fig 3E). However, we did not mean to suggest that the observed dispersal by bacterial motility is not influenced by cell density. In fact, we see that dispersal (and hence the increase in cell motility) in microfluidic chambers that are switched from polymeric to digested alginate depends on the bacterial density in the chamber, with higher bacterial densities showing increased dispersal. This shows that the presence of alginate oligomers does trigger dispersal through motility, but this signal affects bacterial groups in a cell-density-dependent manner.

      Similar observations have been made in Caulobacter crescentus, which was found to form cell groups on the polymer xylan while cells disperse when the corresponding monomer xylose becomes available (D’Souza et al., 2021). We reference the additional work in lines 179 and 230. Taken together, these observations indicate a more general phenomenon in dispersal from polysaccharide substrates.

      (2) About the expression data: 

      • Ribosomal proteins and ABC transporters are enriched in cells grown on digested alginate and the authors discuss that this explains the difference in max growth rate between alginate and digested alginate. However, in Figure S2E the authors report no statistical difference between growth rates. 

      We have now edited the manuscript to clarify this point. We found that cells grown on degradation products reached their maximal growth rate around 7.5 hours earlier (Fig. S2D) and showed increased expression of ribosomal biosynthesis and ABC transporters in late-exponential phase (Fig. 4A). We consider this shorter lag time as a sign of a different growth state and therefore a possible reason for the difference in ribosomal protein expression.

      As the reviewer correctly points out, the maximum growth rates that were computed from the two growth curves were not significantly different (Fig. S2E). However, for our gene expression analysis, we harvested the transcriptome of cells that reached OD 0.39-0.41 (mid- to late-exponential phase). At this time point, the cell cultures may have differed in their momentary growth rate.

      We edited the manuscript to make this clearer (line 287):

      “Both observations likely relate to the different growth dynamics of V. cyclitrophicus ZF270 on digested alginate compared to alginate (Fig. S2A), where cells in digested alginate medium reached their maximal growth rate 7.5 hours earlier and thus showed a shorter lag time (Fig. S2D). As a consequence, the growth rate at the time of RNA extraction (mid-to-late exponential phase) may have differed, even though the maximum growth rate of cells grown in alginate medium and digested alginate medium were not found to be significantly different (Fig. S2E).”

      • The increased expression of transporters for lyases in cells grown on digested alginate (lines 273-274 and 325-328) is very confusing and the explanation provided in lines 412-420 is not very convincing. My two cents on this: Expression of more enzymes and induction of motility might be a strategy to be prepared for more likely future environments (after dispersal, alginate is the most likely carbon source they will find). This would be in line with observed increased chemotaxis towards the polymer rather than the monomer (Similar to C. elegans). 

      This comment is in line with reviewer 2, comment 7. In response to these two comments (and as described above), we expanded our discussion of these results to contextualize these findings (around line 443):

      “One reason for this observation may be that cells primarily rely on intracellular monosaccharide levels to trigger the upregulation of genes associated with polysaccharide degradation and catabolism, as has previously been observed for E. coli across various carbon sources50,51. In fact, the majority of carbon sources are sensed by prokaryotes through one‑component sensors inside the cell50. In the one‑component internal sensing scheme, the enzymes and transporters for the use of various carbon sources are expressed at basal levels, which leads to an increase in pathway intermediates upon nutrient availability. The pathway intermediates are sensed by an internal sensor, usually a transcription factor, and lead to the upregulation of transporter and enzyme expression50,51. This results in a positive feedback loop, which enables small changes in substrate abundance to trigger large transcriptional responses50,52. Thus, the presence of alginate breakdown products may likely result in increased expression of all components of the alginate degradation pathway, including the expression of degrading enzymes. As the gene expression analysis was performed on well-mixed cultures in culture medium containing alginate breakdown products, we therefore expect a strong stimulation of alginate catabolism. In a natural scenario, where cells disperse from a polysaccharide hotspot before its exhaustion, the expression of alginate catabolism genes may likely decrease again once the local concentration of breakdown products decreases. However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and continued production of alginate lyases may thus help cells to prepare for likely future environments. Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients.”

      Additionally, we agree with the intriguing comment that continued expression of alginate lyases may also prepare cells for likely future environments. Further studies that aim to answer whether marine bacteria are primed by their growth on one carbon source towards faster re-initiation of degradation on a new particle will be an interesting research question. We now address this point in our manuscript (line 458):

      “However, continued production of alginate lyases could also provide an advantage when encountering a new alginate source and continued production of alginate lyases may thus help cells to prepare for likely future environments. Further investigations of bacterial enzyme secretion in changing nutrient environments and at relevant spatial scales are required to improve our understanding of the regulation of enzyme secretion along nutrient gradients.”

      (3) The yield reached by Vibrio on alginate is significantly higher than the yield in digested alginate, not similar, as stated in lines 133-134. Only cell counts are similar. Perhaps the author can correct this statement and speculate on the reason leading to this discrepancy: perhaps cells tend to aggregate in alginate despite the fact that these are well-mixed cultures. 

      We have edited the description of the OD measurements accordingly and agree with the reviewer that aggregation is indeed a possible reason for the discrepancy (line 141):

      “We also observed that the optical density at stationary phase was higher when cells were grown on alginate (Fig. S2B and C). However, colony counts did not show a significant difference in cell numbers (Fig. S3), suggesting that the increased optical density may stem from aggregation of cells in the alginate medium, as observed for other Vibrio species7.”

      (4) I suggest toning down the importance of the results presented in this study for understanding global carbon cycling. There is a link but at present it is too much emphasized. 

      We have edited our statements regarding the carbon cycle. In the revised manuscript we stress the lack of direct quantifications of carbon cycling. We still refer to carbon flow in the revised manuscript, as we would argue that microbial remineralization of biomass is recognized as an important factor in the marine biological carbon pump (e.g., Chisholm, 2000) and research on marine bacterial foraging investigates how bacterial cells manage to find and utilize this biomass.

      Our revised manuscript contains the following modified statements (line 47 and line 60): “Even though many studies indicate that these degradation-dispersal cycles contribute to the carbon flow in marine systems, we know little about how cells alternate between polysaccharide degradation and motility, and which environmental factors trigger this behavioral switch.”

      “Overall, our findings reveal cellular mechanisms that might also underlie bacterial degradation-dispersal cycles, which influence the remineralization of biomass in marine environments.”

      References

      Alcolombri, U., Peaudecerf, F. J., Fernandez, V. I., Behrendt, L., Lee, K. S., & Stocker, R. (2021). Sinking enhances the degradation of organic particles by marine bacteria. Nature Geoscience, 14(10), 775–780. https://doi.org/10.1038/s41561-021-00817-x

      Bassler, B. L., Gibbons, P. J., Yu, C., & Roseman, S. (1991). Chitin utilization by marine bacteria. Chemotaxis to chitin oligosaccharides by Vibrio furnissii. Journal of Biological Chemistry, 266(36), 24268–24275. https://doi.org/10.1016/S0021-9258(18)54224-1

      Chisholm, S. W. (2000). Stirring times in the Southern Ocean. Nature, 407(6805), 685–686. https://doi.org/10.1038/35037696

      Chubukov, V., Gerosa, L., Kochanowski, K., & Sauer, U. (2014). Coordination of microbial metabolism. Nature Reviews. Microbiology, 12(5), 327–340. https://doi.org/10.1038/nrmicro3238

      Clerc, E. E., Raina, J.-B., Keegstra, J. M., Landry, Z., Pontrelli, S., Alcolombri, U., Lambert, B. S., Anelli, V., Vincent, F., Masdeu-Navarro, M., Sichert, A., De Schaetzen, F., Sauer, U., Simó, R., Hehemann, J.-H., Vardi, A., Seymour, J. R., & Stocker, R. (2023). Strong chemotaxis by marine bacteria towards polysaccharides is enhanced by the abundant organosulfur compound DMSP. Nature Communications, 14(1), 8080. https://doi.org/10.1038/s41467-023-43143-z

      Dal Co, A., van Vliet, S., Kiviet, D. J., Schlegel, S., & Ackermann, M. (2020). Short-range interactions govern the dynamics and functions of microbial communities. Nature Ecology and Evolution, 4(3), 366–375. https://doi.org/10.1038/s41559-019-1080-2

      D’Souza, G., Ebrahimi, A., Stubbusch, A., Daniels, M., Keegstra, J., Stocker, R., Cordero, O., & Ackermann, M. (2023). Cell aggregation is associated with enzyme secretion strategies in marine polysaccharide-degrading bacteria. The ISME Journal. https://doi.org/10.1038/s41396-023-01385-1

      D’Souza, G. G., Povolo, V. R., Keegstra, J. M., Stocker, R., & Ackermann, M. (2021). Nutrient complexity triggers transitions between solitary and colonial growth in bacterial populations. The ISME Journal, 15(9), 2614–2626. https://doi.org/10.1038/s41396-021-00953-7

      D’Souza, G., Schwartzman, J., Keegstra, J., Schreier, J. E., Daniels, M., Cordero, O. X., Stocker, R., & Ackermann, M. (2023). Interspecies interactions determine growth dynamics of biopolymer-degrading populations in microbial communities. Proceedings of the National Academy of Sciences of the United States of America, 120(44), e2305198120. https://doi.org/10.1073/pnas.2305198120

      Fenchel, T. (2002). Microbial Behavior in a Heterogeneous World. Science, 296(5570), 1068–1071. https://doi.org/10.1126/science.1070118

      Jiao, N., Luo, T., Chen, Q., Zhao, Z., Xiao, X., Liu, J., Jian, Z., Xie, S., Thomas, H., Herndl, G. J., Benner, R., Gonsior, M., Chen, F., Cai, W.-J., & Robinson, C. (2024). The microbial carbon pump and climate change. Nature Reviews Microbiology. https://doi.org/10.1038/s41579-024-01018-0

      Keegstra, J. M., Carrara, F., & Stocker, R. (2022). The ecological roles of bacterial chemotaxis. Nature Reviews Microbiology, 20(8), 491–504. https://doi.org/10.1038/s41579-022-00709-w

      Konishi, H., Hio, M., Kobayashi, M., Takase, R., & Hashimoto, W. (2020). Bacterial chemotaxis towards polysaccharide pectin by pectin-binding protein. Scientific Reports, 10(1), 3977. https://doi.org/10.1038/s41598-020-60274-1

      Li, Y., Sun, H., Ma, X., Lu, A., Lux, R., Zusman, D., & Shi, W. (2003). Extracellular polysaccharides mediate pilus retraction during social motility of Myxococcus xanthus. Proceedings of the National Academy of Sciences, 100(9), 5443–5448. https://doi.org/10.1073/pnas.0836639100

      Martínez-Antonio, A., Janga, S. C., Salgado, H., & Collado-Vides, J. (2006). Internal sensing machinery directs the activity of the regulatory network in Escherichia coli. Trends in Microbiology, 14(1), 22–27. https://doi.org/10.1016/j.tim.2005.11.002

      McDougald, D., Rice, S. A., Barraud, N., Steinberg, P. D., & Kjelleberg, S. (2012). Should we stay or should we go: Mechanisms and ecological consequences for biofilm dispersal. Nature Reviews Microbiology, 10(1), 39–50. https://doi.org/10.1038/nrmicro2695

      Nguyen, T. T. H., Zakem, E. J., Ebrahimi, A., Schwartzman, J., Caglar, T., Amarnath, K., Alcolombri, U., Peaudecerf, F. J., Hwa, T., Stocker, R., Cordero, O. X., & Levine, N. M. (2022). Microbes contribute to setting the ocean carbon flux by altering the fate of sinking particulates. Nature Communications, 13(1), 1657. https://doi.org/10.1038/s41467-022-29297-2

      Norris, N., Alcolombri, U., Keegstra, J. M., Yawata, Y., Menolascina, F., Frazzoli, E., Levine, N. M., Fernandez, V. I., & Stocker, R. (2022). Bacterial chemotaxis to saccharides is governed by a trade-off between sensing and uptake. Biophysical Journal, 121(11), 2046–2059. https://doi.org/10.1016/j.bpj.2022.05.003

      Povolo, V. R., D’Souza, G. G., Kaczmarczyk, A., Stubbusch, A. K., Jenal, U., & Ackermann, M. (2022). Extracellular appendages govern spatial dynamics and growth of Caulobacter crescentus on a prevalent biopolymer. bioRxiv, 2022.06.13.495907. https://doi.org/10.1101/2022.06.13.495907

      Preheim, S. P., Boucher, Y., Wildschutte, H., David, L. A., Veneziano, D., Alm, E. J., & Polz, M. F. (2011). Metapopulation structure of Vibrionaceae among coastal marine invertebrates. Environmental Microbiology, 13(1), 265–275. https://doi.org/10.1111/j.1462-2920.2010.02328.x

      Schwartzman, J. A., Ebrahimi, A., Chadwick, G., Sato, Y., Orphan, V., & Cordero, O. X. (2021). Bacterial growth in multicellular aggregates leads to the emergence of complex lifecycles. bioRxiv, 2021.11.01.466752. https://doi.org/10.1101/2021.11.01.466752

      Singh, P. K., Bartalomej, S., Hartmann, R., Jeckel, H., Vidakovic, L., Nadell, C. D., & Drescher, K. (2017). Vibrio cholerae Combines Individual and Collective Sensing to Trigger Biofilm Dispersal. Current Biology, 27(21), 3359-3366.e7. https://doi.org/10.1016/j.cub.2017.09.041

      Ulrich, L. E., Koonin, E. V., & Zhulin, I. B. (2005). One-component systems dominate signal transduction in prokaryotes. Trends in Microbiology, 13(2), 52–56. https://doi.org/10.1016/j.tim.2004.12.006

      Wall, M. E., Hlavacek, W. S., & Savageau, M. A. (2004). Design of gene circuits: Lessons from bacteria. Nature Reviews Genetics, 5(1), 34–42. https://doi.org/10.1038/nrg1244

      Yawata, Y., Carrara, F., Menolascina, F., & Stocker, R. (2020). Constrained optimal foraging by marine bacterioplankton on particulate organic matter. Proceedings of the National Academy of Sciences, 117(41), 25571–25579. https://doi.org/10.1073/pnas.2012443117

      Yawata, Y., Cordero, O. X., Menolascina, F., Hehemann, J.-H., Polz, M. F., & Stocker, R. (2014). Competition–dispersal tradeoff ecologically differentiates recently speciated marine bacterioplankton populations. Proceedings of the National Academy of Sciences, 111(15), 5622–5627. https://doi.org/10.1073/pnas.1318943111

      Zöttl, A., & Yeomans, J. M. (2019). Enhanced bacterial swimming speeds in macromolecular polymer solutions. Nature Physics, 15(6), 554–558. https://doi.org/10.1038/s41567-019-0454-3

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1: The authors may consider moving the supplemental figures into the main body of the paper since they finally would end up with a total of eight figures.

      As we added two more supplementary figures, we left them separated from the main part of the manuscript in the supplement. All of them describe important experimental details but we believe that it is easier to follow if there is a focus on the key results.

      Reviewer #1: In general, the methods and techniques used here are, besides some required but important additions, described in sufficient detail.

      Reviewer #2: Given the identified importance of glow-discharge treatment of precoated tape to the flat deposition of sections during ATUM, a corresponding schematic or appropriate reference(s) providing more information about the custom-built tape plasma device would likely be a prerequisite for effective reproduction of this technique in other laboratories.

      Thank you for the valuable comments on the missing experimental details, which could affect the ease of establishing ATUM-Tomo in other labs. We will clearly highlight the ATUM-Tomo-specific vs. some general EM processing steps of the workflow in the proposed way. A detailed description of the custom-built tape plasma device will be added to the methods section. In addition, we will reference more explicitly our published protocols, which describe the standard electron microscopy embedding steps in great detail (Kislinger et al., STAR protocols, 2020; Kislinger et al., Meth Cell Biol, 2023).

      Reviewer #1: Concerning the results section: In my opinion, the results section is a bit unbalanced. There is a mismatch between the detailed description of the methodology (experimental approach) and the scientific findings of the paper. The reviewer can see the enormous methodological impact of the paper, which on the other hand is the major drawback of the paper. In my opinion, the authors should also give a more detailed description of their scientific results.

      Concerning the discussion: It would have been nice to give a perspective to which the described methodology can be used not only to describe diverse biological aspects that can be addressed and answered by this experimental approach. For example, how could this method be used to address various questions about the normal and pathologically altered brain?

      In my opinion, the paper has one major drawback which is that it is more methodologically based although the authors included a scientific application of the method. The question here is to balance the methodology vs. the scientific achievement of this paper, a decision hard to take. In other words, one could recommend this paper to more methodologically based journals, for example, Nature Methods.

      Balancing the technological and biological parts is indeed a difficult issue. We agree that this manuscript mainly describes a technical advancement and demonstrates its power to answer previously unsolved scientific questions. We exemplify this in our model system, neuropathology of the blood-brain barrier. The biological impact of ATUM-SEM has been described in detail in Khalin et al., Small, 2022, and is referenced accordingly. Here we describe how ATUM-Tomo can be applied to reveal biological insights exceeding the capabilities of ATUM-SEM and other volume electron microscopy techniques. However, the description of the methodological development far outweighs that of the biological details. We consider eLife’s Tools and Resources (which, in our view, is similar in scope to Nat Methods) an ideal format for this technically focused manuscript while targeting eLife’s readership with diverse biological fields of interest for potential applications of the method. We suggested the application in connectomics (for chemical synapses), the study of endocytosis and the detection of virus particles in the discussion. Hopefully, this accommodates the Reviewer’s concern that having only a single application might seem arbitrary or even suggest a very narrow utility of the technique.

      “While we demonstrate a neuropathology-related application, further biological targets that require high-resolution isotropic voxels and the spatial orientation within a larger ultrastructural context can potentially be studied by ATUM-Tomo. This includes the detection of gap junctions for connectomics or for the study of long-range projections (Holler et al., 2021) and the subcellular location of virus particles (Wu et al., 2022, Roingeard, 2008, Pelchen-Matthews and Marsh, 2007). Thus, ATUM-Tomo opens up new avenues for multimodal volume EM imaging of diverse biological research areas.”

      Reviewer #2: Is the separation of sections from permanent marker-treated tape sensitive to the time interval between deposition/SEM imaging and acetone treatment?

      Thank you for pointing out this important methodological aspect. We have not systematically investigated whether there is a critical time window between microtomy, SEM, and detachment. From the samples generated for this study, we assessed the importance of timing in retrospect:

      “The sections could be recovered even four months after collection and nine months after SEM imaging.”

      Reviewer #2: To what extent is slice detachment from permanent marker-treated tape resin-dependent [i.e. has ATUM-Tomo been tested on resin compositions beyond LX112 (LADD)]?

      We appreciate this comment addressing the broader technical applicability of ATUM-Tomo. We tested the general workflow with tissue embedded in other commonly used resin types (Epon, Durcupan).

      Reviewer #2: Minor corrections to the text and figures.

      Line 83: ((Khalin et al., 2022) should read (Khalin et al., 2022)

      Line 86: 30nm should read 30 nm

      Line 139: "...morphological normal tight junctions..." should read "...morphologically normal tight junctions..."

      Line 283: "....despite glutaraldehyde fixation, a prerequisite for optimal ultrastructural preservation...".

      Line 295: "In contrast, our CLEM approach provides high ultrastructural quality by optimal chemical fixation".

      The concepts of optimal preservation and optimal fixation are arguably context- and application-dependent. These statements should be toned down or their context explicitly stated.

      Thank you for the detailed corrections. We have applied them accordingly.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Faniyan and colleagues build on their recent finding that renal Glut2 knockout mice display normal fasting blood glucose levels despite massive glucosuria. Renal Glut2 knockout mice were found to exhibit increased endogenous glucose production along with decreased hepatic metabolites associated with glucose metabolism. Crh mRNA levels were higher in the hypothalamus while circulating ACTH and corticosterone were elevated in this model. While these mice were able to maintain normal fasting glucose levels, ablating afferent renal signals to the brain caused low fasting blood glucose levels. In addition, the higher CRH and higher corticosterone levels of the knockout mice were lost following this denervation. Finally, acute phase proteins were altered, plasma Gpx3 was lower, and major urinary protein MUP18 and its gene expression were higher in renal Glut2 knockout mice. Overall, the main conclusion that afferent signaling from the kidney is required for renal Glut2-dependent increases in endogenous glucose production is well supported by these findings.

      Strengths:

      An important strength of the paper is the novelty of the identification of kidney to brain communication as being important for glucose homeostasis. Previous studies had focused on other functions of the kidney modulated by or modulating brain function. This work is likely to promote interest in CNS pathways that respond to afferent renal signals and the response of the HPA axis to glucosuria. Additional strengths of this paper stem from the use of incisive techniques. Specifically, the authors use isotope enabled measurement of endogenous glucose production by GC-MS/MS, capsaicin ablation of afferent renal nerves, and multifiber recording from the renal nerve. The authors also paid excellent attention to rigor in the design and performance of these studies. For example, they used appropriate surgical controls, confirmed denervation through renal pelvic CGRP measurement, and avoided the confounding effects of nerve regrowth over time. These factors strengthen confidence in their results. Finally, humans with glucose transporter mutations and those being treated with SGLT2 inhibitors show a compensatory increase in endogenous glucose production. Therefore, this study strengthens the case for using renal Glut2 knockout mice as a model for understanding the physiology of these patients.

      Weaknesses:

      A few weaknesses exist. Most concerns relate to the interpretation of this study's findings. The authors state that loss of glucose in urine is sensed as a biological threat based on the HPA axis activation seen in this mouse model. This interpretation is understandable but speculative. Importantly, whether stress hormones mediate the increase in endogenous glucose production in this model and in humans with altered glucose transporter function remains to be demonstrated conclusively. For example, the paper found several other circulating and local factors that could be causal. This model is also unable to shed light on how elevated stress hormones might interact with insulin resistance, which is known to increase endogenous glucose production. That issue is of substantial clinical relevance for patients with T2D and metabolic disease. Finally, how these findings can contribute to improving the efficiency of drugs like SGLT2 inhibitors remains to be seen.

      -  We agree with the reviewer’s overall assessment of this manuscript.

      - Confirming the contribution of each secreted protein shown in Fig. 4, whose levels were changed between the two groups of mice, toward causing a compensatory increase in glucose production in response to elevated glycosuria is beyond the scope of this manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors previously generated renal Glut2 knockout mice, which have high levels of glycosuria but normal fasting glucose. They use this as an opportunity to investigate how compensatory mechanisms are engaged in response to glycosuria. They show that renal and hepatic glucose production, but not metabolism, is elevated in renal Glut2 male mice. They show that renal Glut2 male mice have elevated Crh mRNA in the hypothalamus, and elevated plasma levels of ACTH and corticosterone. They also show that temporary denervation of renal nerves leads to a decrease in fasting and fed blood glucose levels in female renal Glut2 mice, but not control mice. Finally, they perform plasma proteomics in male mice to identify plasma proteins that are changed (up or down) between the knockouts and controls.

      Strengths:

      The question that is trying to be addressed is clinically important: enhancing glycosuria is a current treatment for diabetes, but is limited in efficacy because of compensatory increases in glucose production.

      Weaknesses:

      (1) Although I appreciate that the initial characterization of the mice in another publication showed that both males and females have glycosuria, this does not mean that both sexes have the same mechanisms giving rise to glycosuria. There are many examples of sex differences in HPA activation in response to threat, for example. There is an unfounded assumption here that males and females have the same underlying mechanisms of glycosuria that undermines the significance of the findings.

      - We agree with the reviewer that although we didn’t observe sex differences in renal Glut2 KO mice in the context of glucose homeostasis, the mechanisms by which elevated glycosuria enhances compensatory glucose production may differ between the sexes. Therefore, we have included this limitation in the Discussion section.

      (2) The authors state that they induced the Glut2 knockout with tamoxifen as in their previous publication. The methods of that publication indicate that all experiments were completed within 14 days of inducing the Glut2 knockout. This means that the last dose of tamoxifen was delivered 14 days prior to the experimental endpoint of each experiment. This seems like an important experimental constraint that should be discussed in this manuscript. Is the glycosuria that follows Glut2 knockout only a temporary change? If so, then the long-term change in glycosuria that follows SGLT2 inhibition in humans might not be best modelled by this knockout. Please specify in the Methods when the surgeries to implant a jugular catheter or ablate the renal nerves were performed relative to the Glut2 knockout.

      - The reviewer’s statement ‘The methods of that publication indicate that all experiments were completed within 14 days of inducing the Glut2 knockout’ is incorrect. In the referred publication, we had explicitly mentioned in methods, ‘All of the experiments, except those using a diet-induced obesity mouse model or noted otherwise, were completed within 14 days of inducing the Glut2 deficiency.’ Please see figures 5h-l and 6 in the cited publication, which demonstrate that not all the experiments were completed within 14 days of inducing renal Glut2 deficiency. Per the reviewer’s advice, in the present manuscript we have included the timeline (which in some cases is 4 months beyond inducing glycosuria) in all the figure legends. In addition, for a separate project (which is unpublished) we have measured glycosuria up to 1 year after inducing renal Glut2 deficiency. Therefore, the glycosuria observed in the renal Glut2 KO mice is not temporary.

      (3) I am still unclear what group was used for controls. Are these wild-type mice who receive tamoxifen? Are they KspCadCreERT2;Glut2loxP/loxP mice who do not receive tamoxifen? This is important and needs to be specified.

      - In our previous response to the reviewer, we had already mentioned which control group was used in this study. Please see our response to the second reviewer’s point 3. As mentioned to the reviewer, we had used Glut2loxP/loxP mice as the control group, which is also described multiple times in the figure legends of our previous paper that reported the phenotype of renal Glut2 KO mice. Per the reviewer’s advice, we have provided the information again in a revised version of this manuscript.

      (4) The authors should report some additional control measures for the renal denervation that could also impact blood glucose and perhaps some of their other measures. The control measures, which one would like to see unimpacted by renal denervation, include body weights, food consumption and water intake, and glycosuria itself.

      - Please also see Fig. 3 in the present manuscript, which demonstrates that renal afferent denervation doesn’t influence baseline blood glucose or plasma insulin levels. We have now also mentioned in the text that the denervation doesn’t affect food intake or body weight.

      (5) The graphical abstract shows a link between the hypothalamus and the liver that is completely unsupported by any of the current findings. That arrow should be removed.

      - Because we observed an increase in hepatic glucose production in renal Glut2 KO mice (Fig. 1) - which was reduced by 50% after selective afferent renal denervation (Fig. 3) - in the graphical abstract we are suggesting a neural kidney-brain-liver connection or an endocrine factor(s) to account for these changes in blood glucose levels, as also described in the discussion section. We can include a question mark ‘?’ in the graphical abstract to show that further studies are needed to validate these proposed mechanisms; however, we cannot just remove the arrow as advised by the reviewer.

      (6) Though the authors have toned down their language implying a causal link between the HPA measures and compensatory elevation of blood glucose in the face of glycosuria, the title still implies this causal link. It is still the case that their data do not support causation. There are many potential ways to establish a causal link but those experiments are not performed here. The renal afferents are correlated with Crh content of the PVN, but nothing has been done to show that the Crh content is important for elevating blood glucose. In light of this, the title should be toned down. Perhaps something like "Renal nerves maintain blood glucose production and elevated HPA activity in response to glycosuria". The link between HPA and glucose is not shown in this paper.

      - We request the reviewer to take a look at figure 1, showing an increase in glucose production in renal Glut2 KO mice, and figure 3, which demonstrates that afferent renal denervation reduces blood glucose levels by 50%. The afferent renal denervation (ablation of afferent renal nerves) does reduce blood glucose levels in renal Glut2 KO mice. Therefore, the use of the word ‘promote’ in the title is accurate and appropriate to reflect the role of the afferent renal nerves in contributing to about a 50% increase in blood glucose levels in renal Glut2 KO mice.

      - Regarding the reviewer's comment on changes in Crh gene expression, please look at figure 3. Ablation of renal afferent nerves decreases hypothalamic Crh gene expression and other mediators of the HPA axis by 50%. Therefore, the afferent renal nerves do contribute to regulating blood glucose levels, at least in part, by the HPA axis (which is widely known to change blood glucose levels). The use of words such as ‘required’ or ‘necessary’ in the title may have indicated a causal role or could have been misleading here; therefore, we have purposely used ‘promote’ in the title to accurately reflect the findings of this study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have only minor text corrections to add:

      - line 223 "A list"

      - line 253 "independent"

      - line 271 "the body's"

      - line 304 "do not"

      Yes, we have corrected these errors in a revised version of this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) Please report the dilutions used, if any, for the ELISAs. If the samples were run neat, please report this. Many manufacturers' instructions say that the user must determine the correct dilution to use for the samples collected. Also, sometimes when small blood volumes are collected, samples must be diluted to achieve the minimum volume required for the assay. It is not sufficient to refer the reader to the manufacturer's instructions.

      - Per the reviewer’s advice, we have included the dilutions used for each assay in the methods.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Point 1: The authors have demonstrated that Cs9g12620 contains the EBE of PthA4 in the promoter region, to show that PthA4 controls Cs9g12620, the authors need to compare to the wild type Xcc and pthA4 mutant for Cs9g12620 expression. The data in Figure 1 is not enough.

      The data in Figure 1 D and E show a pthA4 Tn5 insertion mutant Mxac126-80 and the expression level of Cs9g12620 in citrus inoculated with the pthA4 mutant.

      Point 2: The authors confirmed the interaction between PthA4 and the EBE in the promoter of Cs9g12620 using DNA electrophoretic mobility shift assay (EMSA). However, Figure 2B is not convincing. The lane without GST-PthA4 also clearly showed a mobility shift. For the EMSA assay, the authors need also to include a non-labeled probe as a competitor to verify the specificity. The description of the EMSA in this paper suggests that it was not done properly. It is suggested the authors redo this EMSA assay following the protocol: Electrophoretic mobility shift assay (EMSA) for detecting protein-nucleic acid interactions PMID: 17703195.

      Thank you very much for your comments. We have repeated the EMSA analysis based on your suggestion. The DNA probe was labeled with Cy5, and a non-labeled probe was included as a competitor (Figure 3B and D; Figure 4B and E).

      Point 3: The authors also claimed that PthA4 suppresses the promoter activity of Cs9g12620. The data is not convincing and also contradicts their own data that overexpression of Cs9g12620 causes canker and silencing of it reduces canker, considering PthA4 is required for canker development. The authors conducted the assays using transient expression of PthA4. It should be done with Xcc wild type, pthA4 mutant, and negative control to inoculate citrus plants to check the expression of Cs9g12620.

      We have measured Cs9g12620 expression in silenced citrus plants inoculated with wild-type Xcc 29-1 (Figure 7F).

      Point 4: Figure 6 AB is not convincing. There are no apparent differences. The variations shown in B are common in different wild-type samples. It is suggested that the authors conduct transgenic instead of transient overexpression.

      It has been shown that transient expression of PthA4 leads to a canker-like phenotype, suggesting that this experiment is effective. However, we agree that the result would be more convincing with transgenic plants overexpressing pthA4 and Cs9g12620. We will generate these plants in future research to confirm the phenotype.

      Point 5: Gene silencing data needs more appropriate controls. Figure D seems to suggest canker symptoms actually happen for the RNAi treated. The authors need to make sure the same amount of Xcc was used for both CTV empty vector and the RNAi. It is suggested a blink test is needed here.

      We used the same amount of Xcc to inoculate both the CTV empty-vector plants and the RNAi plants. For both inoculations, the cultured Xcc cells were suspended in sterile distilled water to a final concentration of 10^8 CFU/mL (OD600 = 0.3).

      Point 6: Figure 1. Please draw a figure to clearly show the location of the EBE in the promoter of Cs9g12620, including the transcription start site, and translational start site.

      The EBE in the Cs9g12620 promoter is underlined in Figure supplement 1. We are not certain of the transcription start site, but the translation start site is labelled.

    1. Author response:

      Reviewer #1 (Public Review):

      Areas of improvement and suggestions:

      (1) "These results suggest the SP targets interneurons in the brain that feed into higher processing centers from different entry points likely representing different sensory input" and "All together, these data suggest that the abdominal ganglion harbors several distinct type of neurons involved in directing PMRs"

      The characterization of the post-mating circuitry has been largely described by the group of Barry Dickson and other labs. I suggest ruling out a potential effect of mSP in any of the well-known post-mating neuronal circuitry, i.e: SPSN, SAG, pC1, vpoDN or OviDNs neurons. A combination of available split-Gal4 should be sufficient to prove this.

      Indeed, we have already tested drivers for some of these neurons, and we agree that this information is important for distinguishing neurons that are direct SP targets from neurons that are involved in directing reproductive behaviors.

      (2) Authors must show how specific is their "head" (elav/otd-flp) and "trunk" (elav/tsh) expression of mSP by showing images of the same constructs driving GFP.

      The expression pattern of tshGAL, which is expressed in the trunk, has already been published (Soller et al., 2006). We will add images for “head” expression.

      (3) VT3280 is termed as a SAG driver. However, VT3280 is a SPSN specific driver (Feng et al., 2014; Jang et al., 2017; Scheunemann et al., 2019; Laturney et al., 2023). The authors should clarify this.

      Following the reviewer's suggestion, we will clarify the specificity of VT3280.

      (4) Intersectional approaches must rule out the influence of SP on sex-peptide sensing neurons (SPSN) in the ovary by combining their constructs with SPSN-Gal80 construct. In line with this, most of their lines targets the SAG circuit (4I, J and K). Again, here they need to rule out the involvement of SPSN in their receptivity/egg laying phenotypes. Especially because "In the female genital tract, these split-Gal4 combinations show expression in genital tract neurons with innervations running along oviduct and uterine walls (Figures S3A-S3E)".

      We agree with this reviewer that we need to restrict expression to a single cell type. However, this is a major task that we will continue in follow-up studies.

      In principle, GAL80 is a valid way to restrict expression, provided that GAL80 levels are higher than GAL4 levels, because GAL80 binds GAL4 and inhibits its activity. If GAL80 levels are lower, the results could be difficult to interpret.

      (5) The authors separate head (brain) from trunk (VNC) responses, but they don't narrow down the neural circuits involved on each response. A detailed characterization of the involved circuits especially in the case of the VNC is needed to (a) show that the intersectional approach is indeed labelling distinct subtypes and (b) how these distinct neurons influence oviposition.

      Again, we agree with this reviewer that we need to restrict expression to a single cell type. However, this is a major task that we will continue in follow-up studies.

      Reviewer #2 (Public Review):

      Strength:

      The intersectional approach is appropriate and state-of-the-art. The analysis is a very comprehensive tour-de-force and experiments are carefully performed to a high standard. The authors also produced a useful new transgenic line (UAS-FRTstopFRT mSP). The finding that neurons in the brain (head) mediate the SP effect on receptivity, while neurons in the abdomen and thorax (ventral nerve cord or peripheral neurons) mediate the SP effect on oviposition, is a significant step forward in the endeavour to identify the underlying neuronal networks and hence a mechanistic understanding of SP action. Though this result is not entirely unexpected, it is novel as it was not shown before.

      We thank reviewer 2 for recognizing the advance of our work.

      Weakness:

      Though the analysis identifies a small set of neurons underlying SP responses, it does not go the last step to individually identify at least a few of them. The last paragraph in the discussion rightfully speculates about the neurochemical identity of some of the intersection neurons (e.g. dopaminergic P1 neurons, NPF neurons). At least these suggested identities could have been confirmed by straight-forward immunostainings against NPF or TH, for which antisera are available. Moreover, specific GAL4 lines for NPF or P1 or at least TH neurons are available which could be used to express mSP to test whether SP activation of those neurons is sufficient to trigger the SP effect.

      We appreciate this reviewer's recognition of our previous work showing that receptivity and oviposition are separable. As pointed out, we have now gone one step further and, in a tour-de-force approach, identified subsets of neurons in the brain and VNC.

      We agree with this reviewer that we need to restrict expression to a single cell type. As this reviewer points out, determining the neurochemical identity is an excellent suggestion and will help to further restrict expression to just one type of neuron. However, this is a major task that we will continue in follow-up studies.

      Reviewer #3 (Public Review):

      Strengths:

      Besides the main results described in the summary above, the authors discovered the following:

      (1) Reduction of receptivity and induction of egg-laying are separable by restricting the expression of membrane-tethered SP (mSP): head-specific expression of mSP induces reduction of receptivity only, whereas trunk-specific expression of mSP induces oviposition only. Also, they identified a GAL4 line (SPR12) that induced egg laying but did not reduce receptivity.

      (2) Expression of mSP in the genital tract sensory neurons does not induce PMR. The authors identified three GAL4 drivers (SPR3, SPR 21, and fru9), which robustly expressed mSP in genital tract sensory neurons but did not induce PMRs. Also, SPR12 does not express in genital tract neurons but induces egg laying by expressing mSP.

      We thank reviewer 3 for recognizing these two important points regarding the SP response that point to a revised model for how the underlying circuitry induces the post-mating response.

      Weaknesses:

      (1) Intersectional expression involving ppk-GAL4-DBD was negative in all GAL4AD lines (Supp. Fig.S5). As the authors mentioned, ppk neurons may not intersect with SPR, fru, dsx, and FD6 neurons in inducing PMRs by mSP. However, since there was no PMR induction and no GAL4 expression at all in any combination with GAL4-AD lines used in this study, I would like to have a positive control, where intersectional expression of mSP in ppk-GAL4-DBD and other GAL4-AD lines (e.g., ppk-GAL4-AD) would induce PMR.

      We will add positive controls for ppk-DBD expression and expand the discussion section.

      (2) The results of SPR RNAi knock-down experiments are inconclusive (Figure 5). SPR RNAi cancelled the PMR in dsx ∩ fru11/12 and partially in SPR8 ∩ fru 11/12 neurons. SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive; it is unclear whether SPR mediates the phenotype in SPR8 ∩ fru 11/12 and dsx ∩ SPR8 neurons.

      We agree with this reviewer that the interpretation of the SPR RNAi results is complicated by the fact that SP has additional receptors (Haussmann et al 2013). With respect to oviposition, the results are conclusive for all three intersections when expressing UAS mSP in SPR RNAi, i.e. egg laying is not induced in the absence of SPR. For receptivity, the results are conclusive for dsx ∩ fru11/12 and partially for SPR8 ∩ fru 11/12.

      Potentially, SPR RNAi knock-down does not sufficiently reduce SPR levels to completely reduce receptivity in some intersection patterns, likely also because splitGal4 expression is less efficient.

      Why SPR RNAi in dsx ∩ SPR8 neurons turned virgin females unreceptive is unclear, but we anticipate that restricting expression to a single cell type will be needed to resolve this unexpected result. However, this is a major task that we will continue in follow-up studies.

      SPR RNAi knock-down experiments may also help clarify whether mSP worked autocrine or juxtacrine to induce PMR. mSP may produce juxtacrine signaling, which is cell non-autonomous.

      Whether membrane-tethered SP induces the response in an autocrine manner is an important aspect in the interpretation of the results from mSP expression.

      Removing SPR by SPR RNAi while expressing mSP in the same neurons did not induce egg laying for any of the three intersections and did not reduce receptivity for dsx ∩ fru11/12 or for SPR8 ∩ fru 11/12. Accordingly, we can conclude that for these neurons the response is induced in an autocrine manner.

      We will add this aspect to the discussion section.

    1. Author response:

      The following is the authors’ response to the current reviews.

      eLife assessment:

      The statistical analyses are incomplete.

      The eLife assessment states that the “statistical analyses are incomplete”, whereas the reviewers state that the data are compelling and the results are technically solid. Moreover, all observations in the manuscript are presented with robust and established norms of statistical analysis.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths:

      The use of data from before COVID-19 is both a strength and a weakness. Because COVID had effects on vascular health and had higher death rates for groups with the comorbidities of interest here, it has likely shifted the demographics in ways that would shift the results in unpredictable ways if the analysis were repeated with current data. This can be a strength in providing a reference point for studying those changes as well as allowing researchers to study differences between regions without the complication of different public health responses adding extra variation to the data. On the other hand, it limits the usefulness of the data in research concerned with the current status of the various populations.

      We completely agree with this observation, but we were restricted because our aim was to use the most robust and technically qualified data from GBD. The post-COVID-19 GBD data have not yet been released, but I am sure the observations made in this study can help guide work on these issues in the post-COVID era too, because the genetics of these population groups is not going to change.

      However, we did highlight this aspect of COVID19 even in our original version and also in the revised version.

      Reviewer #2 (Public Review):

      Weaknesses:

      The presentation is not focused. It would be better to include p-values and focus presentation on the main effects from the dataset analysis.

      The significant p-values were restricted to the public health data only, to identify and distinguish differences in incidence, prevalence and mortality and how they differ across world populations. These differences have often been interpreted from a socio-economic point of view, whereas our manuscript presents the reasons for differences in the main condition (stroke) and its comorbid conditions among different ethnicities from a genetic perspective. This genetic perspective was further explored to identify unique ethnic-specific variants and their patterns of linkage disequilibrium in distinguishing the phenotypic variations. Considering the volume and diversity of both the public health and GWAS data, several directions are possible, but for presentation we focused only on the most distinguishing and established phenotypic differences. I am sure this will open up avenues for several future investigations, including on COVID, as the reviewers have also highlighted. All observations were presented with robust and established norms of statistical analysis.


      The following is the authors’ response to the original reviews.

      Thank you for the constructive observations on the strengths and weaknesses of our manuscript. Interestingly, some of the weaknesses mentioned here also turn out to be strengths of the article. For example, the reviewer mentions COVID-19 as a driver of increased mortality in stroke and some of its comorbid conditions. First, I must clarify that our data are from the pre-COVID era, and we do mention that in the COVID era, COVID-19 might differentially impact the risk of stroke. This differential influence on the comorbidities of stroke is likely to be shaped by the underlying genetics of stroke and its comorbidities.

      I have tried to address the concerns raised by the reviewers, which ideally do not affect the original manuscript. The statistical limitation raised with regard to P-values has been clarified here. Minor concerns, such as abbreviations, have been resolved in the revised manuscript. My responses to the weaknesses and the reviewers' comments are given below.

      Reviewer #1 (Public Review):

      Strengths:

      The data provided here will provide a foundation for a lot of future research into the causes of the observed correlations as well as whether the observed differences in comorbidities across regions have clinically relevant effects on risk management.

      Weaknesses:

      • As with any cross-national analysis of rates, the data is vulnerable to differences in classification and reporting across jurisdictions.

      GBD data is the most robust and comprehensive data resource, and it has been used and accepted globally for predicting health metrics statistics.

      GBD data do account for normalisation of classification and reporting.

      To the best of our knowledge, this is the best available resource for health metrics analysis.

      • Furthermore, given the increased death rate from COVID-19 associated with many of these comorbid conditions and the long-term effects of COVID-19 infection on vascular health, it is expected that many of the correlations observed in this dataset will shift along with the shifting health of the underlying populations.

      I must clarify that we have used data prior to COVID-19.

      But yes, the patterns after COVID-19 will shift due to the impact of COVID. This makes the study even more relevant, as the comorbid conditions of stroke are also risk drivers for COVID-19 and its mortality. This shift has been reported by some authors, as noted in the Discussion.

      Therefore, understanding the genetic factors underlying stroke and its comorbid conditions might help in resolving how COVID-19 might differentially affect health outcomes.

      We did highlight this aspect of COVID19 even in our original version.

      Introduction 1st para:

      “It is the accumulated risk of comorbid conditions that enhances the risk of stroke further. Are these comorbid conditions differentially impacted by socio-economic factors and ethnogeographic factors. This was clearly evident in COVID era, when COVID-19 differentially impacted the risk of stroke, possibly due to its differential influence on the comorbidities of stroke.”

      Discussion 3rd para:

      “Studies reported reduction in life expectancy in 31 of 37 high-income countries, deduced to be due to COVID-19 [1]. However, it would be unfair to ignore the comorbid conditions which could also be the critical determinants for reduced life expectancy in these countries.”

      Recommendations For The Authors:

      On page 5, the authors make a note about Africa and the Middle East having the highest ASMR for high SBP and comment about the relative populations of these regions. The populations of the regions are irrelevant to the rate.

      Since the study concerns comorbid factors of stroke and their impact on mortality, the relative burden seems critical. The passage is reproduced here to justify the comment, which is indeed an integral part of the original manuscript.

      Paragraph referred – Results section 2:

      “Ethno-regional differences in mortality and prevalence of stroke and its major comorbid conditions

      We observed interesting patterns of ASMRs of stroke, its subtypes and its major comorbidities across different regions over the years as shown in figure 1a, table 1 and supplementary files S2 & S3. When assessed in terms of ranks, high SBP is the most fatal condition followed by IHD in all regions, except Oceania (OCE) where IHD and high SBP swap ranks. Africa (AFR; 206.2/100000, 95%UI 177.4-234.2) and Middle East (MDE; 198.6/100000, 95%UI 162.8-234.4) have the highest ASMR for high SBP, even though they rank as only the third and sixth most populous continents (fig. S2), respectively.”

      On page 17, the authors are alarmed by a large ratio between prevalence rates and mortality rates for certain conditions. This is confusing since this indicates that these conditions are not as dangerous as the other conditions.

      This has been further elaborated here to justify the comment, which indeed is an integral part of the original manuscript.

      Paragraph referred – Discussion para 1:

      “While the global stroke prevalence is nearly 15 times its mortality rate, prevalence of comorbid conditions such as high SBP, high BMI, CKD, T2D are alarmingly 150- to 500-fold higher than their mortality rates. These comorbid conditions can drastically affect the outcome of stroke.”

      In Figure 4, the colors are not defined.

      In the STRUCTURE plot, colours are assigned per K and do not refer directly to any population; the plot shows the stratification of populations for each K. Ramasamy, R.K., Ramasamy, S., Bindroo, B.B. et al. STRUCTURE PLOT: a program for drawing elegant STRUCTURE bar plots in user friendly interface. SpringerPlus 3, 431 (2014). https://doi.org/10.1186/2193-1801-3-431

      Reviewer #2 (Public Review):

      Strengths:

      The idea is interesting and the data are compelling. The results are technically solid.

      The authors identify specific genetic loci that increase the risk of a stroke and how they differ by region.

      Weaknesses:

      The presentation is not focused. It would be better to include p-values and focus presentation on the main effects of the dataset analysis.

      I presume the comment is made with reference to results with significant p-values.

      P-values are mentioned in the main text when referring to a significant decrease or increase with respect to global rates and time. For example, P-values for the year 2019 compare regional rates with the global rates of 2019 (Supplementary tables S2a (mortality) and S3a (prevalence)); P-values for comparison between years compare 2019 rates with 2009 rates (Supplementary tables S2b (mortality) and S3b (prevalence)); and P-values for proportional mortality and proportional prevalence in Supplementary tables S4 and S5 are also based on global rates.

      Recommendations For The Authors:

      It would be better to minimize the use of acronyms. Often one has to go back to decipher what the acronym stands for. It is fine to have acronyms in figure legends, if necessary. However, at least for regions, please do not use acronyms.

      In the revised version we have tried to minimise the use of acronyms.

      We have removed the acronyms for regions and elsewhere wherever possible; however, the disease acronyms have been retained as per the GBD terms.

      Please focus the presentation on the main results. Currently, the presentation wanders and repeats itself a lot.

      Since the manuscript addresses global and regional rates of prevalence and mortality and their relationship to genetic correlates, it is difficult not to repeat the significant observations that emerge from the different analysis methods. This may come across as repetitive, but the intention was to stress the significant observations.

      I would also recommend acknowledging and discussing socioeconomic factors earlier in the manuscript.

      Socio-economic factors are currently mentioned in the 3rd paragraph of the Discussion:

      “The changing dynamics of stroke or its comorbid conditions can be attributed to multitude of factors. Often global burden of stroke has been discussed from the point of view of socio-economic parameters. Studies indicate that half of the stroke-related deaths are attributable to poor management of modifiable risk factors [8,9]. However, we observe that different socio-economic regions are driven by different risk factors.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This paper by Beath et al. identifies a potential regulatory role for proteins involved in cytoplasmic streaming and maintaining the grouping of paternal organelles: holding sperm contents in the fertilized embryos away from the oocyte meiotic spindle so that they don't get ejected into the polar body during meiotic chromosome segregation. The authors show that by time-lapse video, paternal mitochondria (used as a readout for sperm and its genome) is excluded from yolk granules and maternal mitochondria, even when moving long distances by cytoplasmic streaming. To understand how this exclusion is accomplished, they first show that it is independent of both internal packing and the engulfment of the paternal chromosomes by maternal endoplasmic reticulum creating an impermeable barrier. They then test whether the control of cytoplasmic streaming affects this exclusion by knocking down two microtubule motors, Katanin and kinesis I. They find that the ER ring, which is used as a proxy for paternal chromosomes, undergoes extensive displacement with these treatments during anaphase I and interacts with the meiotic spindle, supporting their hypothesis that the exclusion of paternal chromosomes is regulated by cytoplasmic streaming. Next, they test whether a regulator of maternal ER organization, ATX-2, disrupts sperm organization so that they can combine the double depletion of ATX-2 and KLP-7, presumably because klp-7 RNAi (unlike mei-1 RNAi) does not affect polar body extrusion and they can report on what happens to paternal chromosomes. They find that the knockdown of both ATX-2 and KLP-7 produces a higher incidence of what appears to be the capture of paternal chromosomes by the meiotic spindle (5/24 vs 1/25). However, this capture event appears to halt the cell cycle, preventing the authors from directly observing whether this would result in the paternal chromosomes being ejected into the polar body.

      Strengths: 

      This is a useful, descriptive paper that highlights a potential challenge for embryos during fertilization: when fertilization results in the resumption of meiotic divisions, how are the paternal and maternal genomes kept apart so that the maternal genome can undergo chromosome segregation and polar body extrusion without endangering the paternal genome? In general, the experiments are well-executed and analyzed. In particular, the authors' use of multiple ways to knock down ATX-2 shows rigor. 

      Weaknesses: 

      The paper makes a case that this regulation may be important but the authors should do some additional work to make this case more convincing and accessible for those outside the field. In particular, some of the figures could include greater detail to support their conclusions, they could explain the rationale for some experiments better and they could perform some additional control experiments with their double depletion experiments to better support their interpretations. Also, the authors' inability to assess the functional biological consequences of the capture of the sperm genome by the oocyte spindle should be discussed, particularly in light of the cell cycle arrest that they observe. 

      These general comments are addressed in the more specific critiques below.

      Reviewer #2 (Public Review): 

      Summary 

      In this manuscript, Beath et al. use primarily C. elegans zygotes to test the overarching hypothesis that cytoplasmic mechanisms exist to prevent interaction between paternal chromosomes and the meiotic spindle, which are present in a shared zygotic cytoplasm after fertilization. Previous work, much of which by this group, had characterized cytoplasmic streaming in the zygote and the behavior of paternal components shortly after fertilization, primarily the clustering of paternal mitochondria and membranous organelles around the paternal chromosomes. This work set out to identify the molecular mechanisms responsible for that clustering and test the specific hypothesis that the "paternal cloud" helps prevent the association of paternal chromosomes with the meiotic spindle.

      Strengths 

      This work is a collection of technical achievements. The data are primarily 3- and 4-channel time-lapse images of zygotes shortly after fertilization, which were performed inside intact animals. There are many instances in which the experiments show extreme technical skill, such as tracking the paternal chromosomes over large displacements throughout the volume of the embryo. The authors employ a wide variety of fluorescent reporters to provide a remarkably clear picture of what is going on in the zygote. These reagents and the novel characterization of these stages that they provide will be widely beneficial to the community. 

      The data provide direct visualization of what had previously been a mostly hypothetical structure, the "paternal cloud," using simultaneous labeling of paternal DNA and mitochondria in combination with a variety of maternal proteins including maternal mitochondria, yolk granules, tubulin, and plasma membrane. Together, these images provided convincing evidence of the existence of this specified cytoplasmic domain. They go on to show that the knockdown of the ataxin-2 homolog ATX-2, a protein previously shown to affect ER dynamics, disrupted the paternal cloud, identifying a role for ER organization in this structure.

      The authors then used the system to test the functional consequences of perturbing the cytoplasmic organization. Consistent with the paternal cloud being a stable structure, it stayed intact during large movements the authors generated using previously published knockdowns (of mei-1/katanin and kinesin-13/klp-7) that increased cytoplasmic streaming. They used this data to document instances in which the paternal chromosomes were likely to have been attached to the spindle. They concluded with direct evidence of spindle fibers connecting to the paternal chromatin upon knockdown of ATX-2 in combination with increased cytoplasmic streaming, providing strong, direct support for their overarching hypothesis.

      Weaknesses 

      While the data is convincing, the narrative of the paper could be streamlined to highlight the novelty of the experiments and better articulate the aims. For example, the cloud of paternal mitochondria and membranous organelles was previously shown, but Figures 1-2 largely reiterate that observation. The innovation seems to be that the combination of ER, yolk, and maternal mitochondrial markers makes the existence of a specified domain more concrete. There are also some instances where more description is needed to make the conclusions from the images clear. 

      These general comments are addressed in the more specific critiques below.

      The manuscript intersperses what read like basic characterizations of fluorescent markers that, as written, can distract from the main story. The authors characterized the dynamics of ER organization throughout the substages of meiosis and the permeability of the envelope of ER that surrounds the paternal chromatin, but it could be more clearly established how the ability to visualize these structures allowed them to address their aims.

      We have added the following after the initial description of ER morphology changes: (ER morphology was used to determine cell-cycle stages during live imaging reported below in Fig. 6.)

      More background on what was previously known about ER organization in M-phase and the role of ataxin proteins specifically may help provide more continuity. 

      We have added references to transitions to ER sheets during mitotic M-phase in HeLa cells and Xenopus extracts.

      Reviewer #3 (Public Review): 

      Summary: 

      This study by Beath et al. investigated the mechanisms by which sperm DNA is excluded from the meiotic spindle after fertilization. Time-lapse imaging revealed that sperm DNA is surrounded by paternal mitochondria and maternal ER that is permeable to proteins. By increasing cytoplasmic streaming using kinesin-13 or katanin RNAi, the authors demonstrated that limiting cytoplasmic streaming in the embryo is an important step that prevents the capture of sperm DNA by the oocyte meiotic spindle. Further experiments showed that the Ataxin-2 protein is required to hold paternal mitochondria together and close to the sperm DNA. Finally, double depletion of kinesin-13 and Ataxin-2 suggested an increased risk of meiotic spindle capture of sperm DNA. 

      Overall, this is an interesting finding that could provide a new understanding of how meiotic spindle capture of sperm DNA and its accidental expulsion into the polar body is prevented. However, some conceptual gaps need to be addressed and further experiments and improved data analyses would strengthen the paper. 

      - It would be helpful if the authors could discuss in good detail how they think maternal ER surrounds the sperm DNA

      We have added 2 references to papers about nuclear envelope re-assembly from Shirin Bahmanyar’s lab and suggest the ER envelope is a halted intermediate in nuclear envelope reassembly.

      and why is it not disrupted following Ataxin disruption. 

      We have been attempting to disrupt ER structures in the meiotic embryo for the last 5 years by depleting profilin, BiP, atlastin, ATX-2 and by optogenetically packing ER into a ball in the middle of the oocyte.  None of these treatments prevent envelopment of the sperm DNA by maternal ER.  None of these treatments remove ER from the spindle envelope and none remove ER from the plasma membrane.  These treatments mostly result in “large aggregates” of ER that we have not examined by EM.  Wild speculation: any disruption of the ER strong enough to prevent ER envelopment around chromatin would be sterile because the M to S transition in the mitotic zone of the germline would be blocked.  Rapid depletion of ATX-2 to the extent shown by rigorous data in this manuscript does not prevent ER envelopment around chromatin.  We chose not to speculate about the reasons for this because we do not know why.

      - Since important phenotypes revealed in RNAi experiments (e.g. kinesin-13 and ataxin-2 double depletion) are not very robust, the authors should consider toning down their conclusions and revising some of their section headings. I appreciate that they are upfront about some limitations, but they do nonetheless make strong concluding sentences. 

      We have changed the discussion of the klp-7 atx-2 double depletion to: “The capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos suggests that the integrity of the exclusion zone around the sperm DNA might insulate the sperm DNA from spindle microtubules. However, a much larger number of klp-7(RNAi) singly depleted and atx-2(degron) singly depleted time-lapse sequences are needed to rigorously support this idea.”

      - The discussion section could be improved further to present the authors' findings in the larger context of current knowledge in the field. 

      We have expanded the discussion as suggested.

      - The authors previously demonstrated that F-actin prevents meiotic spindle capture of sperm DNA in this system. However, the current manuscript does not discuss how the katanin, kinesin-13 and Ataxin-2 mechanisms could work together with previously established functions of F-actin in this process. 

      We have added pfn-1(RNAi) to the discussion section.

      - How can the authors exclude off-target effects in their RNAi depletion experiments? Can kinesin-13, katanin, and Ataxin phenotypes be rescued for instance? 

      For ataxin-2 phenotypes, two completely independent controls for off-target effects are shown: GFP(RNAi) on a strain with an endogenous ATX-2::GFP tag vs. GFP(RNAi) on a strain with no tag on ATX-2, and ATX-2::AID with or without auxin. For kinesin-13 and katanin, we did not do a rigorous control for off-target effects of RNAi. However, the effects of these depletions on cytoplasmic microtubules have been previously reported by others.

      - How are the authors able to determine if the paternal genome was actually captured by the spindle? Does lack of movement definitively suggest capture without using a spindle marker? 

      mKate::tubulin labels the spindle in each capture event. This can be seen in Video S3 for mei-1(RNAi) and Figure 9 for atx-2 klp-7 double depletions.

      (1) Major issues: 

      The images provided do not convincingly show that mitochondria are entirely excluded from the regions with yolk granules. Please provide insets of magnified images of the paternal mitochondria in Figure 1E to more clearly show the exclusion even when paternal mitochondria are streaming. Providing grayscale images, individual z-sections and/or some quantification of this data might also be more convincing to this reviewer. 

      We have modified Fig. 1 by adding single wavelength magnified insets to more clearly show that paternal mitochondria are in a “black hole” in the maternal yolk granules during  cytoplasmic streaming.

      Figure 2 -This figure can be retitled to highlight that the paternal organelle cloud is impermeable to mitochondria and conserved. 

      The legend has been re-titled as suggested.

      Figure 3B, An image of the DNA within the ring of maternal ER especially since the maternal ER ring is used as a proxy for the paternal chromosomes in later figures would strengthen the authors' claims.

      We have added a panel showing DAPI-stained DNA in the center of the ER ring and paternal mitochondria cloud. 

      Why is the faster time scale imaging significant? I think this could be more clearly set up in the paper. Perhaps rapid imaging of maternal mito-labeled kca-1(RNAi) embryos would better show the difference in time scale, with the expectation that the paternal cloud forms and persists while the ER invades. 

      We are not sure what the reviewer means.  5 sec time intervals were used throughout the paper.  We are also not sure how kca-1(RNAi) would help.  Movement of the entire oocyte into and out of the spermatheca is what limits the ability to keep a fusing sperm in focus.  kca-1(RNAi) would prevent cytoplasmic streaming but not ovulation movements.

      Figure 4 - The question about the permeability of the ER envelope seems to come out of nowhere as written. It isn't clear how it contributes to the larger story about preventing sperm incorporation in the spindle.

      This section of the results is introduced with: “If the maternal ER envelope around sperm DNA was sealed and impermeable during meiosis, this could both prevent the sperm DNA from inducing ectopic spindle assembly and prevent the sperm DNA from interacting with meiotic spindle microtubules.” 

      The data in Figure 4 would probably not be expected to be in this paper based on the paper title. Maybe the title needs something about ER dynamics? "eg. ATX-2 but not an ER envelope" isolates the paternal chromatin? 

      In Figure 5, it seems that RNAi of klp-7 and mei-1 had slightly different effects on short-axis displacement of the ER envelope (klp-7 affecting it more dramatically than mei-1) and slightly different effects on interaction with the meiotic spindle (capture vs streaming past the spindle). The authors mention in their discussion that the difference in the interaction with the meiotic spindle might reflect the effects that loss of MEI-1 may have on the spindle, but could it also be a consequence of the differences in cytoplasmic streaming observed?

      With our current data, the only statistically significant difference between cytoplasmic streaming of the sperm contents in mei-1(RNAi) vs klp-7(RNAi) is that excessive streaming persists longer into metaphase II in klp-7(RNAi).  We have added a sentence describing this difference to the results.  If differences in streaming were the cause of different capture frequencies, then klp-7(RNAi) would cause more capture events than mei-1(RNAi) but the opposite was observed.  We have avoided too much discussion here because the frequency of capture events is too low to demonstrate statistically significant differences between mei-1(RNAi), klp-7(RNAi), and atx-2(degron) + klp-7(RNAi) without a very large increase in the number of time-lapse sequences.  

      Also, the authors should find a way to represent this interaction with the meiotic spindle in a quantitative or table form to allow the reader to observe some of the patterns they report more easily.

      We have added a table to Fig. 9 that summarizes capture data.

      Finally, can the authors report when they observe the closest association with the meiotic spindle: Does it correlate with the period of greatest displacement (AI) or are they unlinked? 

      The low frequency of capture events makes it difficult to test this rigorously.

      Figure 6- 'Endogenously tagged ATX-2 was observed throughout oocytes and meiotic embryos without partial co-localization with ER.' How can the authors exclude co-localization with ER? 

      We have changed the wording to: “Endogenously tagged ATX-2 was observed throughout oocytes and meiotic embryos (Fig. 6A; Fig. S2). ATX-2 did not uniquely co-localize with ER (Fig. S2).”

      The rationale for why the authors think that the integrity of sperm organelles is important to keep the genomes apart is not clear to this reviewer and needs to be explained better. Moving the discussion of the displacement experiments in Figure S3 from the end of the results section to the ATX-2 knockdown section would help accomplish this. 

      We have added the sentence: “The frequency of sperm capture by the meiotic spindle (Fig. 9D) was significantly higher than wild-type controls in klp-7(RNAi) atx-2(AID) double depleted embryos (p=0.011 Fisher’s exact test).   Although the number of single mutant embryos analyzed was too low to demonstrate a significant difference between single and double mutant embryos,  these results qualitatively support the hypothesis that limiting cytoplasmic streaming and maintaining the integrity of the ball of paternal mitochondria are both important for preventing capture events between the meiotic spindle and sperm DNA.”

      It looks like, in the double knockdown of ATX-2 and KLP-7, the spread of paternal mitochondria is less affected than when only ATX-2 is depleted. What effect does this result have on the observation that the incidence of sperm capture appears to increase in the double depletion? What does displacement of the ER ring look like in the double depletion? Is it additive, consistent with their interpretation that both limiting cytoplasmic streaming and maintaining the integrity of the ball of paternal mitochondria is required to keep the genomes separate? 

      We cannot show a significant difference between single and double knockdowns without substantially increasing n. We did not analyze ER ring displacement in the double mutant.

      Is the increased incidence of capture in the double-depleted embryos significant? 

      We have added the sentence: “The frequency of sperm capture by the meiotic spindle (Fig. 9D) was significantly higher than wild-type controls in klp-7(RNAi) atx-2(AID) double depleted embryos (p=0.011 Fisher’s exact test).   Although the number of single mutant embryos analyzed was too low to demonstrate a significant difference between single and double mutant embryos,  these results qualitatively support the hypothesis that limiting cytoplasmic streaming and maintaining the integrity of the ball of paternal mitochondria are both important for preventing capture events between the meiotic spindle and sperm DNA.”
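
      The kind of comparison reported above (capture frequency in one group versus another, tested with Fisher's exact test) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' analysis: the function name and the example counts are hypothetical, and the two-sided p-value uses the common convention of summing all tables that are no more probable than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. depleted vs control); columns are outcomes
    (e.g. capture vs no capture). Row and column totals are held fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first group
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table that is as probable as, or less probable than, the observed one
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))

# Hypothetical counts, for illustration only (NOT the data in Fig. 9D):
# 5 captures in 20 double-depleted embryos vs 0 captures in 30 controls.
p_value = fisher_exact_two_sided(5, 15, 0, 30)
print(f"two-sided p = {p_value:.4f}")
```

      With rare events, the p-value is very sensitive to the number of embryos per group, which is why distinguishing single from double depletions would require many more time-lapse sequences.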

      What do the authors make of the cell cycle arrest observed when paternal chromosomes are captured? Is there an argument to be made that this arrest supports the idea that preventing this capture is actively regulated and therefore functionally important? 

      We chose not to discuss the mechanism of this arrest because considerably more work would be required to prove that it is not caused by a combination of imaging conditions and genotype.  The low frequency of these capture + arrest events would make it very difficult to show that the arrest does not occur after depleting a checkpoint protein.

      (2) Minor concerns: 

      Top of page 4: "streaming because depletion tubulin stops cytoplasmic streaming (7)" should be "streaming because depletion of tubulin stops cytoplasmic streaming (7)" 

      The “of” has been inserted.

      Page 6: "This result indicated that the volume of paternal mitochondria excludes maternal mitochondria and yolk granules but not maternal ER." The authors have only shown this for maternal mitochondria, not yolk granules. 

      We have deleted the mention of yolk granules here.

      Page 7: "These results suggest that all maternal membranes are initially excluded from the sperm at fusion." Should be "These results show that maternal ER are initially excluded from the sperm at fusion. Since maternal mitochondria and yolk granules are excluded later, this suggests that all maternal membranes are initially excluded from the sperm at fusion." 

      We have changed this sentence as suggested.

      It's not clear why the authors show other types of movement that might be quantified when cytoplasmic streaming is affected in Figure 5A and only quantify long-axis and short-axis displacement. 

      We have deleted the other types of movement from the schematic.  Although these parameters were quantified, we did not include this data in the results so it would be confusing for the reader to have them in the schematic.

      Bottom of page 7: Mention that the GFP::BAF-1 was maternally provided. 

      We have added “Maternally provided..”

      Missing an Arrow on Figure 1A 9:20. 

      We removed the text citation to an arrow in Fig. 1A because we moved most of the description of the ER ring to Fig. 3 to address other reviewer suggestions.

      Supplemental videos should be labeled appropriately to indicate what structures are labeled. It is currently difficult to understand what is being shown. 

      (3) Issues with the Discussion section: 

      "The simplest explanation is that cytoplasm does not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm." - Citation page 12. 

      We have changed the sentence to: “The simplest hypothesis is that maternal and paternal cytoplasm might not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm.” 

      "The higher frequency of capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos compared with either single depletion suggests that the integrity of the exclusion zone around the sperm DNA may insulate the sperm DNA from spindle microtubule" - Pages 12-13 reference the figures. 

      This sentence has been rewritten in response to other comments but the new sentence now references revised Fig. 9.

      "ATX-2 is required to maintain the integrity of the ball of paternal mitochondria around the sperm DNA, but the mechanism is unknown." - Page 13 reference figure. 

      A reference to Figs 7 and 8 has been inserted.

      " In control embryos, the sperm contents rarely came near the meiotic spindle in agreement with a previous study that found that male and female pronuclei rarely form next to each other (6). Streaming of the sperm contents was most commonly restricted to a jostling motion with little net displacement, circular streaming in the short axis of the embryo, or long axis streaming in which the sperm turned away from the spindle before the halfway point of the embryo. Depletion of MEI-1 or KLP-7 resulted in longer excursions of the sperm contents in the long axis of the embryo toward the spindle but frequent capture of the sperm by the spindle was only observed in mei-1(RNAi)." - Page 13, the corresponding figures need to be referenced for these sentences. 

      We have inserted figure references.

      "In capture events observed after double depletion of ATX-2 and KLP-7, a bundle of microtubules was discernible extending from the spindle into the ER envelope surrounding the sperm DNA. Such bundles were not observed in mei-1(RNAi) capture events, likely because of the previously reported low density of microtubules in mei-1(RNAi) spindles (36, 37)." - Pages 13-14 references figures here. 

      We have inserted figure references.

      "The higher frequency of capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos compared with either single depletion suggests that the integrity of the exclusion zone around the sperm DNA may insulate the sperm DNA from spindle microtubules." - This should be toned down since this phenotype is not robust. 

      We have changed this to: “The capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos suggests that the integrity of the exclusion zone around the sperm DNA might insulate the sperm DNA from spindle microtubules. However, a much larger number of klp-7(RNAi) singly depleted and atx-2(degron) singly depleted time-lapse sequences are needed to rigorously support this idea.”

      ATX-2 depletion alters ER morphology but does not impact the maternal ER envelope - could the authors provide a potential explanation for this? 

      In the discussion, we cite papers showing that ATX-2 depletion affects many different cellular processes so the effect we see on paternal mitochondria might have nothing to do with the ER ring.   We have been attempting to disrupt ER structures in the meiotic embryo for the last 5 years by depleting profilin, BiP, atlastin, ATX-2 and by optogenetically packing ER into a ball in the middle of the oocyte.  None of these treatments prevent envelopment of the sperm DNA by maternal ER.  None of these treatments remove ER from the spindle envelope and none remove ER from the plasma membrane.  These treatments mostly result in “large aggregates” of ER that we have not examined by EM.  Wild speculation: any disruption of the ER strong enough to prevent ER envelopment around chromatin would be sterile because the M to S transition in the mitotic zone of the germline would be blocked.  Rapid depletion of ATX-2 to the extent shown by rigorous data in this manuscript does not prevent ER envelopment around chromatin.  We chose not to speculate about the reasons for this because we do not know why.

      It would be good to have representative images of what the altered spindle looks like in MEI-1-depleted oocytes. 

      The structure of MEI-1-depleted spindles has been described in the cited references.

      "Depletion of MEI-1 or KLP-7 resulted in longer excursions of the sperm contents in the long axis of the embryo toward the spindle but frequent capture of the sperm by the spindle was only observed in mei-1(RNAi)" - It is intriguing that this does not happen in the double depletion experiments of kinesin-13 and ATX-2. The authors should perhaps discuss this. 

      This does happen in KLP-7 ATX-2 double depleted embryos as shown in Fig. 9.

      (4) Missing citations: 

      "This analysis was restricted to embryos from anaphase I through anaphase II because our streaming data and that of Kimura 2020 indicate that the sperm contents have not moved significantly before anaphase I." - This needs an appropriate citation. Page 10. 

      We have inserted citations here.

      " The simplest explanation is that cytoplasm does not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm." - Citation page 12. Not referencing figures in the discussion. 

      We have changed the sentence to: “The simplest hypothesis is that maternal and paternal cytoplasm might not mix during the 45 min from GVBD to pronucleus formation due to the high viscosity of cytoplasm.” 

      "The higher frequency of capture of the sperm DNA by the meiotic spindle in ATX-2 KLP-7 double depleted embryos compared with either single depletion suggests that the integrity of the exclusion zone around the sperm DNA may insulate the sperm DNA from spindle microtubule" - Pages 12-13 reference the figures. 

      A reference to the revised Fig. 9 has been inserted in the revised version of this sentence.

      "ATX-2 is required to maintain the integrity of the ball of paternal mitochondria around the sperm DNA, but the mechanism is unknown." 

      References to Figs. 7 and 8 have been inserted.

      Page 13 reference figure 

      " In control embryos, the sperm contents rarely came near the meiotic spindle in agreement with a previous study that found that male and female pronuclei rarely form next to each other (6). Streaming of the sperm contents was most commonly restricted to a jostling motion with little net displacement, circular streaming in the short axis of the embryo, or long axis streaming in which the sperm turned away from the spindle before the halfway point of the embryo. Depletion of MEI-1 or KLP-7 resulted in longer excursions of the sperm contents in the long axis of the embryo toward the spindle but frequent capture of the sperm by the spindle was only observed in mei-1(RNAi)." Page 13, the corresponding figures need to be referenced for these sentences. 

      We have inserted citations here.

      "In capture events observed after double depletion of ATX-2 and KLP-7, a bundle of microtubules was discernible extending from the spindle into the ER envelope surrounding the sperm DNA. Such bundles were not observed in mei-1(RNAi) capture events, likely because of the previously reported low density of microtubules in mei-1(RNAi) spindles (36, 37)." Pages 13-14 references figures here. 

      We have inserted citations here.

      (5) Referencing wrong figures in the text: 

      Figure 5 - In the figure legend there is a 5C but there is no 5C panel in the figure. 

      A C has been inserted in Fig. 5.

      Figure 6A - "Dark holes were observed suggesting exclusion from the lumens of larger membranous organelles (Fig. 6A; Fig. S2)." Page 10. 

      6A has been changed to 6C.

      Figure 6A is showing background autofluorescence in WT oocytes so I am not certain why it is cited here. 

      The Figure citation has been corrected to 6B, C.

      Figure 8 - I could not find the supplemental data file with the individual mitochondria distance measurements. 

      We are including the Excel file with the revised submission.

      The last sentence of the first paragraph should be re-worded to be more concise ". In C. elegans, the nucleus is positioned away from the site of future fertilization so that the meiosis I spindle assembles at the opposite end of the ellipsoid zygote from the site of fertilization (2-4). " 

      Every word of this sentence is important.

      Last sentence second paragraph typo "These microtubules are thought to drive meiotic cytoplasmic streaming because depletion tubulin stops cytoplasmic streaming (7) and depletion of the microtubule-severing protein katanin by RNAi results in an increased mass of cortical microtubules and an increase in cytoplasmic streaming (8)." Pages 3-4. 

      “of” has been inserted.

      (6) Typos in the introduction should be corrected: 

      Ataxin or kinesin-13 are not mentioned in the introduction but these are a big focus of the paper. 

      Gong et al 2024 written instead of number citation (page 5), no citation in References.

      This has been corrected. 

      Supplemental videos should be labeled appropriately to indicate what structures are labeled. It is currently difficult to understand what is being shown.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      Summary:

      The authors used four datasets spanning 30 countries to examine funding success and research quality score for various disciplines. They examined whether funding or research quality score were influenced by majority gender of the discipline and whether these affected men, women, or both within each discipline. They found that disciplines dominated by women have lower funding success and research quality score than disciplines dominated by men. These findings are surprising because even the men in women-dominated fields experienced lower funding success and research quality score.

      Strengths:

      - The authors utilized a comprehensive dataset covering 30 countries to explore the influence of the majority gender in academic disciplines on funding success and research quality scores.

      - Findings suggest a systemic issue where disciplines with a higher proportion of women have lower evaluations and funding success for all researchers, regardless of gender.

      - The manuscript is notable for its large sample size and the diverse international scope, enhancing the generalizability of the results.

      - The work accounts for various factors including age, number of research outputs, and bibliometric measures, strengthening the validity of the findings.

      - The manuscript raises important questions about unconscious bias in research evaluation and funding decisions, as evidenced by lower scores in women-dominated fields even for researchers that are men.

      - The study provides a nuanced view of gender bias, showing that it is not limited to individuals but extends to entire disciplines, affecting both the funding and the perceived quality or worth of research.

      - This work underscores the need to explore motivations behind gender distribution across fields, hinting at deep-rooted societal and institutional barriers.

      - The authors have opened a discussion on potential solutions to counter bias, like adjusting funding paylines or anonymizing applications, or other practical solutions.

      - While pointing out limitations such as the absence of data from major research-producing countries, the manuscript paves the way for future studies to examine whether its findings are universally applicable.

      Weaknesses:

      - The study does not provide data on the gender of grant reviewers or stakeholders, which could be critical for understanding potential unconscious bias in funding decisions. These data are likely not available; however, this could be discussed. Are grant reviewers in fields dominated by women more likely to be women?

      - There could be more exploration into whether the research quality score is influenced by inherent biases towards disciplines themselves, rather than only being gender bias.

      - The manuscript should discuss how non-binary gender identities were addressed in the research. There is an opportunity to understand the impact on this group.

      - A significant limitation is absence of data from other major research-producing countries like China and the United States, raising questions about the generalizability of the findings. How comparable are the findings observed to these other countries?

      - The motivations and barriers that drive gender distribution in various fields could be expanded on. Are fields striving to reach gender parity through hiring or other mechanisms?

      - The authors could consider if the size of funding awards correlates with research scores, potentially overlooking a significant factor in the evaluation of research quality. Presumably there is less data on smaller 'pilot' funds and startup funds for disciplines where these are more common. Would funding success follow the same trend for these types of funds?

      - The language used in the manuscript at times may perpetuate bias, particularly when discussing "lower quality disciplines," which could influence the reader's perception of certain fields.

      - The manuscript does not clarify how many gender identities were represented in the datasets or how gender identity was determined, potentially conflating gender identity with biological sex.

      Reviewer #3 (Public Review):

      This study seeks to investigate one aspect of disparity in academia: how gender balance in a discipline is valued in terms of evaluated research quality score and funding success. This is important in understanding disparities within academia.

      This study uses publicly available data to investigate covariation between gender balance in an academic discipline and:

      i) Individual research quality scores of New Zealand academics as evaluated by one of 14 broader subject panels.

      ii) Funding success in Australia, Canada, Europe, UK.

      The study would benefit from further discussion of its limitations, and from the clarification of some technical points (as described in the recommendations for the authors).

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors):

      This is a very nice study as-is. In the following comments, I have mainly put my thoughts as I was reading the manuscript. If there are practical ways to answer my questions, I think they could improve the manuscript but the data required for this may not be available.

      Are there any data on the gender of grant reviewers or stakeholders who make funding decisions?

      The research quality score metrics seem to be more related to unconscious bias. The funding metrics may also, but there are potentially simple fixes (higher paylines for women or remove gender identities from applications).

      We have included some details about PBRF funding panel gender diversity. These panels are usually more gender balanced than the field they represent, but in the extreme cases (Engineering, Education, Mathematics) they are skewed as would be expected. Data on the panels of other award decision makers were not available.

      I wonder if the research score metric isn't necessarily reflecting on the gender bias in the discipline but rather on the discipline itself? Terms like "hard science" and "soft science" are frequently used and may perpetuate these biases. This is somewhat supported by the data - on lines 402-403 the authors state that women in male-dominated fields like Physics have the same expected score as a man. Could it be that Physics has a higher score than Education even if Physics was woman-dominated and Education was man-dominated? Are there any instances in the data where traditionally male- or female-dominated disciplines are outliers and happen to be the opposite? If so, in those cases, do the findings hold up?

      Overall we would love to answer this question! But our data is not enough. We mention these points in the Discussion (Lines 472-466). We have extended this a little to cover the questions raised here.

      How are those with non-binary gender identities handled in this article? If there is any data on the subject, I would be curious to know how this affects research score and funding success.

      These data were either unavailable or the sample size was too small to be considered anonymously (Mentioned on Lines 74-76).

      A limitation of the present article is a lack of data on major research-producing countries like China and the United States. Is there any data relevant to these or other countries? Is there reason to believe the findings outlined in this manuscript would apply or not apply to those countries also?

      We would be very excited to see if the findings held up in other countries, particularly any that were less European based. Unfortunately we could not find any data to include. Maybe one day!

      What are the motivations or other factors driving men to certain fields and women to certain fields over others? What are the active barriers preventing all fields from 50% gender parity?

      Field choice is a highly studied area and the explanations are myriad. We have included a few references on job choice in the Discussion section. I usually recommend my students read the blog post at

      https://www.scientificamerican.com/blog/hot-planet/the-people-who-could-have-done-science-didnt/

      It is very thoughtful but unfortunately not appropriate to reference here.

      The authors find very interesting data on funding rates. Have you considered funding rates and the size of funding awards as a factor in research score? Some disciplines like biomedical science receive larger grants than others like education.

      A very interesting thought for our next piece of work. We would definitely like to explore our hypothesis further.

      There are instances where the authors' writing may perpetuate bias. If possible these should be avoided. One example is on lines 458-459, where the authors state "...why these lower quality disciplines are more likely..." This could be re-written to emphasize that some disciplines are "perceived" as lower quality. Certainly those in these disciplines would not characterize their chosen discipline as "low quality".

      Well-spotted! Now corrected as you suggest.

      Similar to the preceding comment, the authors should use care with the term "gender". In the datasets used, how many gender identities were captured? How many gender identity options were given in the surveys or data intake forms? Could individuals in these datasets have been misgendered? Do the data truly represent gender identity or biological sex?

      We know that in the PBRF dataset gender was a binary choice and transgender individuals were able to choose which group they identified with. There was no non-binary option (in our defence, the latest dataset is from 2018 and NZ has only recently started updating official forms to be more inclusive), and individuals with gender not stated (a very small number) were excluded. ARC did mention that a small number of individuals were either non-binary or gender not stated; again, these are not included here for reasons of anonymity. This is now mentioned on Lines 74-76. The effects on this group are important and understudied, likely because, as here, the numbers are too small to be included meaningfully.

      Reviewer #3 (Recommendations For The Authors):

      Major revisions:

      Could you add line numbers to the Supplementary Materials for the next submission?

      Yes! Sorry for the omission.

      (1) In the main text L146 and Figure 1, it is not clear why the expected model output line is for a 50 year old male from University of Canterbury only, but the data points are from disciplines in all eight universities in New Zealand. I think it would be more clear and informative to report the trend lines that represent the data points. At the moment it is hard to visualise how the results apply to other age groups or universities.

      As age and institution are linear variables with no interactions, they contribute only a constant adjustment above or below this line, and the adjustment is small in comparison to the linear trend. Unfortunately, including them graphically does not aid understanding. We agree that including raw data with an adjusted trend line can be confusing, but after a lot of between-author discussion this was the most informative compromise we could find (many people like raw data so we included it).

      (2) Does your logistic regression model consider sample size weighting in pmen? Weighting according to sample sizes needs to be considered in your model. At the moment it is unclear and suggests a proportion between 0 and 1 only is used, with no weighting according to sample size. If using R, you can use glm(cbind(nFem, nMalFem).

      Yes. All data points were weighted by group size exactly as you suggest. We have updated the text on Line 317 to make this clear.
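
      As a purely illustrative sketch of what weighting by group size means here (this is not the Matlab code used for the analysis, and the function name and example counts are hypothetical), a logistic regression can be fitted directly to grouped counts, so that each proportion contributes in proportion to its sample size. The sketch assumes a single explanatory variable and uses Newton's method:

```python
import math

def fit_binomial_logit(x, successes, trials, iterations=25):
    """Fit logit(p) = b0 + b1 * x to grouped binomial counts.

    Each group contributes `trials[i]` observations, so larger groups
    carry more weight than smaller ones. Hypothetical helper for
    illustration only.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the (negated) Hessian
        for xi, ki, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = ki - ni * p          # observed minus expected successes
            w = ni * p * (1.0 - p)       # binomial variance, scales with group size
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

if __name__ == "__main__":
    # Made-up counts: two groups of different size
    intercept, slope = fit_binomial_logit([0.0, 1.0], [10, 30], [40, 60])
    print(intercept, slope)
```

      Fitting to counts rather than to bare proportions is the same idea as the reviewer's two-column response: a group of 600 individuals influences the fitted line more than a group of 6 with the same observed proportion.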

      (3) For PBRF, I think it is useful to outline the 14 assessment panels and the disciplines they consider. Did you include the assessment panel as an explanatory variable in your model too to investigate whether quality is assessed in the same manner between panels? If not, then suggest reasons for not doing so.

      We have now included more detail in main text on the gender split of the panels. They were not included as an explanatory variable. In theory there was some cross-referencing of panel scores to ensure consistency as part of the PBRF quality assurance guidelines.

      (4) There are several limitations which should be discussed more openly:

      Patterns only represent the countries studied, not necessarily academia worldwide.

      Mentioned on Lines 485-487.

      Gender is described as a binary variable.

      Discussed on Lines 74-76.

      The measure of research evaluation as a reflection of academic merit.

      This is acknowledged in the data limitations paragraph at the end of the Discussion.

      Minor revisions:

      (1) L186. Why do you analyse bibliometric differences between individuals from University of Canterbury only? It would be helpful to outline your reasons.

      Although bibliometric data is publicly available it is difficult to collect for a large number of individuals. You also need some private data to match bibliometrics with PBRF data which is anonymous. We were only able to do this for our own institution with considerable internal support.

      (2) How many data records did you have to exclude in L191 because they could not be linked? This is helpful to know how efficient the process was, should anyone else like to conduct similar studies.

      We matched over 80% of available records (384 individuals). We have mentioned this on Line 194.

      (3) Check grammar in the sentence beginning in L202.

      Thank you. Corrected.

      (4) Please provide a sample size gender breakdown for "University of Canterbury (UC) bibliometric data", as you do for the preceding section. A table format is helpful.

      Included on Line 194.

      (5) L377 I think this sentence needs revision.

      Thank you, we have reworked that paragraph.

      (6) L389-392 Is it possible evaluation panels can score women worse than men and that because more women are present in female-biassed disciplines, the research score in these are worse? Women scoring worse between fields, may be a result of some scaling to the mean score.

      No. This is not possible because women in male-dominated fields score higher.

      (7) L393 Could you discuss explanations for why men outperform women in research evaluation scores more when disciplines are female dominated?

      Unfortunately, we don’t have an explanation for this and can’t get one from our data. We hope it will be an interesting topic for future work.

      (8) Could the figures be improved by having the crosses, x and + scaled, for example, in thickness corresponding to sample size? Alternatively, some description of the sample size variation? Sorting the rows by order of pmen in Table E1 would also be helpful for the reader.

      As with the previous figure, we have tried many ways of presenting it (including this one). Unfortunately, nothing helped.

      We have provided Table E1 as a spreadsheet to allow readers to do this themselves.

      (9) Please state in your methods section the software used to aid repeatability.

      This is now in Supplementary Materials (Matlab 2022b).

      (10) It is great to report your model findings into real terms for PBRF and ARC. Please can you extend this to CIHR and EIGE. i.e. describing how a gender skew increase of x associates with a y increase in funding success chance.

      We have added similar explanations for both these datasets comparing the advantage of being male with the advantage of working in a male dominated discipline.

      (11) I would apply care to using pronouns "his" and "her" in L322-L324 and avoid if at all possible, instead, replacing them with "men" and "women".

      We have updated the text to avoid these pronouns in most places.

      The article in general would benefit from a disclosure statement early on conceding that gender investigated here is only as a binary variable, discounting its spectrum.

      See Lines 74-76.

      Please also report how gender balance is defined in the datasets as in the data summary in supplementary materials, within the main text.

      Our definition of gender balance (proportion of researchers who are men, pmen) is given on Line 103.

      (12) The data summary Table S1 could benefit from explaining the variables in the first column. It is currently unclear how granularity, size of dataset and quotas/pre-allocation? are defined.

      These lines have been removed as the information they contained is included elsewhere in the table with far better explanations!

      (13) There are only 4 data points for investigating covariation between gender balance and funding success in CIHR. This should be discussed as a limitation.

      The small size of the dataset is now mentioned on Line 348.

      (14) L455 "Research varies widely across disciplines" in terms of what?

      This sentence has been extended.

      (15) L456 Maybe I am missing something but I don't understand the relevance of "Physicists' search for the grand unified theory" to research quality.

      Removed.

      (16) Can you provide more discussion into the results of your bibliographic analysis and Figure 2? An explanation into the relationships seen in the figure at least would be helpful.

      Thank you, we have clarified the relationships seen in each of Figures 2A (Lines 226-235), 2B (Lines 236-252), and 2C (Lines 260-268).

      (17) It would be helpful to include in the discussion a few more sentences outlining:

      - Potential future research that would help disentangle mechanisms behind the trends you find.

      - How this research could be applied. Should there be some effort to standardise?

      We have added a short paragraph to the discussion about implications/applications, and future research (Lines 481-484).

      (18) The introduction could benefit from discussing and explaining their a priori hypotheses for how research from female-biassed disciplines may be evaluated differently.

      While not discussed in the Introduction, possible explanations for why and how research in female-dominated fields might be evaluated differently are explored in some detail in the Discussion. We think once is enough, and towards the end is more effective than at the beginning.

      (19) L16 "Our work builds on others' findings that women's work is valued less, regardless of who performs that work." I find this confusing because in your model, there is a significant interaction effect between gender:pmen. This suggests that for female-biassed disciplines, there is even more of a devaluation for women, which I think your lines in figure 1 suggest.

      Correct, but men are still affected, so the sentence stands. What is confusing is that the finding is counter to what we might expect.

    1. Author response:

      eLife assessment

      This fundamental study provides a near-comprehensive anatomical description and annotation of neurons in a male Drosophila ventral nerve cord, based on large-scale circuit reconstruction from electron microscopy. This connectome resource will be of substantial interest to neuroscientists interested in sensorimotor control, neural development, and analysis of brain connectivity. However, although the evidence is extensive and compelling, the presentation of results in this very large manuscript lacks clarity and concision.

      We thank the reviewers for their detailed and thoughtful feedback and the time that they invested to provide it. Organising this manuscript (which is clearly not a standard research article) was quite challenging as it had to fulfil a number of functions: presenting a guide to the system of annotations and the associated online resources; providing an atlas for the annotated cell types; and showcasing various analyses to illustrate the value of the dataset as well as just a few of the many questions it can be used to address. We gave careful consideration to its structure and attempted to signpost the sections that would be most useful to particular types of readers. Nevertheless we can see that this was not completely successful and we thank the reviewers for their suggestions for improvement.

      We acknowledge that the resulting manuscript was very large and will endeavour to streamline our text in the revision without compromising the accessibility of the data. We do note that there is some precedent for comprehensive and lengthy connectome papers going all the way back to White et al. 1986 which took 340 pages to describe the 302 neurons of the C. elegans connectome. More recently, we can compare the “hemibrain papers” published in eLife: Scheffer et al., 2020, Li et al., 2020, Schlegel et al., 2021, Hulse et al., 2021. These papers would also be difficult to digest at a single sitting but were game-changing for the Drosophila neuroscience field and have already been cited hundreds of times, a testament to their utility. In the same way that these papers provided the first comprehensively proofread and annotated EM connectome for (a large part of) the adult fly brain, our work now provides the first fully proofread and annotated EM connectome for the nerve cord. Given the pioneering nature of this dataset we feel that the lengthy but highly structured atlas sections of the paper are justified and will prove impactful in the long term.

      Whilst no EM dataset is perfect, we have endeavoured to make this one as comprehensive as possible. We found 74.4 million postsynapses and 15,765 neurons of VNC origin, all of which have been carefully proofread, reviewed, annotated and typed. For comparison, the female adult nerve cord dataset (FANC, Azevedo et al., Nature, 2024) contains roughly 45 million synapses and 14,600 neuronal cell bodies of which at the time of writing 5576 have received preliminary proofreading and 222 high quality proofreading. We emphasise that these are highly complementary datasets, given the difference in sex and the fact that each dataset has different artefacts (MANC has poorer preservation of neurons in the leg nerves; FANC is missing part of the abdominal ganglion and has lower synapse recovery). We reconstructed 5484 sensory neurons from the thoracic nerves, 84% of the ~6500 estimated from FANC. The overall recovery rate was ~86.5% if we include the ~1100 sensory neurons from abdominal nerves, which were in excellent condition.

      Reviewer #1 (Public Review):

      Summary:

      The authors present a close to complete annotation of the male Drosophila ventral nerve cord, a critical part of the fly's central nervous system.

      Strengths:

      The manuscript describes an enormous amount of work that takes the first steps towards presenting and comprehending the complexity and organization of the ventral nerve cord. The analysis is thorough and complete. It also makes the effort to connect this EM-centric view of the nervous system to more classical analyses, such as the previously defined hemilineages, that also describe the organization of the fly nervous system. There are many, many insights that come from this work that will be valuable to the field for the foreseeable future.

      We thank the reviewer for acknowledging the enormous collaborative effort represented by this manuscript. We tried to synthesise decades of light-level work by neuroscientists and developmental biologists working in Drosophila and other insects in order to create a standard, systematic nomenclature for >22,000 neurons, most of which had not been typed at light level. We hope that the MANC dataset and this guide to its contents will prove to be useful resources to Drosophila neurobiologists and the wider neuroscience field.

      Weaknesses:

      With more than 60 primary figures, the paper is overwhelming and cannot be read and digested in a single sitting. The result is more like a detailed resource rather than a typical research paper.

      In writing this paper, we had two aims: first, to describe and validate our extensive biological annotation of the connectome and second, to provide interesting illustrative examples of the many analyses that could be carried out on this dataset using the atlas we generated. The resulting paper is intended primarily as a detailed reference rather than a typical research paper. At the end of the Introduction, we outline the structure of the paper and explicitly direct non-specialist readers to focus on the initial and concluding sections for orientation to the dataset so that they would not get bogged down in the details. We will review our section organisation and headings to try to make the paper more straightforward to navigate, and we will add specific figure numbers to the outline.

      Reviewer #2 (Public Review):

      Summary and strengths:

      This massive paper describes the identity and connectivity of neurons reconstructed from a volumetric EM image volume of the ventral nerve cord (VNC) of a male fruit fly. The segmentation of the EM data was described in one companion paper; the classification of the neurons entering the VNC from the brain (descending neurons or DNs) and the motor neurons leaving the VNC was described in a second companion paper. Here, the authors describe a system for annotating the remaining neurons in the VNC, which include intrinsic neurons, ascending neurons, and sensory neurons, representing the vast majority of neurons in the dataset. Another fundamental contribution of this paper is the identification of the developmental origins (hemilineage) of each intrinsic neuron in the VNC. These comprehensive hemilineage annotations can be used to understand the relationship between development and circuit structure, provide insight into neurotransmitter identity, and facilitate comparisons across insect species. Many sensory neurons are also annotated by comparison to past literature. Overall, defining and applying this annotation system provides the field with a standard nomenclature and resource for future studies of VNC anatomy, connectivity, and development. This is a monumental effort that will fundamentally transform the field of Drosophila neuroscience and provide a roadmap for similar connectomic studies in other organisms.

      We thank the reviewer for acknowledging the enormous collaborative effort represented by this manuscript. We tried to synthesise decades of light-level work by neuroscientists and developmental biologists working in Drosophila and other insects in order to create a standard, systematic nomenclature for >22,000 neurons, most of which had not been typed at light level. We hope that the MANC dataset and this guide to its contents will prove to be useful resources to Drosophila neurobiologists and the wider neuroscience field.

      Weaknesses:

      Despite the significant merit of these contributions, the manuscript is challenging to read and comprehend. In some places, it seems to be attempting to comprehensively document everything the authors found in this immense dataset. In other places, there are gaps in scholarship and analysis. As it is currently constructed, I worry that the manuscript will intimidate general readers looking for an entry point to the system, and ostracize specialized readers who are unable to use the paper as a comprehensive reference due to its confusing organization.

      In writing this paper, we had two aims: first, to describe and validate our extensive biological annotation of the connectome and second, to provide interesting illustrative examples of the many analyses that could be carried out on this dataset using the atlas we generated. The resulting paper is intended primarily as a detailed reference rather than a typical research paper. At the end of the Introduction, we outline the structure of the paper and explicitly direct non-specialist readers to focus on the initial and concluding sections for orientation to the dataset so that they would not get bogged down in the details. We will review our section organisation and headings to try to make the paper more straightforward to navigate, and we will add specific figure numbers to the outline.

      The bulk of the 559 pages of the submitted paper is taken up by a set of dashboard figures for each of ~40 hemilineages. Formatting the paper as an eLife publication will certainly help condense these supplemental figures into a more manageable format, but 68 primary figures will remain, and many of these also lack quality and clarity. Without articulating a clear function for each plot, it is hard to know what the authors missed or chose not to show. As an example, many of the axis labels indicate the hemilineage of a group of neurons, but are ordered haphazardly and so small as to be illegible; if the hemilineage name is too small, and in a bespoke order for that data, then is the reader meant to ignore the specific hemilineage labels?

      We will contact eLife professional editing staff to determine whether the paper can be streamlined by moving more material to supplemental without making it difficult to locate the detailed catalogues of neurons that will be of interest to specialist readers. Based on the typical eLife format, we suspect that retaining the dashboard main figures for each hemilineage will be necessary to maintain its utility as a reference. We will, however, shorten the associated main text by, for example, moving background material used to assign the hemilineages to the Methods section and moving specific results to the figure legends where possible.

      We articulated the function for each plot as follows: "Below we describe in more depth every hemilineage that produces more than one or two secondary neurons. For each of these 35 hemilineages, we show (A) the overall morphology of the secondary population, (B) representative individual neurons (as estimated by highest average NBLAST score to other members of the hemilineage), and (C) specific notable examples (which in some cases are primary). We then report (D) the locations of their connectors (postsynapses and presynapses), (E) their upstream and downstream partners by class, and (F) their upstream and downstream partners by finer subdivisions corresponding to their systematic types (secondary hemilineage, target, or sensory modality). We also provide supplementary figures showing the morphology and normalised up- and downstream connectivity of all systematic types for each hemilineage."
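
      To illustrate the selection rule in (B), choosing a representative neuron by highest average similarity score to the other members of its group, a minimal sketch is given below. It is not the pipeline used for the paper: the function name, neuron labels, and score matrix are hypothetical, and the scores are assumed to be precomputed pairwise similarities (row = query, column = target).

```python
def most_representative(names, scores):
    """Return (name, mean_score) for the member whose mean similarity
    to all *other* members is highest.

    `scores[i][j]` is an assumed precomputed similarity of member i
    (query) to member j (target); the diagonal is ignored.
    """
    if len(names) < 2:
        raise ValueError("need at least two members to compare")
    best_name, best_mean = None, float("-inf")
    for i, name in enumerate(names):
        others = [scores[i][j] for j in range(len(names)) if j != i]
        mean = sum(others) / len(others)
        if mean > best_mean:
            best_name, best_mean = name, mean
    return best_name, best_mean

if __name__ == "__main__":
    # Made-up labels and scores for three neurons in one group
    labels = ["n1", "n2", "n3"]
    matrix = [
        [1.0, 0.6, 0.2],
        [0.5, 1.0, 0.7],
        [0.1, 0.8, 1.0],
    ]
    print(most_representative(labels, matrix))
```

      Excluding the diagonal matters because a neuron's score against itself is maximal by construction and would otherwise inflate every mean equally without helping to rank the members.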

      We have plotted every secondary neuron in each hemilineage, every predicted synapse for those neurons with confidence >0.5, every connection to partner neurons by class (no threshold applied), and then the same information organised by hemilineage in a heatmap (and including partners from all birthtimes and partners of unknown hemilineage). Then the supplementary figures show all connectivity, organised in the same way, for every individual cell type assigned to the hemilineage, including both primary and early secondary neurons. We will add more detail to the figure legends to clarify these points.

      We apologise that you were unable to read some of the axis labels in the review copy of the manuscript; we did submit high resolution versions of the figures as a supplemental file, but perhaps this did not reach you; they can also be found at https://www.biorxiv.org/content/10.1101/2023.06.05.543407v2.supplementary-material. The hemilineages are in a conserved (alphanumerical) order for all hemilineage-specific plots and many others. The exceptions arise when neurons are clustered based on their connectivity to hemilineages, in which case the order of the labels necessarily follows the structure of the resulting clusters.

      The text has similar problems of emphasis. It is often meandering and repetitive. Overlapping information is found in multiple places, which causes the paper to be much longer than it needs to be. For example, the concept of hemilineages is introduced three times before the subtitle "Introduction to hemilineage-based organisation". When cell typing is introduced, it is unclear how this relates to serial motif, hemilineage, etc; "Secondary hemilineages" follow the Cell typing title. Like the overwhelming number of graphical elements, this gives the impression that little attention has been paid to curating and editing the text. It is unclear whether the authors intend for the paper to be read linearly or used as a reference. In addition, descriptions of the naming system are often followed by extensive caveats and exceptions, giving the impression that the system is not airtight and possibly fluid. At many points, the text vacillates between careful consideration of the dataset's limitations and overly grandiose claims. These presentation flaws overshadow the paper's fundamental contribution of describing a reasonable and useful cell-typing system and placing intrinsic neurons within this framework.

      Because we intended this paper to be read primarily as a reference, we tried to make each section stand on its own, which we agree resulted in some redundancy (with more details appearing where relevant). However, we will do our best to tighten the text for the version of record.

      Our description immediately under the Cell typing title includes the use of hemilineage, serial (not serial motif, which was not used), and laterality (left-right homologues) in the procedure to assign cell types. We will change this to “Cell typing of intrinsic, ascending, and efferent neurons” for clarity. The “Secondary hemilineages” title marks the start of a new section that serves as a reference for each of the secondary hemilineages; we will change this to “Secondary hemilineage catalogue” or similar for clarity.

      References to past Drosophila literature are inconsistent and references to work from other insects are generally not included; for example, the extensive past work on leg sensory neurons in locusts, cockroaches, and stick insects. Such omissions are understandable in a situation where brevity is paramount. However, this paper adopts a comprehensive and authoritative tone that gives the reader an impression of completeness that does not hold up under careful scrutiny.

      We did not attempt to review the sensory neuron literature in this manuscript but rather cited those specific papers which included the axon morphology data that informed our modality, peripheral origin, and cell type assignments. Most of these came from the Drosophila literature due to the availability of genetic tools used for sparse labelling of specific populations as well as the greatly increased likelihood of conserved morphology. However we certainly agree that decades of sensory neuron work in larger insects were foundational for this subfield and will add a sentence to this effect in the introduction to our sensory neuron typing.

      The paper accompanies the release of the MANC dataset (EM images, segmentation, annotations) through a web browser-based tool: clio.janelia.org. The paper would be improved by distilling it down to its core elements, and then encouraging readers to explore the dataset through this interactive interface. Streamlining the paper by removing extraneous and incomplete analyses would provide the reader with a conceptual or practical framework on which to base their own queries of the connectome.

      We certainly hope that this paper will encourage readers to explore the MANC dataset. Indeed, as we state in the Discussion, "Moreover, its ultimate utility depends on how widely it is leveraged in the future experimental and computational work of the entire neuroscience community. We have only revealed the tip of the iceberg in this report, with a wealth of opportunities now available in this publicly available dataset for forthcoming connectomic analyses that will feed into testable functional hypotheses." In the first few sections of the Results, we include a visual introduction to annotated features, a glossary of annotation terms, a visual guide to our cell typing nomenclature, and two video tutorials on the use of Clio Neuroglancer to query the dataset. To further encourage exploration, we have also included illustrative examples of just a few of the many analyses that can now be performed with this comprehensive and publicly available dataset.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Q1: First of all, the term organoid must be discarded. The authors just seed the endometrial cell mixture which assembles and aggregates into a 3D structure which is then immediately used for analysis. Organoids grow from tissue stem cells and must be passage-able (see their own description in lines 69-71). So, the term organoid must be removed everywhere, to not confuse the organoid field. It is not shown that the whole 3D assembly is passageable, which would be very surprising given the fact that immune and stromal cells do not grow in Matrigel because of the unfavorable growing conditions (which are targeted to epithelial cell growth).

      We appreciate your highlighting concerns regarding our organoid construction.

      (1) The organoids in our system originated from tissue stem cells.

      We induced adult stem cells derived from endometrial tissue to construct organoids in vitro by various small molecules (such as Noggin, EGF, FGF2, WNT-3A and R-Spondin1), which involves a complex self-assembly process rather than a mere cellular assembly. Two days after plating, the system contained single cells and small cell clusters. By the fourth day, the glandular epithelial cells had gradually assembled into glands, while the stromal cells spontaneously organized themselves around the glands. By the eleventh day, the endometrial glands had enlarged, epithelial cells were organized in a paving-stone arrangement, and stromal cells had established an extensive network (Author response image 1; Figure 1C).

      (2) The organoids we constructed are passage-able.  

      Most organoids were used for experiments up to the fifth generation, while some were extended to the 10th generation and cryopreserved (Author response image 1B, C).

      (3) Immune and stromal cells are present in our system from the primary to the fourth generation. In our study, immune and stromal cells were identified not only from scRNA-seq data (third generation of organoids) (Figure 2A), but also morphologically using 3D transparent staining and light sheet microscopy imaging (third generation of organoids), with Vimentin marking stromal cells, CD45 marking immune cells, and FOXA2 identifying glands. Further, flow cytometric analysis was applied to verify immune cells within the organoids (third generation of organoids) (Author response image 1D, E, F).

      Moreover, immune cells and stromal cells can grow in Matrigel, as was also found in work from the organoid pioneer Hans Clevers (Hans Clevers et al., Nature Reviews Immunology 2019).

      Author response image 1.

      (A) The growth condition of endometrial cells was observed from day 2 to day 11 after plating under an inverted microscope. Scale bar = 200 μm. (B) The endometrial organoids of different passages were observed from P1 to P5. Scale bar = 200 μm. (C) Stromal cells formed an extensive network (down). The arrowhead indicates dendritic stromal cells. Scale bar = 100 μm (left), Scale bar = 50 μm (right). (D) Exhibition of stromal cells marked by vimentin. Nuclei were counterstained with DAPI. The arrow indicates stromal cells. Scale bar = 40 μm (up), Scale bar = 30 μm (down). (E) Exhibition of immune cells marked by CD45 and endometrial gland marked by FOXA2. Nuclei were counterstained with DAPI. The arrow indicates immune cells. Scale bar = 50 μm. (F) Flow cytometric analysis of T cells and macrophages in the endometrial organoid. Gating strategy used for determining white blood cells (CD45+ cells), T cells (CD45+CD3+ cells) and macrophages (CD45+CD68+CD11b+ cells).

      Q2: Second, the study remains fully descriptive, bombing the reader with a mass of bioinformatic analyses without clear descriptions and take-home messages. The paper is very dense, meaning readers may give up. Moreover, functional validation, except for morphological and immunostaining analyses (which are posed as "functional" but actually are only again expression) is missing, such as in vivo functionality (after transplantation e.g.) and embryo interaction. Importantly, the 3D structure misses the right architecture with a lining luminal epithelium which is present in the receptive endometrium in vivo and needed as the first contact site with the embryo. So, in contrast to what the authors claim, this is not the best model to study embryo interaction, or the closest model to the in vivo state (line 318, line 326).

      Thank you.

      (1) We have made the following improvements. Firstly, we have conducted additional experiments to validate the bioinformatics analysis. Secondly, the structure of the manuscript has been refined to ensure logical coherence and clear transitions between paragraphs. Thirdly, important findings have been emphasized to ensure readers’ comprehension and inspiration. Furthermore, the manuscript was revised by both domestic and international experts to enhance the readability and clarity.

      (2) For functional validation, in vivo transfer has not been possible so far because of ethical limitations. However, human embryos develop and grow more efficiently when combined with the receptive endometrial organoids we generated (unpublished data).

      (3) As you suggested, we replaced “closest” with “closer”. It is undeniable that the model cannot completely simulate the in vivo implantation process, in which the luminal epithelium of the endometrium contacts the embryo first.

      Q3: Third, receptive endometrial organoids (assembloids; Rawlings et al., eLife 2021) and receptive organoid-derived "open-faced endometrial layer" (Kagawa et al., Nature 2022) have already been described, which is in contrast to what the authors claim in several places that "they are the first" (e.g. lines 87-88, 316-319, etc). These studies used real organoids to achieve their model (and even showed embryo interaction), while in the present study, different cell types are just seeded and assembled. Hence, logically, immune cells are present which are never found in real organoid models. The only original aspect in the present study is the use of hormones to enhance the WOI phenotype. However, crucial information on this original aspect is missing such as concentration of the hormones, refreshment schedule, all 3 hormones added together or separately, and all 3 required?

      Thank you for pointing out these studies on endometrial organoids.

      (1) While we did not explicitly state "the first", we agree that we should be careful with expressions similar to "the first". These have been changed to more modest wording, as follows: “we are far from understanding how embryo implantation occurs during the WOI due to ethical limitations and fewer in vitro receptive endometrial model” and “which confirms that they are closer to the in vivo state”.

      (2) The definition of organoids and the presence of immune cells have been addressed in detail in our response to the first question.

      (3) Regarding the hormone scheme, the hormone concentrations are detailed in Table S2 of the Supplementary material. Estrogen was supplemented to the basal medium for the initial two days, after which a combination treatment of MPA, cAMP, PRL, hPL, and HCG was administered for the subsequent six days. The medium was refreshed every two days.

      All three hormones were deemed necessary, as validated by multiple group comparisons. Only the organoids treated with all six hormones together exhibited an endometrial receptivity-related gene expression profile (Author response image 2).

      Author response image 2.

      Heatmap showing the receptivity-related gene expression profile of organoids under each hormone regimen.
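
      The hormone scheme described in response (3) above can be written out as a small schedule. The following is only an illustrative sketch: the names `Phase`, `WOI_PHASES`, and `medium_changes` are hypothetical, concentrations are deliberately omitted (they are given in Table S2), and it assumes that the first medium change falls on day 0 and that the switch between phases coincides with a medium change. Whether estrogen is continued during the combination phase is not stated in the response, so the sketch lists only the supplements named for each phase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One treatment phase (hypothetical helper type for illustration)."""
    name: str
    supplements: tuple
    days: int


# Phases as stated in the response; concentrations are in Table S2 and omitted here.
WOI_PHASES = (
    Phase("estrogen priming", ("estrogen",), 2),
    Phase("combination treatment", ("MPA", "cAMP", "PRL", "hPL", "HCG"), 6),
)

# "The medium was refreshed every two days."
REFRESH_INTERVAL_DAYS = 2


def medium_changes(phases=WOI_PHASES, interval=REFRESH_INTERVAL_DAYS):
    """Return (day, phase name, supplements) for every medium change.

    Assumption: day 0 is the first medium change and each phase starts
    with fresh medium.
    """
    schedule = []
    start = 0
    for phase in phases:
        for day in range(start, start + phase.days, interval):
            schedule.append((day, phase.name, phase.supplements))
        start += phase.days
    return schedule


if __name__ == "__main__":
    for day, name, supplements in medium_changes():
        print(f"day {day}: {name}: {', '.join(supplements)}")
```

      Under these assumptions the eight-day scheme yields one medium change with estrogen followed by three changes with the combination treatment.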

      Q4: Moreover, it is not a "robust" model at all as the authors claim, given the variability of the initial cell mixture (varying from patient to patient). Actually, the reproducibility is not shown. The proportions of the different cell types seeded in the Matrigel droplet will be different with every endometrial biopsy. It would be much better to recombine epithelial (passageable) organoids with stromal and immune cells in a quantified, standardized manner to establish a "robust" model.

      Thanks for your suggestion.  

      Firstly, the constructed endometrial organoids generally consist of epithelial, stromal, and immune cells. However, it is undeniable that the cell proportions may vary slightly among different patients. Secondly, the term "robust" was intended to convey strong support for embryo development, which will be supported by our next study (unpublished data). We have therefore replaced "robust" with "alternative". Thirdly, as for "reproducibility", the hormone-treated organoids from different women were similar to the in vivo receptive endometrium, as shown by multi-omics analysis, ERT, and various other experiments.

      Reviewer #2 (Public Review):

      Q1: With endometrial receptivity analysis, they suggest a successful formation of the implantation window in vitro, but this result is difficult to interpret.

      Thanks for your question.  

      We understand that the most effective way to demonstrate endometrial receptivity is embryo implantation, which was conducted in parallel and will be presented in our next study. In this study, we validated receptivity on the basis of the current literature.

      (1) At the single-cell transcriptome level, the cellular composition and function of the receptive endometrial organoids were similar to those of the in vivo implantation window (Stephen R. Quake et al., 2020).

      (2) At the whole-organoid level, the receptive endometrial organoids exhibited transcriptomic and proteomic characteristics similar to those of the in vivo mid-secretory endometrium (Andres Salumets 2017, Qi Yu 2018, Triin Laisk 2018, Edson Guimarães Lo Turco 2018, Xiaoyan Chen 2020, Francisco Domínguez 2020, David W. Greening 2021, Norihiro Sugino 2023). The receptive endometrial organoids were also validated by the endometrial receptivity test (ERT), which uses high-throughput sequencing and machine learning to assess endometrial receptivity (Yanping Li et al., 2021).

      (3) At the microstructural level under electron microscope, the receptive endometrial organoids exhibited characteristics of the implantation window, such as pinopodes, glycogen particles, microvilli, and cilia.

      Overall, the receptive organoids we constructed closely resemble the in vivo implantation window at the single-cell, whole-organoid, and microstructural levels, based on existing research.

      Q2: Analyzing transcriptome and proteome information of WOI organoids, authors demonstrate a strong response to estrogen and progesterone, but some comparisons are made with CTRL and SEC, and others only with CTRL, which limits the power of some results. In the same way, some genes related to Cilia and pinopodes appear dominant in WOI organoids, but the comparison by electron microscopy is made only against CTRL organoids.  

      In subsequent analysis, WOI organoids showed a marked differentiation from proliferative to secretory epithelium, and from proliferative epithelium to EMT-derived stromal cells than SEC organoids. These statements are based on their upregulation of monocarboxylic acid and lipid metabolism, their enhanced peptide metabolism and mitochondrial energy metabolism, or their pseudotime trajectories. However, other analyses (such as the accumulation of secretory epithelium or decreased proliferative epithelium, the increased ciliated epithelium after hormonal treatment, or the presence of EMT-derived stromal cells) show only small differences between SEC and WOI organoids.

      Thank you for raising these important questions.

      (1) At the organoid level, the differences in transcriptome and proteome between SEC and WOI organoids are not significant. This is understandable because WOI organoids are further induced towards the implantation window based on the secretory phase (i.e. SEC organoids), and both are similar at the overall organoid level.  

      (2) At the single-cell level, the accumulation of secretory epithelium, the decrease in proliferative epithelium, the increase in ciliated epithelium after hormonal treatment, and the presence of EMT-derived stromal cells are fundamental features of the secretory endometrium. These features are therefore present in both WOI and SEC organoids. However, the most notable differences lie in the more comprehensive differentiation and more varied cellular functions of WOI organoids compared to SEC organoids.

      (3) Regarding electron microscopy, we have now quantitatively compared characteristic structures such as microvilli, cilia, pinopodes, and glycogen in the CTRL, SEC, and WOI groups. WOI organoids possess longer microvilli and more cilia, glycogen, and pinopodes than SEC organoids (Fig. 2H).

      Reviewer #1 (Recommendations For The Authors):

      Q1: Several of the key methods are performed by companies, hence not in detail described and therefore not verifiable which is essential for reviewers and readers.

      We are grateful for the suggestion. Specific methods have now been incorporated into the "Supporting Information" section. (Line91~102, Line 107~123, Line 132~139)

      Q2 - Line 49: It is not shown in the present study whether the WOI organoids are a 'robust' platform.

      - Line 76: There is a study (Dolat L., Valdivia RH., Journal of Cell Science, 2021) that developed a co-culture with endometrial organoids and immune cells (neutrophils) which should be mentioned.

      We have reconsidered this wording and replaced 'robust' with 'alternative' (Line 54). We have also followed the reviewer's suggestion and added this citation (Line 82-83) on the co-culture of immune cells with endometrial organoids; it was not cited previously mainly because the study used a mouse model.

      Q3: Figure 1: Endometrial organoids possess endometrial morphology and function. - The authors should further explain their decision to add PRL, hCG, and hPL to the organoid culture. Why these particular compounds? What is their specific role during the WOI?

      Regarding the hormone scheme, estrogen and progesterone promote the transition of endometrial organoids into the secretory phase, and on this basis, pregnancy hormones can further promote their differentiation. PRL promotes immune regulation and angiogenesis during implantation, HCG improves endometrial thickness and receptivity, and HPL promotes the development and function of endometrial glands. The WOI organoids we constructed are in a state conducive to embryo implantation, and we aim to develop an in vitro model for studying embryo implantation. This rationale was initially explained in the Discussion section (Lines 298–313). To make the selection of the hormonal regimen clearer to reviewers and readers, we now also state it in the Results section (Lines 124–130).

      When selecting hormone formulations, we compared multiple groups. The number, area, and average intensity of organoids in these groups were similar over time. However, compared with the other hormone formulations, the WOI organoids showed an endometrial receptivity-related gene expression profile, with high expression of genes positively correlated with endometrial receptivity and low expression of genes negatively correlated with it (added to Figure S1E, S1F). Hormone dosage was primarily based on hormone levels in the peri-pregnant maternal body or the local endometrium (Margherita Y. Turco et al., Nature Cell Biology 2017).

      -  Line 108: "the endometrial cells" instead of "endometrial organoid"? Because the authors also refer to the stromal cells.

      We believe you are referring to the sentence “The endometrial organoid, consisting of vesicle-like glands, fibrous stromal cells, and other surrounding cells, developed into a 3D structure with the support of Matrigel”. An organoid is a self-assembled 3D structure that consists of multiple cell types and closely resembles the in vivo tissue or organ. It offers high expansibility and preserves phenotypic and functional properties. Here, we aim to describe the endometrial organoid, which comprises epithelial cells, stromal cells, and other cellular components that assemble into intricate 3D structures. Hence, the term "endometrial organoid" is more appropriate.

      -  Line 110: "the endometrial glands", do the authors mean the endometrial organoids? The authors also mention they enlarge, which must be quantified.

      We believe you are referring to the sentence “As the organoids grew and differentiated, the endometrial glands enlarged, epithelial cells adopted a paving stone arrangement, and stromal cells formed an extensive network”. Here, we mean that the “endometrial glands” grow progressively within the organoids. We agree with your suggestion and quantified the change in organoid area over time, finding that it increased progressively in all three groups (shown below) (Fig. S1E) (Line130-131).

      Author response image 3.

      The dynamic changes of the area of organoids over time in the CTRL, SEC and WOI organoids.

      -  Line 112: E-cadherin is a general epithelial marker, not a glandular marker.

      We agree with your suggestion and have changed this to ‘The epithelium marker E-cadherin’ (Line110).

      -  Line 116: Which group was used for KI67 and CC3 staining?

      The CTRL organoids were used for Ki67 and CC3 staining. We have clarified this in the Figure 1E legend.

      -  Line 123: Organoid size (diameter or area) needs to be quantified to claim that WOI organoids grow slower than SEC/CTRL organoids. The same goes for Ki67+ cells for proliferation. In the legend of Fig 1B, the authors in contrast state that the organoids show a similar growth pattern.

      We are extremely grateful to you for pointing out this problem. We quantitatively analyzed the size of the organoids in the three groups. The area increased over time in all groups; the CTRL group grew most vigorously, followed by the SEC group and then the WOI group, but the differences were not statistically significant. The relevant results have been added to Figure S1E (Line130-131). There were also no significant differences in Ki67 expression among these organoids. The three groups of organoids therefore showed a similar growth pattern, and we decided to delete the statement “Following hormonal stimulation, WOI organoids exhibited slower growth than SEC and CTRL organoids, while CTRL organoids maintained robust proliferative activity (Fig. 1B)”.

      Author response image 4.

      The dynamic changes of the area of organoids over time in the CTRL, SEC and WOI organoids.

      -  Line 126: Fourteen days of organoid treatment is a very long time. Growing organoids may already be dying which should be checked by CC3 staining to prove that organoids are still fully viable.

      Endometrial organoids proliferate vigorously and have a long survival period owing to the presence of adult stem cells. To address your query, we performed CC3 staining on organoids treated for 14 days, which revealed negligible expression (shown below).

      Author response image 5.

      Ki67 and CC3 immunostaining of the organoids after 14 days of hormone treatment.

      -  Line 128: Changes in hormone receptors should be supported by RT-qPCR data to be more convincing

      We agree with your suggestion. Here we supplemented the RT-PCR results of hormone receptors as follows (Figure S1D) (Line119-121). PAEP and PGR are associated with progesterone, and OLFM4 and EGR1 are associated with estrogen.

      -  1A: Are authors able to see and characterize decidualized stromal cells as indicated in the illustration?

      Upon the reviewer's inquiry, we carefully observed the morphology of stromal cells in hormone-treated organoids. Regrettably, the morphology of decidualized stromal cells was not ascertainable through light microscopy in our endometrial organoids.

      -  1C: Which treatment condition are the organoids in these images?

      This figure showed the bright-field morphology of the CTRL organoids, which is now noted in the Figure 1C legend.

      -  1D: PAS staining should be quantified to support the claims.

      We agree with your suggestion. A quantitative comparison of PAS staining was conducted across the three groups of organoids (Figure S1G) (Line142-143).

      -  1D: Where are the stromal cells in the model? There should be vimentin-positive cells outside of the glands.

      Figure 1D shows the results of section staining, which is limited in its ability to display the stromal cells around the glands. Given the 3D structure of the organoids, we performed organoid clearing and staining and observed stromal cells (marked by vimentin) under a light-sheet microscope (shown below). Stromal cells were also shown using this method in the original Figure 2B.

      Author response image 6.

      Stromal cells (marked by vimentin) in a CTRL organoid, visualized by whole-mount clearing, immunostaining, and light-sheet microscopy imaging. Nuclei were counterstained with DAPI. The arrowhead indicates stromal cells. Scale bar = 70 μm.

      Figure 2: Developing receptive endometrial organoids in vitro mimicking the implantation window endometrium.

      -  Line 142: CD44 is not an exclusive marker for immune cells. It has been shown to be expressed in glandular secretory epithelial cells (Fonseca et al., 2023). The authors also mention that CD44 is expressed in stromal cells (line 265). Staining for CD45 (or another immune-specific marker) is needed to demonstrate the presence of immune cells. 

      We appreciate your suggestion. We demonstrated the distribution of immune cells in the organoids using organoid clearing combined with light-sheet microscopy imaging, with CD45 as the marker (Figure 2C).

      -  Line 144: What are the proportions of the immune cells? What is the variation between patient samples?

      We assessed the proportion of immune cells by flow cytometry and analyzed the proportions of macrophages and T cells in organoids derived from 8 patients. The proportion of WBCs in organoids was about 3%~4% (Figure 2D), of which macrophages accounted for less than 1% and T cells for less than 2% (Figure S2E). A few patients showed large heterogeneity, but the proportion of immune cells was relatively stable in most patients.

      -  Line 161: What is the endometrial receptivity test (ERT)? Not explained at all.

      The Endometrial Receptivity Test (ERT) is a gene expression-based method for assessing endometrial receptivity. It combines high-throughput sequencing and machine learning to analyze the expression of endometrial receptivity-related genes, allowing a relatively accurate assessment of endometrial receptivity. It is currently used in clinical practice to determine endometrial receptivity and guide personalized embryo transfer (Yanping Li et al., J Transl Med 2021). (line179-183)

      -  2A: The authors' dataset is compared to a published dataset. How were they combined? Were they merged, mapped on each other, or integrated? Were all cells employed from the published dataset or specific cell types? Much detail to evaluate the analysis is missing.

      We are very grateful for your comments.  

      (1) The four raw datasets (CTRL, SEC and WOI organoids, and mid-secretory endometrium) underwent batch correction and integration using Harmony. The integrated dataset then underwent dimensionality reduction via PCA. The soft k-means clustering algorithm was employed to address batch effects and clustering, with a clustering resolution of 0.5. Finally, the clustering results were visualized using tSNE based on the cell subpopulation classification. (“Methods” Line164-175)

      (2) Figure 2A displays a comparison of glandular and luminal epithelium, secretory epithelium, LGR5 epithelium, EMT-derived stromal cells, ciliated epithelium, and glandular secretory epithelium (shown in Figures S2C~S2D) (Line150-154).
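
      To make the soft k-means step mentioned in (1) more concrete, the following is a minimal, generic sketch of soft k-means on a toy two-dimensional embedding. It is not the Harmony implementation and not our analysis code: Harmony's own objective includes additional batch-correction terms that are not modeled here, and the function names, the `sigma` parameter, and the toy coordinates (standing in for PCA-embedded cells) are illustrative assumptions.

```python
import math


def soft_assign(point, centroids, sigma=1.0):
    """Soft cluster memberships: softmax of negative squared distance / sigma."""
    d2 = [sum((p - c) ** 2 for p, c in zip(point, centroid)) for centroid in centroids]
    smallest = min(d2)  # subtract the minimum for numerical stability
    weights = [math.exp(-(d - smallest) / sigma) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]


def soft_kmeans(points, centroids, sigma=1.0, n_iter=20):
    """Alternate soft assignment and membership-weighted centroid updates."""
    centroids = [list(c) for c in centroids]
    n_dims = len(points[0])
    for _ in range(n_iter):
        memberships = [soft_assign(p, centroids, sigma) for p in points]
        for k in range(len(centroids)):
            weight = sum(m[k] for m in memberships)
            if weight == 0:
                continue  # leave an empty cluster where it is
            centroids[k] = [
                sum(m[k] * p[j] for m, p in zip(memberships, points)) / weight
                for j in range(n_dims)
            ]
    return centroids, [soft_assign(p, centroids, sigma) for p in points]


if __name__ == "__main__":
    # Made-up toy coordinates, for illustration only (not data from the study).
    toy_cells = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    centers, memberships = soft_kmeans(toy_cells, [(0.0, 1.0), (4.0, 4.0)])
    for cell, m in zip(toy_cells, memberships):
        print(cell, [round(x, 3) for x in m])
```

      Unlike hard k-means, each cell receives a fractional membership in every cluster, which is what allows an integration method to adjust cluster assignments gradually rather than forcing each cell into a single cluster.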

      - 2E: Please add the cell type names above the heatmaps to improve readability.

      Thank you for your suggestion. We have added the cell type names above the heatmaps.

      - 2G: The difference between the left and right graphs is not clear from the figure itself. Improve by adding a title and more explanation.

      Thanks for your careful review. We have added titles to the left and right graphs.

      Supplementary Figure 3 is referenced with Figure 2. Supplementary Figure 2 is referenced with Figure 3. The order needs to be changed.

      Thanks for your careful review. We have changed the order.

      - S3B: Typical markers for annotation of the different cell clusters are not included and therefore it is not convincing enough that annotations are correct. E.g. Epithelial markers (EPCAM, CDH1), Stromal cells (VIM, PDGFRA), SOX9+LGR5+ cells (SOX9, LGR5). How were the EMT-derived stromal cells designated? It is not clear from the data whether they are in fact EMT-derived or whether they show epithelial markers as well (stated in line 246).

      We deeply appreciate your suggestion. We now describe the cell clustering in more detail, as follows. Cell type annotation in the single-cell transcriptomic analysis was based on CellMarker, PanglaoDB, Human Cell Atlas, Human Cell Landscape, and scRNASeqDB, as well as previous endometrium-related studies. (W. Wang et al., Nat Med 2020, P. D. Harriet C. Fitzgerald et al., PNAS 2019, K. M. Thomas, M Rawlings et al., eLife 2021, L. Garcia-Alonso et al., Nat Genet 2021)

      (1) SOX9+LGR5+ cells: SOX9 and LGR5 are both proliferative markers. SOX9 is expressed diffusely across all clusters. LGR5 is mainly expressed in two clusters: one is the stem-derived epithelium, and the other expresses LGR5 in a scattered manner. With reference to the markers of SOX9+LGR5+ cells, SOX9+LGR5- cells, and SOX9+ proliferative cells reported by L. Garcia-Alonso et al. (Nat Genet 2021), the cells in this cluster expressed high levels of NUAK2, CNKSR3, FOS and LIF, consistent with the expression profiles of SOX9+LGR5+ cells and SOX9+ proliferative cells. However, because relatively few cells expressed LGR5, this cluster was renamed SOX9+ proliferative epithelium.

      Figure 3: Receptive endometrial organoids recapitulate WOI-associated biological characteristics. - Line 173-174: The WOI organoids should be compared in detail to the SEC organoids in addition to the CTRL organoids, to show that this WOI model and new hormonal treatment is providing better results compared to the SEC organoids and the results obtained in previous studies.

      Thanks for your suggestion. At the organoid level, the differences in transcriptome and proteome between SEC and WOI organoids are not significant. This is understandable because WOI organoids are further induced towards the implantation window based on the secretory phase (i.e. SEC organoids), which prompted us to continue exploring at the single-cell level.

      - Line 190: Quantification of pinopodes is required to claim that they are more densely arranged in WOI organoids. 

      - Line 190-191: Again, is there a difference in pinopode presence between the WOI and SEC organoids to show that the WOI organoids are really distinct and a better model?

      We agree with the reviewer’s suggestion and quantified the pinopodes. Under the electron microscope, the number of pinopodes increased from CTRL to SEC to WOI organoids, with WOI organoids having the most abundant pinopodes. (Figure 2H) (Line184-186)

      - Line 194: Also here, quantification of the glycogen particles is missing.

      We agree with your suggestion. We have quantified the area of glycogen particles under the electron microscope in the CTRL, SEC and WOI organoids. WOI organoids had the most glycogen particles. (Figure 2H) (Line184-186)

      - 3C: There is no difference between SEC and WOI organoids condition for OLFM4 and PRA/B. What is the purpose then of adding extra hormones if no difference is present?

      Figure 3C indicates that there was no significant difference in OLFM4 and PRA/B levels (reflecting estrogen and progesterone responsiveness) between SEC and WOI organoids at the whole-organoid level. This is understandable because WOI organoids are induced further into the implantation window on the basis of the secretory phase (i.e., SEC organoids), and the two are similar at the overall organoid level. We therefore further explored the differences between WOI organoids and SEC organoids at the single-cell level.

      - 3G: A higher magnification is necessary to evaluate cilia staining. From these images, it seems like CTRL organoids also express acetyl-a-tubulin.

      Thanks for your suggestion. The figure has been enlarged and is shown below. Acetyl-α-tubulin in WOI organoids differs from that in CTRL organoids in both morphology and expression level. The glands of WOI organoids have small green tips (expressing acetyl-α-tubulin) that protrude toward the lumen, and WOI organoids expressed a higher level of acetyl-α-tubulin than CTRL organoids. (Now replaced with Figure 3G in the revised draft).

      Figure 4: Structural cells construct WOI with functionally dynamic changes

      - Line 211: To which figure are these claims referring to?

      We believe you are referring to the sentence “In terms of energy metabolism, the WOI organoids exhibited upregulation of monocarboxylic acid and lipid metabolism, and hypoxia response”. The upregulation of monocarboxylic acid and lipid metabolism in WOI organoids is shown in Figure 3B, and the upregulation of hypoxia responses is shown in Figure S3F.

      - In general, it should be stated in the text that CellPhoneDB is a useful tool to investigate ligand-receptor interactions, however, it only proposes potential interactions. To validate such interactions, stainings and functional assays are required.

      Thanks for your suggestion. CellPhoneDB was originally introduced only briefly in the "Methods" section of the "Supporting information". It is now also explained in lines 256-257 of the main text.

      We agree that staining and functional assays are required to validate the ligand-receptor interactions. Therefore, we used the proximity ligation assay (PLA) to verify the trend of interaction. (Figure S2J, Line259-261, Line 277-279, Line 285-288)

      - Line 243: Please describe the process of EMT in the endometrium more specifically.

      EMT is a common and crucial biological event in the endometrium during the implantation window. During the EMT process, epithelial cells lose their epithelial characteristics while gaining migratory and invasive properties of fibroblasts.

      During the attachment and adhesion phases of embryo implantation, interactions mediated by trophoblastic factors (e.g. integrins) and maternal ECM factors (e.g. fibronectin) induce the eventual EMT in the trophectoderm. During the peri-implantation period, microRNAs that regulate EMT (e.g. miR-429 and miR-126a-3p) are expressed to different degrees in the maternal luminal epithelium, mediating its transformation as the blastocyst invades the maternal decidua. Through the EMT process, the endometrial epithelium transforms into epithelioid stromal cells with increased migratory and invasive capacities. The decidual stromal cells, having acquired increased motility, migrate away from the implantation site. (Line 265-267)

      - Lines 247-251 and 313-316: the claim that proliferative epithelium transforms into EMT derived stromal cells by pseudotime trajectory is too bold and must be underpinned by other means. Pseudotime analysis only suggests and is by definition biased since the first/originating population must be defined by the operator.

      In addition to pseudotime analysis based on Monocle, RNA velocity analysis based on scVelo was used to analyze cell state transitions. The two analyses corroborate each other if both indicate a transformation from proliferative epithelium to EMT-derived stromal cells. RNA velocity analysis determines the direction of differentiation automatically, which can serve as evidence for choosing the starting point of the pseudotime analysis.

      RNA velocity analysis showed that the EMT-derived stromal cells were most closely connected to the proliferative epithelium. In addition, the pseudotime plot inferred that the proliferative epithelium was the root cell population. The RNA velocity and pseudotime analyses therefore mutually support the transformation from proliferative epithelium to EMT-derived stromal cells.

      Author response image 7.

      RNA velocity connectivity graph (used to infer connectivity between cell populations)

      Author response image 8.

      Time differentiation of cells

      Discussion

      - Line 300-302: It would be interesting to investigate ATP production and IL8 release in the WOI organoids to validate with findings from in vivo.

      To address this point, we examined ATP production and IL8 release. WOI organoids indeed produced much more ATP and IL8 than CTRL and SEC organoids (Figure S3L) (Line323-324).

      - Line 313-316: Do the WOI organoids lose polarity and cell-to-cell junctions?

      Transcriptome sequencing revealed downregulation of cell adhesion and RHO GTPase signaling in WOI organoids (Figure 3B). Electron microscopy showed that the cellular arrangement of WOI organoids was slightly looser than that of CTRL organoids, but the microvilli were still oriented toward the medial side of the glands and had not undergone polarity reversal (shown below).

      Author response image 9.

      Electron micrographs of CTRL (left) and WOI (right) endometrial organoids. Scale bar = 5 μm.

      - Line 322: Where is the data that shows that 'a decreased abundance of immune cells', is observed?  

      A decreased abundance of immune cells was observed through single-cell transcriptome sequencing and flow cytometry. The number of immune cells was reduced in WOI organoids compared to CTRL organoids in single-cell sequencing results (Figure 4A). Besides, flow cytometry also showed that the percentage of WBCs in WOI organoids was lower than that in CTRL organoids (Figure S2F).  

      - Line 324: Elaborate more on how the immune cell composition differs from the endometrium.

      The differences in immune cell composition between organoids and endometrium were mainly reflected in the proportion of WBCs, the proportions of immune cell subtypes, and the change in T cells after entering the implantation window.

      Firstly, the proportion of WBCs in organoids was lower than that in endometrium. Flow cytometry showed that the proportion of WBCs in organoids was about 3%~4% (Figure 2D), whereas the proportion of WBCs in endometrium was about 8% (W. Wang et al., Nat Med 2020). Secondly, the proportions of T cells and macrophages in organoids were about 2%~3% and 1% (Figure 2D), respectively, whereas the proportions of lymphocytes and macrophages in endometrium were 7%~8% and 0.6%~0.7% (W. Wang et al., Nat Med 2020). Thirdly, after entering the implantation window, T cells in WOI organoids decreased (Figure S2F), while T cells in endometrium increased (W. Wang et al., Nat Med 2020). These three aspects differ between the in vivo and in vitro settings. (Line347-353)

      Material and Methods

      -  What are the concentrations of all medium components?

      Thank you for your suggestion. The concentrations of all medium components are now specified in Table S1.

      -  Authors mention 10x while Smartseq2 is mentioned in Dataset S7?

      Thanks for your careful review. Single-cell transcriptome sequencing in this study was performed using 10X Genomics. Smartseq2 was used to sequence the transcriptome of a gland and its surrounding cells, which can be regarded as small bulk RNA sequencing. Smartseq2 uses a small number of cells to construct a full-length mRNA library with enhanced transcript sequencing coverage, making it particularly well suited to small-scale samples such as organoids.

      The data in Dataset S7 are acquired from small bulk RNA-seq with Smartseq2.  

      Reviewer #2 (Recommendations For The Authors):

      Q1: The theoretical choice of extra reagents added to the WOI organoids culture (PRL, hCG, and hPL) is theoretically justified, but not experimentally. On what previous studies, or performed experiments, are the choice of conditions used based?

      When selecting hormone formulations, we compared multiple groups. The number, area, and average intensity of organoids in these groups were similar over time. However, compared with the other hormone formulations, the WOI organoids showed an endometrial receptivity-related gene expression profile, with high expression of genes positively correlated with endometrial receptivity and low expression of genes negatively correlated with it (added to Figure S1E, S1F). Hormone dosage was primarily based on hormone levels in the peri-pregnant maternal body or the local endometrium (Margherita Y. Turco et al., Nature Cell Biology 2017).

      Q2: Text in line 111 indicates that "stromal cells formed an extensive network", but vimentin fluorescence is not present on any image surrounding organoids in that figure. This assertion could only be supported by the subsequent results in Figure 2B. In addition, it is not indicated what kind of organoids have been used for these experiments

      The stromal cells are arranged around the glands in the 3D structure, as shown in Figure 1C (bright-field high-magnification photography) and Figure 2B (clearing and staining of the organoids followed by light microscopy imaging). However, immunostaining of sections involves many steps of fixation, embedding, staining, and elution, which makes it difficult to preserve the arrangement and morphology of the stromal cells in the slice, so the stromal cells were not intentionally captured in the other images.

      Figure 1C and Figure 2B are both CTRL organoids, which are now noted in the corresponding figure legend section.  

      Q3: It is not clear how glycogen secretion into the lumen is assessed in Figure 1D.

      Glycogen from the subnuclear region of the glandular cells gradually reaches the top of the cells, i.e., the supranuclear region, and is discharged into the glandular lumen as parietal plasma secretion. Glycogen-containing eosinophilic secretion can be seen in the glandular lumen in Figure 1D.

      Q4: Assertions about differences in proliferation between groups are purely subjective; some kind of measurement and analysis would be necessary to be sure that there is differential proliferation based on Figure 1B.

      We are extremely grateful to you for pointing out this problem. We quantitatively analyzed the size of organoids in the three groups. The area increased over time in all three groups, with the most vigorous growth in the CTRL group, followed by the SEC group and the WOI group, but the differences were not statistically significant. Relevant results have been added to Figure S1E (Line 130-131).

      Q5: For progesterone receptor expression analysis organoids are cultured for fourteen days. What is the basis for this change in culture time? 

      The choice of time point here is based on the 14-day secretory phase of the female menstrual cycle, when stimulation of the endometrium by estrogen and progesterone reaches its maximum level.

      Q6: "n" number of individuals analysed through single-cell transcriptomics is not indicated.

      One patient's endometrium was simultaneously constructed into CTRL, SEC and WOI organoids, which were then subjected to single-cell transcriptome sequencing. This is described in the Supporting Information (Line 141-142).

      Q7: Where does the classification of EMT-derived stromal cells come from?

      EMT is a common and crucial biological event in the endometrium during the implantation window. During the EMT process, epithelial cells lose their epithelial characteristics while gaining migratory and invasive properties of fibroblasts.

      This cluster of cells expresses both epithelium markers CDH1 and EPCAM, and specifically expresses high levels of the EMT-related stromal cell markers AURKB, HJURP and UBE2C. During endometrial EMT, AURKB upregulates MMP2, VEGFA/Akt/mTOR and Wnt/β-catenin/Myc pathways to induce EMT (Zhen Wang et al., Cancer Manag Res 2020). HJURP also activates Wnt/β-catenin signaling to promote EMT (Y Wei et al., Eur Rev Med Pharmacol Sci 2019, Tianchi Chen et al., Int J Biol Sci 2019). UBE2C is upregulated by estrogen to promote EMT (Yan Liu et al., Mol Cancer Res 2020). Therefore, this cluster was defined as "EMT-derived stromal cells”.

      Q8: In the endometrial receptivity test (ERT), endometrium sample data matches with prereceptive endometrium and WOI organoids data matches with a receptive endometrium, but why there is no information about CTRL and SEC organoids?

      We performed ERT on these samples at a time when our hospital had a cooperative project with Yikon Genomics (Jiangsu, China). However, only endometrium and WOI organoids were sent for testing due to limited quotas. Because the cooperation has since ended and additional testing would introduce batch effects, no CTRL or SEC organoids were tested. Moreover, the current ERT is a machine learning model based on sequencing data from endometrium samples, and there are still differences in cellular composition between endometrial organoids and endometrium. Thus, the ERT results need to be interpreted in conjunction with other results.

      Q9: When analysing the transcriptome and proteome, some comparisons are made between WOI vs CTRL and SEC, or just WOI vs CTRL. It would be interesting to have all the comparisons since the power of WOI organoids lies in their differences with SEC organoids.

      Thanks for your suggestion. At the organoid level, the differences in transcriptome and proteome between SEC and WOI organoids are not significant. This is understandable because WOI organoids are further induced towards the implantation window based on the secretory phase (i.e. SEC organoids), which prompted us to continue exploring at the single-cell level.

      Q10: Electron microscopy comparisons with respect to pinopods, cilia, and microvilli are only performed between WOI and CTRL. It would be interesting to check it with SEC.

      We have now quantitatively compared the presence of characteristic structures such as microvilli, cilia, pinopodes and glycogen in the CTRL, SEC and WOI organoids. WOI organoids had longer microvilli and increased cilia, glycogen, and pinopodes (Figure 2H).

      Q11: Line 190 states that pinopods are arranged more densely in WOI organoids than in CTRL organoids. Seems to be a subjective observation. Is there an objective method to quantify this?

      We agree with the reviewer’s suggestion and have quantified the pinopodes. The number of pinopodes increased from CTRL to SEC to WOI organoids, with WOI organoids having the most abundant pinopodes (Figure 2H) (Line 184-186).

      Q12: Some characteristics are very similar between WOI and SEC organoids (such as the accumulation of secretory epithelium or decreased proliferative epithelium, the increased ciliated epithelium after hormonal treatment, or the presence of EMT-derived stromal cells). The authors should complement the discussion by objectively justifying the use of WOI versus SEC organoids. Would they be useful in more specific cases or at a general level when studying implementation?

      Thanks for your comments. WOI organoids are differentiated from SEC organoids towards the implantation window. Therefore, WOI organoids are suitable for studying periimplantation physiological changes or exploring pathological mechanisms. SEC organoids can be used when studying only a range of pathological problems such as endometrial secretory phase changes or hormone reactivity. (Line 365-368)

      Q13:ExM media is described in Table S1, but it does not include the concentration of the different reagents in the culture medium, which is the most interesting data about the ExM medium.

      Thank you for your suggestion. The concentrations of all medium components have now been added to Table S1.

      Q14: It is not specified which organoid pass is used in each experiment. Is it always the same pass?

      Our experiments were conducted using P1~P3 generation endometrial organoids, as specified in the “Supporting Information” Line 54~55.

      Q15: As a protocol for freezing organoids is included in materials and methods, do the authors use freshly cultured organoids or do they cryopreserve them and thaw them for culturing?

      Thanks for your question. We used freshly cultured organoids in the manuscript. We listed the freezing protocol to illustrate that the constructed organoids can be frozen and recovered for special experimental needs and the establishment of sample banks.

      Q16: The most important point: Neither of the two studies that developed human endometrial organoids from tissue biopsies (Boretto et al. 2017 and Turco et al. 2017), observed stromal cell growth in culture. They disappeared between the first and second pass (as indicated by Turco et al. 2017). How do the authors justify the presence of stromal cells in their organoid culture if they rely on the protocols previously described by these research groups? If it is the case that they can only use the initial pass (freshly planted cells from endometrium), it does not make sense to include the freezing of the different passes in materials and methods, since the expansion capacity of the culture would be lost, which implies a major limitation of the model.

      Thanks for your question.  

      (1) We did not completely follow the protocols of these research groups. To maximize the recovery of both epithelial and stromal cells, we optimized key steps such as tissue digestion and cell strainer filtration. We shortened the digestion time to 20 minutes to protect cells from the digestion solution and to retain some cell aggregates, which are beneficial for maintaining cell stemness and preserving stromal and immune cell clusters. A 40 μm filter membrane was used to isolate the endometrial cells, which may allow both epithelial and stromal cells to be collected.

      (2) Our experiments were conducted using P1~P3 generation freshly constructed organoids. However, we also used recovered organoids when fresh endometrial samples were not available due to the COVID-19 epidemic. The organoids (e.g., P0~P5) still grew vigorously after recovery and could continue to be cultured by passaging (shown below).

      The recovered organoids can be used for special experiments and biobank establishment.

      Author response image 10.

      The endometrial organoids of different passages were observed before cryopreservation and after recovery. Scale bar = 200 μm.

      Q17: It is not clear which organoids include Figure S2F. Does it include the three types of organoids or just WOI organoids?

      This circle diagram showed the functions of upregulated genes in the WOI group compared to CTRL group from combined transcriptome and proteome analysis, which has been labeled in the figure legend section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1)  Regarding the cell studies of human pediatric bone-derived osteoblast-like cells (HBO), the authors should provide a rationale for their selection of specific cell lines (15,16, 17, 19, 20, 23, 24) in this study. As for animal studies, could the authors clarify which cell lines were utilized in the murine in vivo experiments?

      We appreciate the opportunity to address this. To reduce confusion, we have numbered the patient primary cell lines used in these studies sequentially from 1 – 7. Additionally, we have added “HBO cell lines used for experiments were selected based on the ability of the primary cell line to proliferate and mineralize in culture” to the Methods section. 

      In vivo experiments: “HBO cell lines 2, 6 and 7 from separate individuals were selected for these experiments based on similar growth and passage characteristics.” This statement is included in the Methods section.

      (2)  In this study, the authors performed the murine in vivo experiments using both male and female mice. Could the author clarify if any difference was observed between male and female mice in the findings? This information would contribute to a more comprehensive understanding of the study.

      We agree and have added the following to the Results section: “There was no sex-based difference in regenerated bone volume.”

      (3)  Although the histological results showed an elevated collagen expression in mice treated with BMP2, JAG1, and JAG1 + DAPT compared to those treated with the cells alone, the differences among groups were subtle. The authors should consider the immunohistochemical (IHC) staining for collagen 1 on the samples, allowing for a quantitative assessment of collagen 1 expression.

      Thank you for this comment. The differences between BMP2, JAG1, and JAG1 + DAPT are indeed subtle. We have added Supplementary Figure 5, showing collagen staining of sections from the same FFPE blocks that were sectioned and stained with Masson Trichrome in Figure 2C. 

      Minor Comments:

      (4)  Please specify which cell lines are represented in the staining results shown in Fig.1A and Fig. 5A, respectively.

      In Fig 1A the representative images are of HBO2. Fig 5A representative images are of HBO7. We have added this information to the figure legends for these figures. 

      (5)  There appears to be a discrepancy in the specified size of the critical defect. The manuscript states that the size is 4mm, while Supplemental Figure 3 indicates 3.5mm.

      Thank you for this catch! Yes, it should be 4mm. This has been corrected in Supplementary Figure 3.

      (6)  The scale bar for Figure 2 C is missing.

      Scale bars have been added which also gave us an opportunity to brighten the images equally, allowing for better distinction between the different colors of the Masson Trichrome staining.

      (7)  In the methodological section 2.5 for JAG1 delivery, it would be helpful if the authors could review the initial dosage of JAG1 delivery to confirm if HBO cells were included or not, given that the MicroCT results indicate that all groups incorporated HBO cells. 

      We appreciate this suggestion. In response to another question, we have added Supplementary Figure 4 which includes an “Empty Defect” condition with no HBO cells, making the original method statement accurate.

      Reviewer #2 (Recommendations For The Authors):

      In the current study, using in vitro and in vivo models the authors clearly show that JAG1 can enhance osteogenesis and thus can be helpful in designing new therapeutic approaches in the field of bone regenerative research. The in vivo mouse CF model is very convincing and shows that JAG1 promotes osteogenesis via non-canonical signaling. Mechanistically it seems that JAG1 activates STAT5, AKT, P38, JNK, NF-ĸB, and p70 S6K. However, additional evidence is needed to convincingly conclude that all the non-canonical pathways activated via JAG1 converge at p70 S6K activation. The following concerns need to be addressed.

      (1) In Fig 1A: Even though the Jag1-Fc shows a very significant increase in HBO mineralization, there are no significant increases in cells in osteogenic media when compared to control growth media. Even though the different conditions were subjected to RNAseq analysis in the later figures, qPCR analysis of some osteogenic genes in Figure 1 might be helpful. 

      We appreciate the opportunity to explore this question further. We conducted mineralization experiments in triplicate and performed qRT-PCR, assessing the expression of 5 osteogenic genes: ALPL, BGLAP (osteocalcin), COL1A1, RUNX2, and SP7. Results are shown in Figure 1C and this text was added to Results: “Additionally, PCR analysis of HBO1 cells from a repeat experiment collected at days 7, 14, and 21 showed significantly increased expression of osteogenic genes with JAG1-bds stimulation (Figure 1C). ALPL was significantly expressed at Day 7, with a 3.5-fold increase (p=0.0004) compared to HBO1 cells grown in growth media. In contrast, significant expression levels of COL1A1 and BGLAP were observed at 14 days, with a 5.1-fold increase (p=0.0021) of COL1A1 and a 12.3-fold increase (p=0.0002) of BGLAP when compared to growth media conditions. Interestingly, while some mineralization is observed in the osteogenic media and Fc-bds (Figure 1A) conditions, there were no significant increases in osteogenic gene expression (Figure 1C). Expression of RUNX2 and SP7 was not significantly altered across all conditions and time points (not shown).”

      (2) In Fig 2: even though not needed in respect to the hypothesis, was there any Control group without any cells or JAG1 beads? What were the changes in between that group and cells cells-only group?

      We have not observed differences between the “Empty Defect” group and the “Cells alone” group.

      We have addressed the reviewer’s comments by adding this comparison in Supplementary Figure 4.

      (3) Transcriptional profiling and ELISA (Fig 3 and 4) show upregulation of NF-ĸB signaling in response to JAG1. In the discussion, the authors have referenced a previous study showing NF-ĸB as prosurvival in human OB cells. However, based on many published reports, NF-ĸB activation has been shown to inhibit OB function. Does JAG1 regulate HBO cell survival via NF-ĸB activation?

      Experimenting using NF-ĸB inhibitor can be helpful to show that JAG1 mediates NF-ĸB activation is anabolic in this experimental setup.

      We thank the reviewer for this excellent suggestion. We are eager to explore this new direction for our research in a subsequent study. We have added this to our future directions. 

      (4) Fig 5: 

      (A)  Condition showing JAG1+ DAPT is needed to compare between JAG1 canonical and noncanonical signaling. 

      Thank you for pointing this out. We have added Supplementary Figure 6, which includes a dose response experiment for JAG1 + DAPT.

      (B)  S6K18 alone seems to be increasing OB mineralization. Is that statistically significant?  

      No, and we have added the statistical analysis for S6K-18 to Figure 5B.

      (C)  Fc alone condition seems to have a very significant increase in OB mineralization. Does Fc alone upregulate OB function? 

      We do see some upregulation of mineralization with Fc in vitro, which we also observed in our previous studies with mouse neural crest cells, but we have not found it to be osteogenic in vivo. We have added a statement to this effect, with references. Additionally, osteogenic gene expression was not upregulated in our in vitro mineralization experiments with Fc.  See Revised Figure 1.

      (D)  Although overall quantification shows that S6K18 partially inhibits HBO mineralization, the representative images do not represent the quantification. Transcriptional analysis (qPCR) is required to validate these findings.

      We performed qRT-PCR on cells from a repeat mineralization assay, collecting cells at 9, 14, and 21 days. We have added the following to the Results: “While inhibition of NOTCH and p70 S6K decreased mineralization in our mineralization assay, there are no statistically significant changes in gene expression for ALPL, COL1A1, or BGLAP (Supplementary Figure 7). These results suggest that the HBO cells are maturing into osteocytes and that inhibiting p70 S6K hinders the cellular ability to mineralize but not the cell phenotype progression.”

      (5) Finally, to convincingly conclude the data from Fig 5, the mouse CF model can be helpful to support the authors' claim that JAG1 acts via p70 S6K.

      Thank you for this feedback. We have modified our conclusions to reflect that p70 S6K is one of the non-canonical pathways that JAG1 may be activating in bone regeneration.

      Thank you very much for your consideration of our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this paper, proteomics analysis of the plasma of human subjects that underwent an exercise training regime consisting of a combination of endurance and resistance exercise led to the identification of several proteins that were responsive to exercise training. Confirming previous studies, many exercise-responsive secreted proteins were found to be involved in the extra-cellular matrix. The protein CD300LG was singled out as a potential novel exercise biomarker and the subject of numerous follow-up analyses. The levels of CD300LG were correlated with insulin sensitivity. The analysis of various open-source datasets led to the tentative suggestion that CD300LG might be connected with angiogenesis, liver fat, and insulin sensitivity. CD300LG was found to be most highly expressed in subcutaneous adipose tissue and specifically in venular endothelial cells. In a subset of subjects from the UK Biobank, serum CD300LG levels were positively associated with several measures of physical activity - particularly vigorous activity. In addition, serum CD300LG levels were negatively associated with glucose levels and type 2 diabetes. Genetic studies hinted at these associations possibly being causal. Mice carrying alterations in the CD300LG gene displayed impaired glucose tolerance, but no change in fasting glucose and insulin. Whether the production of CD300LG is changed in the mutant mice is unclear.

      Strengths:

      The specific proteomics approach conducted to identify novel proteins impacted by exercise training is new. The authors are resourceful in the exploitation of existing datasets to gain additional information on CD300LG.

      Weaknesses:

      While the analyses of multiple open-source datasets are necessary and useful, they lead to relatively unspecific correlative data that collectively insufficiently advance our knowledge of CD300LG and merely represent the starting point for more detailed investigations. Additional more targeted experiments of CD300LG are necessary to gain a better understanding of the role of CD300LG and the mechanism by which exercise training may influence CD300LG levels. One should also be careful to rely on external data for such delicate experiments as mouse phenotyping. Can the authors vouch for the quality of the data collected. 

      Thank you for the valuable feedback on our manuscript. We recognize concerns about the specificity of correlative data from open-source datasets and the limitations it presents for understanding CD300LG's role. To address this, we have expanded the manuscript with a paragraph in the discussion regarding the need for targeted experiments to confirm CD300LG’s functions and relationship with glucose metabolism. We also emphasize caution regarding reliance on external data, and we acknowledge the need for generating primary data, including direct phenotyping of mice with CD300LG gene alterations, to better understand its regulatory mechanisms and effects on glucose tolerance. Please see lines 446-456.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript from Lee-Odegard et al reports proteomic profiling of exercise plasma in humans, leading to the discovery of CD300LG as a secreted exercise-inducible plasma protein. Correlational studies show associations of CD300LG with glycemic traits. Lastly, the authors query available public data from CD300LG-KO mice to establish a causal role for CD300LG as a potential link between exercise and glucose metabolism. However, the strengths of this manuscript were balanced by the moderate to major weaknesses. Therefore in my opinion, while this is an interesting study, the conclusions remain preliminary and are not fully supported by the experiments shown so far.

      Strengths:

      (1) Data from a well-phenotyped human cohort showing exercise-inducible increases in CD300LG.

      (2) Associations between CD300LG and glucose and other cardiometabolic traits in humans, that have not previously been reported.

      (3) Correlation to CD300LG mRNA levels in adipose provides additional evidence for exercise-inducible increases in CD300LG.

      Weaknesses:

      (1) CD300LG is by sequence a single-pass transmembrane protein that is exclusively localized to the plasma membrane. How CD300LG can be secreted remains a mystery. More evidence should be provided to understand the molecular nature of circulating CD300LG. Is it full-length? Is there a cleaved fragment? Where is the epitope where the o-link is binding to CD300LG? Does transfection of CD300LG to cells in vitro result in secreted CD300LG?

      (2) There is a growing recognition of specificity issues with both the O-link and somalogic platforms. Therefore it is critical that the authors use antibodies, targeted mass spectrometry, or some other methods to validate that CD300LG really is increased instead of just relying on the O-link data.

      (3) It is insufficient simply to query the IMPC phenotyping data for CD300LG; the authors should obtain the animals and reproduce or determine the glucose phenotypes in their own hands. In addition, this would allow the investigators to answer key questions like the phenotype of these animals after a GTT, whether glucose production or glucose uptake is affected, whether insulin secretion in response to glucose is normal, effects of high-fat diet, and other standard mouse metabolic phenotyping assays.

      (4) I was unable to find the time point at which plasma was collected at the 12-week time point. Was it immediately after the last bout of exercise (an acute response) or after some time after the training protocol (trained state)?

      We acknowledge the importance of understanding the molecular form of CD300LG in circulation. We have expanded the discussion with a paragraph regarding the need for follow-up experiments to determine whether circulating CD300LG is full-length or a cleaved fragment, to identify the epitope for O-link binding, and to assess CD300LG secretion in vitro through transfection experiments. We also discuss the need for targeted mass spectrometry and antibody-based validation of O-link measurements of CD300LG, and the need for more validation experiments on CD300LG-deficient mice. Please see lines 446-456.

      The plasma collected post-intervention is in a state that reflects the new baseline trained condition of the subjects, 3 days after the last exercise session during the intervention. We have clarified this in our manuscript. The information is updated in line 491-493.

      Reviewer #1 (Recommendations For The Authors):

      In the present form, the paper raises interest in the potential role of CD300LG in the response to exercise training but unfortunately does not provide clear answers. The authors should focus their efforts on firmly validating the status of CD300LG as an exercise biomarker in humans and carefully examine the function of CD300LG through mechanistic and animal-based studies.

      The authors are encouraged to acquire CD300LG-deficient mice and perform specific experiments to validate hypotheses forthcoming from the analysis of the open-source datasets. In addition, it needs to be validated that the cd300lgtm1a(KOMP)Wtsi mice are actually deficient in CD300LG. It is not uncommon that Tm1a mice have (almost) normal expression of the targeted gene.

      We have now revised the manuscript and added a new section to the discussion regarding the limitations with open-source data, cd300lgtm1a(KOMP)Wtsi mice and the need for more validation experiments on CD300LG-deficient mice. Please see lines 446-456.

      The value of the correlative data presented in Figure 5 is rather limited. The same can be argued for the data presented in Supplementary Figure 2. If CD300LG is expressed in endothelial cells, it stands to reason that its expression is correlated with angiogenesis. Hence, this observation does not really carry any additional value.

      We agree that correlations cannot imply causality. However, similar patterns were observed in several tissues and across different data sets, which at least suggests a role for CD300LG related to angiogenesis. We have included a section in the discussion where we clarify that our observations should only be regarded as indications and that follow-up studies are needed to confirm any causal role for CD300LG in angiogenesis/oxidative capacity. Please see lines 446-456.

      Figure 6 may be better accommodated in the supplement.

      Figure 6 is now moved to the supplement.

      Figure 3A and B are a bit awkward. The description "no overlap" is confusing. Isn't it more accurate to say "no enrichment" or "no over-representation"? There will always be some overlap with certain pathways. However, there may be no enrichment. Furthermore, the use of arrows to indicate No overlap is visually not very appealing. Maybe the numbers can be given a specific color?

      We have now removed the arrows and text, and instead state in the text that there were no enrichments other than for the proteins down-regulated in the overweight group.

      The description of the figure legend of figure 5E-H is incomplete.

      The description is now completed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors attempt to fully characterize the immunoglobulin (Ig) heavy (H) chain repertoire of tumor-infiltrating B cells from three different cancer types by identifying the IgH repertoire overlap between these, their corresponding draining lymph nodes (DLNs), and peripheral B cells. The authors claim that B cells from tumors and DLNs have a closer IgH profile than those in peripheral blood and that DLNs are differentially involved with tumor B cells. The claim that tumor-resident B cells are more immature and less specific is made based on the characteristics of the CDR-H3 they express.

      Strengths:

      The authors show great expertise in developing in-house bioinformatics pipelines, as well as using tools developed by others, to explore the IgH repertoire expressed by B cells as a means of better characterizing tumor-associated B cells for the future generation of tumor-reactive antibodies as a therapy.

      Weaknesses:

      This paper needs major editing, both of the text and the figures, because as it stands it is convoluted and extremely difficult to follow. The conclusions reached are often not obvious from the figures themselves. Sufficient a priori details describing the framework for their analyses are not provided, making the outcome of their results questionable and leaving the reader wondering whether the findings are on solid ground.

      The authors are encouraged to explain in more detail the premises used in their algorithms, as well as the criteria they follow to define clonotypes, clonal groups, and clonal lineages, which are currently poorly defined and are crucial elements that may influence their results and conclusions.

      In response to this comment, we significantly expanded the paragraph dedicated to the tumor and non-tumor repertoire overlap and isotype composition. The following sections were added:

      First, we characterized the relative similarity of IGH repertoires derived from tumors, DLN, and PBMC on the individual CDR-H3 clonotype level. We define clonotype as an instance with an identical CDR-H3 nucleotide sequence and identical V- and J- segment attribution (isotype attribution may be different). Unlike other authors, here we do not pool together similar CDR-H3 sequences to account for hypermutation. (Hypermutation analysis is done separately and defined as clonal group analysis.)

      As overlap metrics are dependent on overall repertoire richness, we normalized the comparison using the same number of top most frequent clonotypes of each isotype from each sample (N = 109). Repertoire data for each sample were split according to the immunoglobulin isotype, and the F2 metric was calculated for each isotype separately and plotted as an individual point.

      We also analyzed the D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency (D_ij = d_ij / (d_i × d_j), where d_ij is the number of clonotypes present in both samples, while d_i and d_j are the diversities of samples i and j respectively). The results for the D metric are not shown, as they indicate a similar trend to that of the F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of lymph nodes than to those of peripheral blood, both when clonotype frequency is taken into account and when it is not.
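      As an illustration only, the D metric described above can be sketched in a few lines of Python. This is not the authors' pipeline: the function name `d_metric`, the tuple representation of a clonotype (CDR-H3 nucleotide sequence, V segment, J segment), and the example sequences are assumptions made for this sketch, and diversity is taken here to mean the number of distinct clonotypes in a sample.

```python
from typing import Iterable, Tuple

# Assumed representation: (CDR-H3 nucleotide sequence, V segment, J segment).
# Isotype is deliberately left out, matching the clonotype definition in the text.
Clonotype = Tuple[str, str, str]


def d_metric(sample_i: Iterable[Clonotype], sample_j: Iterable[Clonotype]) -> float:
    """Frequency-independent overlap: D_ij = d_ij / (d_i * d_j)."""
    set_i, set_j = set(sample_i), set(sample_j)
    if not set_i or not set_j:
        raise ValueError("both samples must contain at least one clonotype")
    shared = len(set_i & set_j)  # d_ij: clonotypes present in both samples
    return shared / (len(set_i) * len(set_j))  # d_i, d_j: distinct clonotypes


# Hypothetical toy repertoires (sequences are made up for illustration).
tumor = [("TGTGCGAGAGAT", "IGHV1-2", "IGHJ4"), ("TGTGCGAAAGGG", "IGHV3-23", "IGHJ6")]
lymph_node = [("TGTGCGAGAGAT", "IGHV1-2", "IGHJ4"), ("TGTACGAGATTT", "IGHV4-34", "IGHJ5")]

overlap = d_metric(tumor, lymph_node)  # one shared clonotype out of 2 x 2
```

      Because the inputs are reduced to sets, a clonotype seen a thousand times counts the same as one seen once, which is what makes this measure independent of clonotype frequency.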

      Having excluded the IGHD gene segment from some of their analyses (at least those related to clonal lineage inference and phylogenetic trees), it is not well explained which region of CDR-H3 is responsible for the charge, interaction strength, and Kidera factors, since in some cases the authors mention that the central part of CDR-H3 consists of five amino acids and in others of seven amino acids.

      We considered different ways of calculating amino acid properties of CDR3 and used different parameters for sample-average and individual-sequence CDR3s. Now plots for Fig S6 C are updated  for consistency and the parameters depicted there are now calculated using 5 central amino acids, as in other sections.

      How can the authors justify that the threshold for CDR-H3 identity varies according to individual patient data? 

      The ideal similarity threshold may depend on several factors, such as sampling and sequencing depth. For example, imagine one sample that captures 100% of the sequences in a clonal lineage, each differing from the next by only 1 amino acid, and a lower-quality sample or sequencing run that captures only every other sequence. The minimal threshold required to gather these sequences into a cluster (clonal group) would differ between the two cases: 1 aa for the former and ~2 aa for the latter under single-linkage clustering. In other words, the greater the sequencing depth, the denser the clusters. The method of individual threshold tailoring relies on the following: https://changeo.readthedocs.io/en/latest/examples/cloning.html
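
      The thought experiment above can be reproduced with a toy single-linkage clustering. This is only a sketch under simplifying assumptions (equal-length amino acid strings, plain Hamming distance, made-up sequences); it is not the Change-O procedure linked above, and all names are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def single_linkage_clusters(seqs, threshold):
    """Connected components of the graph that links sequences whose
    Hamming distance is at most `threshold` (single-linkage clustering)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if hamming(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())

# A toy lineage in which each sequence differs from the next by 1 residue.
full_sample = ["AAAAA", "AAAAC", "AAACC", "AACCC", "ACCCC", "CCCCC"]
# A shallower sample that captured only every other sequence.
sparse_sample = full_sample[::2]

n_full_t1 = len(single_linkage_clusters(full_sample, 1))
n_sparse_t1 = len(single_linkage_clusters(sparse_sample, 1))
n_sparse_t2 = len(single_linkage_clusters(sparse_sample, 2))
```

      With a threshold of 1 the fully sampled lineage forms one cluster, whereas the sparse sample splits into singletons and is only reunited at a threshold of 2.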

      Although the individual Kidera factors that are significant in the context of our analysis are described in the text one by one at their first appearance, we have now also added a sentence describing Kidera factor analysis in general (page 8):

      Kidera factors are a set of scores which quantify physicochemical properties of protein sequences (Nakai et al. 1988). 188 physical properties of the 20 amino acids are encoded using dimension reduction techniques.

      Throughout the analyses, the reasons for choosing one type of cancer over another sometimes seem subjective and are not well justified in the text.

      Whenever possible, we pooled all patients with all cancer types together, because the number of available samples did not allow us to draw any significant conclusions comparing between individual cancer types. When analyzing and showing individual patient data, we also did not attempt to depict any cancer-type-specific findings, but it is inevitable that we name a specific cancer type when labelling a sample coming from a specific tumor.

      Overall, the narrative is fragmented. There is a lack of well-defined conclusions at the end of the results subheadings.

      In addition to the changes described above, a conclusion was added to the paragraph describing hypermutation analysis:

      IGHG clonotypes from lung cancer samples show a higher number of hypermutations, possibly reflecting the high mutational load found in lung cancer tissue. For melanoma, another cancer known for high mutational load, no statistically significant difference was found. This may be due to higher variance between melanoma samples, which hinders the analysis, or due to the small sample size.

      The exact same paragraph is repeated twice in the results section.

      Corrected.

      The authors have also failed to synchronise the actual number of main figures with the text, and some panels are included in the main figures that are neither described nor mentioned in the text  (Venn diagram Fig. 2A and phylogenetic tree Fig. 5D). Overall, the manuscript appears to have been rushed and not thoroughly read before submission.

      Corrected.

      Reviewers are forced to wade through, unravel, and validate poorly explained algorithms in order to understand the authors' often bold conclusions.

      We hope that the aforementioned additions to the text, together with the addition to Figure 1, make the narrative easier to follow.

      Reviewer #2 (Public Review):

      Summary:

      The authors sampled the B cell receptor repertoires of Cancers, their draining lymph nodes, and blood. They characterized the clonal makeup of all B cells sampled and then analyzed these clones to identify clonal overlap between tissues and clonal activation as expressed by their mutation level and CDR3 amino acid characteristics and length. They conclude that B cell clones from the Tumor interact more with their draining lymph node than with the blood and that there is less mutation/expansion/activation of B cell clones in Tumors. These conclusions are interesting but hard to verify due to the under-sampling and short sequencing reads as well as confusion as to when analysis is across all individuals or of select individuals.

      Strengths:

      The main strength of their analysis is that they take into account multiple characteristics of clonal expansion and activation and their different modes of visualization, especially of clonal expansion and overlap. The triangle plots once one gets used to them are very nice.

      Weaknesses:

      The data used appear inadequate for the conclusions reached. The authors' sample size of B cells is small and they do not address how it could be sufficient. At such low sampling rates, compounded by the plasmablast bias they mention, it is unclear if the overlap trends they observe show real trends. Analyzing only top clones by size does not solve this issue, as it could be that the top 100 clones of one tissue are much bigger than those of another and that all overlap trends arise simply because the clones are bigger in one tissue or the other; i.e., there is equal overlap of clones with blood, but blood is not sufficiently sampled given its greater diversity and smaller clones.

      Regarding the number of clonotypes to be taken into account, we were limited by the B cell infiltration of tumor samples and our ability to capture their repertoire. However, we use technical replicates at the level of cell suspension to ensure that at least the top clonotypes are consistently sampled. This is how the data should be interpreted: as describing the most abundant clones in the repertoire (which may also be considered the most functionally relevant in the case of tumor-infiltrating lymphocytes).

      To analyze the repertoire overlap, we generally use the F2 metric, which takes clone size into account, because we think that clone size is an important functional factor. However, we have now added a description of the D metric (which does not include clone frequency as a parameter), and it shows the same trend as the F2 metric. Both the F2 and D overlap metrics therefore support our conclusion of higher overlap between tumor and LN.

      The following text was added:

      We also analyzed the D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency (D_ij = d_ij / (d_i × d_j), where d_ij is the number of clonotypes present in both samples, and d_i and d_j are the diversities of samples i and j, respectively). The results for the D metric are not shown, as they indicate a trend similar to that of the F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of lymph nodes than to those of peripheral blood, both when clonotype frequency is taken into account and when it is not.

      All in all, of course, the deeper the better, but given the data we were able to generate from the samples, this was the best approach to normalization that could be used.

      Similarly, the read length (2 × 150 bp) is too short, missing FWR1 and CDR1 and often parts of FWR2 if CDR3 is long. As the authors themselves note (and as was shown in Zhang 2015, PMC4811607), this makes mutation analysis difficult.

      Indeed, we are aware of this problem, and therefore only a small part of the manuscript is dedicated to hypermutation analysis. However, as the CDR-H3 region is the most mutated part, we can still capture a significant diversity of mutations. To address the applicability of our data to hypermutation phylogeny analysis, we compared the distribution of physicochemical properties along the hypermutation trees using the 150+150 and 300+300 data from the same donor and the same set of samples. The main conclusion is that for neither the long nor the short datasets could any correlation be found between the physicochemical properties of the CDR-H3 region and the rank of the clonotype on the tree.

      It also makes the identification of V genes and thus clonal identification ambiguous. This issue becomes especially egregious when clones are mutated.

      Again, this would be important for clonotype phylogeny analysis. However, for the simple questions that we address with our clonal group analysis, such as clonal group overlap between tissues, we consider these data acceptable, because any mislabelling of the V segment is (a) rare and (b) equally frequent in all types of samples. Therefore, any conclusions made are still valid despite this technical drawback.

      To directly address the question of mislabelling of V-genes in our data, we looked at the average number of different V-genes attributed to the same CDR-H3 nucleotide sequence in the short (150+150) and long (300+300) datasets from the same donor. Indeed, some ambiguity of V-gene labelling is observed (see below), but we think that it is unlikely to influence any of our cautious conclusions.

      Author response image 1.
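
      The V-gene ambiguity check described in this response (the average number of different V-genes attributed to the same CDR-H3 nucleotide sequence) can be sketched as follows. The input format of (CDR-H3 nucleotide sequence, V-gene) pairs, the function name, and the toy records are assumptions made for illustration and do not reproduce the authors' data or code.

```python
from collections import defaultdict

def mean_v_genes_per_cdr3(records):
    """Average number of distinct V-gene calls per unique CDR-H3
    nucleotide sequence. A value of 1.0 means no ambiguity."""
    v_genes_by_cdr3 = defaultdict(set)
    for cdr3_nt, v_gene in records:
        v_genes_by_cdr3[cdr3_nt].add(v_gene)
    if not v_genes_by_cdr3:
        raise ValueError("no records supplied")
    total_calls = sum(len(v_genes) for v_genes in v_genes_by_cdr3.values())
    return total_calls / len(v_genes_by_cdr3)

# Toy records: the first CDR-H3 receives two different V-gene calls.
short_read_records = [
    ("TGTGCGAGAGATTACTGG", "IGHV1-2"),
    ("TGTGCGAGAGATTACTGG", "IGHV1-3"),
    ("TGTGCGAAAGGGTACTGG", "IGHV3-23"),
    ("TGTGCGAAAGGGTACTGG", "IGHV3-23"),
]
ambiguity = mean_v_genes_per_cdr3(short_read_records)
```

      Computing the same value for the short-read and long-read datasets from one donor gives a direct, if coarse, comparison of how often V-gene attribution diverges.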

      Finally, it is not completely clear when the analysis is of single individuals or across all individuals. If it is the former, the authors did not explain how they chose the individuals analyzed, and if the latter, then it is not clear from the figures which measurements belong to which individual (i.e., they are mixing measurements from different people).

      We addressed this issue by adding a comment to each figure caption, describing whether a particular figure or panel describes individual or pooled data, and also whether the analysis is done on individual clonotype or clonal group level.

      Also, in case pooled data were used, we added the number of patients that was pooled for a particular type of analysis. This number differs from one type of analysis to the other, because not all the patients had a complete set of tissues, and also not all samples passed a quality check for a particular analysis.

      Here are the numbers listed:

      Fig. 2A: N = 6 (only patients with all three tissues)

      Fig. 2C: N = 14 (all)

      Fig. 2D: N = 14 (all)

      Fig. 2E: N = 7 (have both tumor and PBMC)

      Fig. 2F: N = 9 (have both tumor and PBMC)

      Fig. 2G: N = 9 (have both tumor and PBMC)

      Fig. 2H: N = 7 (have both tumor and LN)

      Fig. 3A: N = 14 (all)

      Fig. 3B: N = 11 (only those with tumor)

      Fig. 3E: N = 14

      Fig. 7F: N = 11 (all that have tumor)

      Reviewer #3 (Public Review):

      In multiple cancers, the key roles of B cells are emerging in the tumor microenvironment (TME). The authors of this study appropriately introduce that B cells are relatively under-characterised in the TME and argue correctly that it is not known how the B cell receptor (BCR) repertoires across tumors, lymph nodes, and peripheral blood relate. The authors therefore supply a potentially useful study evaluating the tumor, lymph node, and peripheral blood BCR repertoires and site-to-site as well as intra-site relationships. The authors employ sophisticated analysis techniques, although the description of the methods is incomplete. Among other interesting observations, the authors argue that the tumor BCR repertoire is more closely related to that of draining lymph node (dLN) than the peripheral blood in terms of clonal and isotype composition. Furthermore, the author's findings suggest that tumor-infiltrating B cells (TIL-B) exhibit a less mature and less specific BCR repertoire compared with circulating B cells. Overall, this is a potentially useful work that would be of interest to both medical and computational biologists working across cancer. However, there are aspects of the work that would have benefitted from further analysis and areas of the manuscript that could be written more clearly and proofread in further detail.

      Major Strengths:

      (1) The authors provide a unique analysis of BCR repertoires across tumor, dLN, and peripheral blood. The work provides useful insights into inter- and intra-site BCR repertoire heterogeneity. While patient-to-patient variation is expected, the findings with regard to intra-tumor and intra-dLN heterogeneity with the use of fragments from the same tissue are of importance, contribute to the understanding of the TME, and will inform future study design.

      (2) A particular strength of the study is the detailed CDR3 physicochemical properties analysis which leads the authors to observations that suggest a less-specific BCR repertoire of TIL-B compared to circulating B cells.

      Major Weaknesses:

      The study would have benefitted from a deeper biological interpretation of the data. While given the low number of patients one can plausibly understand a reluctance to speculate about clinical details, there is limited discussion about what may contribute to observed heterogeneity.

      We indeed do not want to overinterpret our data, especially when it comes to differences between types of cancer. On the other hand, extracting similar patterns across different cancer types allows us to pinpoint mechanisms that are more general and do not depend on cancer type. As for the potential source of the intratumoral heterogeneity that we observe, we think that it may come from the selective sampling of tertiary lymphoid structures. We include IHC data for TLS detection in Supplementary Fig. 5. Also, tumor mutation clonality may correlate with differential antibody response (i.e., different IGH clonotypes developing to recognize different antigens), as has been previously described for TCRs by the lab of B. Chain in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6890490/.

      For example, for the analysis of three lymph nodes taken per patient which were examined for inter-LN heterogeneity, there is a lack of information regarding these lymph nodes.

      Unfortunately, no clinical information about the lymph nodes was available.

      'LN3' is deemed as exhibiting the most repertoire overlap with the tumor but there is no discussion as to why this may be the case.

      The following sentences address this in the “LN-to-LN heterogeneity in colorectal cancer” paragraph:

      Similarly, an unequal interaction of tumors with DLNs was observed at the level of hypermutating clonal groups.

      Functionally, this may again indicate that within a group of DLNs, nodes are unequal in terms of access to tumor antigens, and this inequality shapes the BCR repertoires within these lymph nodes.

      (2) At times the manuscript is difficult to follow. In particular, the 'Intra-LN heterogeneity' section follows the 'LN-LN heterogeneity in colorectal cancer' section and compares the overlap of LN fragments (LN11, LN21, LN31) with the tumor in two separate patients (Fig 6A). In the previous section (LN-LN), LN11, LN21, LN31 are names given to separate lymph nodes from the same patient. The fragments are referred to as 'LN2' and the nodes in the previous section are referred to similarly. This conflation of naming for nodes and fragments is confusing.

      We corrected this.

      (3) There is a duplicated paragraph in 'Short vs long trees' and the following section 'Productive involvement in hypermutation lineages depends on CDR3 characteristics'.

      Corrected.

      Reviewer #1 (Recommendations For The Authors):

      - Figures:

      Figure 1A lacks resolution

      Corrected

      Figure 2A, Venn diagram: What do the colors indicate?

      Corrected

      Figure 5D, why include this tree when there is no mention of it in the text?

      Described

      Figures 8, 9, and 10 are not to be found. One should not have to figure out that they became supplementary in the end.

      Corrected

      Regarding the physicochemical properties of CDR-H3, what do the authors mean by "the central part"? Do the authors refer to the CDR-H3 loop, and if so, how is that defined when the IGHD gene segment is excluded from the analyses? Is it 5 amino acids (Productive involvement in hypermutating lineages depends on CDR3 characteristics, Page 21/39 in merged document) and (CDR3 properties, Page 8/39 in merged document), or 7 amino acids (Short vs long trees phylogeny analysis, Page 19/39 in merged document)? Please clarify.  

      We considered different ways of calculating amino acid properties of CDR3 and used different parameters for sample-average and individual-sequence CDR3s. The plots in Fig. S6C have now been updated for consistency. The IGHD segment was not excluded from the analysis. The reviewer might be confused by our description of phylogenetic inference, in which an artificial outgroup with the D segment deleted is added to the clonal group to facilitate the inference process. All other sequences were analyzed in their original form, with the D segment. In this way, we could avoid biases in phylogeny introduced by misassignment of the D gene germline to the outgroup.

      What was the threshold for CDR-H3 identity in their analyses? How can the authors justify that this value changes according to individual patient datasets? (Materials & methods, Clonal lineage inference Page 29/39 in merged document).

      As described earlier, the ideal similarity threshold may depend on several factors, such as sampling and sequencing depth. For example, imagine one sample that captures 100% of the sequences in a clonal lineage, each differing from the next by only 1 amino acid, and a lower-quality sample or sequencing run that captures only every other sequence. The minimal threshold required to gather these sequences into a clonal group would differ between the two cases: 1 aa for the former and ~2 aa for the latter under single-linkage clustering. The method of individual threshold tailoring relies on this: https://changeo.readthedocs.io/en/latest/examples/cloning.html

      What is the difference between tumor-induced and tumor-infiltrating B cells? How can the authors discriminate between the two? Page 6/39 in the merged document.

      corrected to tumor-infiltrating

      "Added nucleotides" meaning N additions? Page 3/39 in the merged document.

      yes

      How many cancer patients were enrolled? 17 or 14 (Materials & methods page 27/39 in the merged document)? Please clarify.

      In the current project 14 patients were enrolled. The appropriate changes have been introduced in the final text. Supplementary table 2 has been added with the patient data.

      Abbreviations are used without full descriptions.

      Following the reviewer’s recommendation, a list of abbreviations was added to the manuscript, and full descriptions were added in the text at the first mention of each term.

      Use either CDR3 or CDR-H3

      We revised the text to use the CDR-H3 abbreviation throughout.

      Reviewer #2 (Recommendations For The Authors):

      I would like to start by apologizing for the time it took me to review.

      As I mentioned above there are issues with the clonal sampling of the sequencing length and the statistics in this paper. From reading the paper I am not sure if they are fixable but there are some things that could be tried.

      (1) The authors mention the diversity of their individual analysis - 17 individuals across 3 cancer types, but do not then systematically show us how the different things they measure track across the different individuals and cancer types. It is possible that some trends would be more convincing if we saw them happening again and again across all individuals. But, as I said above, the authors do not identify individuals clearly across all their types of analysis nor do they explain why sometimes they show analysis of specific individuals.

      For overlap analysis (Fig. 2 except panel B), CDR3 properties analysis (Fig. 3, Fig. S7), and clonal group analysis (Fig. 4), we used pooled data on all cancers, unless indicated otherwise on the panel. For overlap analysis, we used a Cytoscape graph (Fig. 2B) for one patient, mp3, to illustrate the findings made on pooled data. For other types of analysis, such as overlap between individual lymph nodes or tumor fragments (Fig. 5, 6, 7 except panel F), pooled analysis is not possible due to the individual nature of the processes in question.

      (2) The authors do not address how lacking their sampling is nor the distribution of clone sizes in different tissues/ individuals/ subsets. Without such a discussion it is not clear how tenuous or convincing their conclusions are.

      (3) The short sequencing lengths limit the ability to exactly identify V and thus the germline root of clones, whose positions are mutated and clonal association of sequences. The authors appear to be aware of this as they often use the most common ancestor as the start of their analysis... however, again there are inconsistencies that are not clearly described in the text. in creating trees with change they defined roots as the putative germline and at least in most cases also in clone association although in some analyses potentially similar clones were collapsed into clonotypes. Again it is not clear when one method was used or the other and how the choice was made what to choose.

      Here we can only state that we consistently used the approach described in the Methods section, which was the following:

      First, the repertoires were clustered into clonal lineages using the criteria described in “Methods: Clonal lineage inference”. Assuming that each clonotype sequence in the clonal lineage originated from the same ancestor, we try to recover the phylogeny. Please note that we refer to individual BCR sequences as “clonotypes”, and to a group of clonotypes that presumably share a common ancestor as a “clonal lineage” or “clonal group”.

      The phylogeny of B-cell hypermutations was inferred for each clonal lineage of size five or more using the maximum likelihood method and the GTR GAMMA nucleotide substitution model. To find the most recent common ancestor (MRCA) or “root” of the tree, we used an artificial outgroup constructed as a conjugate of the germline V and J segments defined by MiXCR and added it to the clonal lineage. The D segment was excluded from the outgroup formation, as there was insufficient confidence in the germline annotations due to its short length and high level of mutations. The rest of the clonotypes were still analyzed in their original form, with the D segment in place. Deleting the D segment from the outgroup simply eliminates the risk of biasing the phylogeny by misassigning the D segment germline sequence to the outgroup. The MUSCLE tool was used for multiple sequence alignment, and RAxML software was used to build and root phylogenetic trees.

      (4) Beyond the statistical issues mentioned above: the unclear selection of individual examples for comparison and significance testing, the mixing of individuals and cancer types without clear identification, etc. there is in general a lack of coherence in the statistical analysis performed. specifically:

      (a) the authors should choose one cutoff for significance (0.01 for instance) and then just mention when things are significant and when not. There is no need and it is confusing to add the p-value for every comparison. P-values are not good measures of effect size.

      We corrected the figures and left p-values only where they are below the significance threshold.

      (b) the Bonferroni correction used is not well characterized. For an alpha of 0.01 in Figures 3 C and D how many tests were performed?

      The number of tests used for the Bonferroni-Holm correction equals the number of comparisons on the heatmap: 39 for each heatmap in Fig. 3C and 13 for Fig. 3D.
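
      For readers unfamiliar with the Bonferroni-Holm (Holm step-down) procedure named in this response, a minimal sketch of the adjusted p-value calculation is given below. The function name and the toy p-values are illustrative and are not the authors' code or data. With m comparisons (39 or 13 here), the smallest p-value is multiplied by m, the next by m - 1, and so on, with monotonicity enforced.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order.
    The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k),
    capped at 1, and forced to be non-decreasing along the sorted order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Toy example with three comparisons.
raw = [0.01, 0.04, 0.03]
adjusted = holm_adjust(raw)
significant = [p < 0.05 for p in adjusted]
```

      A comparison is then reported as significant when its adjusted p-value falls below the chosen alpha.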

      Finally some minor issues -

      (1) Not all acronyms are described, for instance, TME and TIL. The first time any acronym is used it should be spelled out.  -> Katya B: list of abbreviations

      (2) The figure captions are not all there...

      (a) there is no caption for Figure 3E.

      corrected

      (b) there are Figure 7 F and G panels but no Figure 7E panel and Figure F is described after Figure G.

      corrected

      (3) A few problems with wording -

      (a) bottom paragraph of page 3 - instead of :

      "different lymph nodes from one draining lymph node pool may be more or less involved"

      Corrected to "different lymph nodes from one draining lymph node pool may be differentially involved"

      (b) figure caption for figure 3a: instead of:

      "CDR3 are on average significantly higher in tumor"

      Corrected to "CDR3 are on average significantly longer in tumor"

      Reviewer #3 (Recommendations For The Authors):

      - FIG1A - Suggest expanding the legend to include more information on the computational analyses.

      added

      - PAGE SIX: Suggest adding a table or some text on patient characteristics. Numbers of unique clonotypes per sample etc. Are there differences in age/sex that need to be considered? Some clonotype information is available in S1 but some summary and statistics would be appreciated.

      Added patient information as Supplementary table 2.

      - PAGE SIX: F2 Metric, suggestion to explain why this was used vs. other metrics.

      We expanded the following paragraph to include information about F2 metric and D metric, and the reason why we are using F2.

      Repertoire data for each sample were split according to the immunoglobulin isotype, and the F2 metric was calculated for each isotype separately and plotted as an individual point. We used the repertoire overlap metric F2 (clonotype-wise sum of geometric mean frequencies of overlapping clonotypes), which accounts for both the number and frequency of overlapping clonotypes (Fig. 2A). As expected, significantly lower overlaps were observed between the IGH repertoires of peripheral blood and tumors compared to LN/tumor overlaps. The LN/PBMC overlap also tended to be lower, but the difference was not statistically significant. We also analyzed the D metric, which represents the relative overlap diversity uninfluenced by clonotype frequency (D_ij = d_ij / (d_i × d_j), where d_ij is the number of clonotypes present in both samples, and d_i and d_j are the diversities of samples i and j, respectively). The results for the D metric are not shown, as they indicate a trend similar to that of the F2 metric. This observation allows us to conclude that tumor IGH repertoires are more similar to the repertoires of tumor-draining LNs than to those of peripheral blood, both when clonotype frequency is taken into account and when it is not.
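
      The F2 definition quoted above (clonotype-wise sum of geometric mean frequencies of overlapping clonotypes) can be written out as a short sketch. The helper that keeps the N most frequent clonotypes renormalizes frequencies within that subset; this renormalization is an assumption made for the example, as the text does not specify it. Function names and toy counts are hypothetical.

```python
from math import sqrt

def top_n_frequencies(counts, n):
    """Keep the n most abundant clonotypes and convert their counts to
    frequencies that sum to 1 within the retained subset (assumption)."""
    top = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:n]
    total = sum(count for _, count in top)
    if total == 0:
        raise ValueError("no reads in the selected clonotypes")
    return {clonotype: count / total for clonotype, count in top}

def f2_overlap(freq_i, freq_j):
    """Sum over shared clonotypes of the geometric mean of their frequencies."""
    shared = freq_i.keys() & freq_j.keys()
    return sum(sqrt(freq_i[c] * freq_j[c]) for c in shared)

# Toy read counts per clonotype for two samples.
tumor_counts = {"c1": 50, "c2": 30, "c3": 20}
ln_counts = {"c1": 50, "c2": 30, "c4": 20}

tumor_freq = top_n_frequencies(tumor_counts, 3)
ln_freq = top_n_frequencies(ln_counts, 3)
f2_value = f2_overlap(tumor_freq, ln_freq)  # sqrt(0.5*0.5) + sqrt(0.3*0.3)
```

      Under this definition, two identical repertoires give F2 = 1 and two disjoint repertoires give F2 = 0, so large shared clones dominate the score.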

      - PAGE SIX: Make clear in the text that mp3 is a patient.

      Added “melanoma patient mp3”

      - PAGE EIGHT: Suggest explaining kidera factors at first use - not all readers will know what they are.

      We expanded the following paragraph to add more information about Kidera factors:

      To explore CDR-H3 physicochemical properties, we calculated the mean charge, hydropathy, predicted interaction strength, and Kidera factors 1 - 9 (kf1-kf9) for the five central amino acids of the CDR-H3 region for the 100 most frequent clonotypes of each sample using VDJtools. Kidera factors are a set of scores which quantify physicochemical properties of protein sequences (ref. 61). 188 physical properties of the 20 amino acids are encoded using dimension reduction techniques, to yield 9 factors which are used to quantitatively characterize physicochemical properties of amino acid sequences.
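
      As an illustration of what "five central amino acids" can mean in practice, the sketch below extracts a centered window from a CDR-H3 amino acid sequence and computes a simple net charge. It is not VDJtools code: the centering rule for windows that cannot be placed symmetrically, the integer charge scale (K and R as +1, D and E as -1, histidine treated as neutral), and the example sequence are all assumptions made for this example.

```python
# Simplified side-chain charges at neutral pH (assumption for illustration).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def central_residues(cdr3_aa, width=5):
    """Return the `width` residues centered in the CDR-H3 sequence.
    When the window cannot be placed symmetrically, it is shifted one
    position toward the N-terminus (an arbitrary choice)."""
    if len(cdr3_aa) < width:
        raise ValueError("sequence is shorter than the requested window")
    start = (len(cdr3_aa) - width) // 2
    return cdr3_aa[start:start + width]

def net_charge(peptide):
    """Sum of simplified residue charges; unlisted residues count as 0."""
    return sum(CHARGE.get(residue, 0) for residue in peptide)

# Hypothetical CDR-H3 amino acid sequence.
cdr3 = "CAKDRKEYW"
center = central_residues(cdr3)
charge = net_charge(center)
```

      Averaging such per-clonotype values over the most frequent clonotypes of a sample would give a sample-level property of the kind compared in the figures.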

      - Fig 5D is not referred to.

      Corrected

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      Kroeg et al. describe a novel method for 2D culture of human induced pluripotent stem cells (hiPSCs) to form cortical tissue in a multiwell format. The method claims to offer a significant advancement over existing developmental models. Their approach allows them to generate cultures with precise, reproducible dimensions and structure with a single rosette; consistent geometry; incorporating multiple neuronal and glial cell types (cellular diversity); avoiding the necrotic core (often seen in free-floating models due to limited nutrient and oxygen diffusion). The researchers demonstrate the method's capacity for long-term culture, exceeding ten months, and show the formation of mature dendritic spines and considerable neuronal activity. The method aims to tackle multiple key problems of in vitro neural cultures: reproducibility, diversity, topological consistency, and electrophysiological activity. The authors suggest their potential in high-throughput screening and neurotoxicological studies.

      Strengths: 

      The main advances in the paper seem to be the following. The culture developed by the authors appears to have optimal conditions for neural differentiation, lineage diversification, and long-term culture beyond 300 days. These seem to me to be a major strength of the paper and an important contribution to the field. The authors present solid evidence about the high cell type diversity present in their cultures. It is a major point and therefore it could be better compared to the state of the art. I commend the authors for using three different iPSC lines; this is a very important part of their proof. The staining and imaging in the manuscript are of excellent quality.

      We thank the reviewer for the positive comments on the potential of our novel platform to address key problems of in vitro neural culture, highlighting the longevity and reproducibility of the method across multiple cell lines.

      Weaknesses: 

      (1) The title is misleading: The presented cultures appear not to be organoids, but 2D neural cultures, with an insufficiently described intermediate EB stage. For nomenclature, see: doi: 10.1038/s41586-022-05219-6. Should the tissue develop considerable 3D depth, it would suffer from the same limited nutrient supply as 3D models - as the authors point out in their introduction. 

      We appreciate the opportunity to clarify this point. We respectfully disagree that the cultures do not meet the consensus definition of an organoid. In fact, a direct quote from the seminal nomenclature paper referenced by the reviewer states: “We define organoids as in vitro-generated cellular systems that emerge by self-organization, include multiple cell types, and exhibit some cytoarchitectural and functional features reminiscent of an organ or organ region. Organoids can be generated as 3D cultures or by a combination of 3D and 2D approaches (also known as 2.5D) that can develop and mature over long periods of time (months to years).” (Pasca et al, 2022, doi: 10.1038/s41586-022-05219-6). Therefore, while many organoid types indeed have a more spherical or globular 3D shape, the term organoid also applies to semi-3D or non-globular adherent organoids, such as renal (Czerniecki et al 2018, doi.org/10.1016/j.stem.2018.04.022) and gastrointestinal organoids (Kakni et al 2022, doi.org/10.1016/j.tibtech.2022.01.006). Accordingly, the adherent cortical organoids described in the manuscript self-organize into single radial structures consisting of multiple cell layers in the z-axis, reaching ~200 µm thickness (therefore remaining within the limits for sufficient nutrient supply), with consistent cytoarchitectural topology and electrophysiological activity, and therefore meet the consensus definition of an organoid.

      (2) The method therefore should be compared to state-of-the-art (well-based or not) 2D cultures, which seems to be somewhat overlooked in the paper, therefore making it hard to assess what the advance is that is presented by this work. 

      It was not our intention to benchmark this model quantitatively against other culture systems. Rather, we have attempted to characterize the opportunities and limitations of this approach, with a qualitative contrast to other culture methods. Compared to state-of-the-art 2D neural network cultures, adherent cortical organoids provide distinct advantages in:

      (1) Higher order self-organized structure formation, including segregation of deeper and upper cortical layers.

      (2) Longevity: adherent cortical organoids can be successfully kept in culture up to 1 year where 2D cultures typically deteriorate after 8-12 weeks.

      (3) Maturity, including the formation of dendritic mushroom spines and robust electrophysiological activity.

      (4) Cell type diversity including a more physiological ratio of inhibitory and excitatory neurons (10% GAD67+/NeuN+ neurons in adherent cortical organoids, vs 1% in 2D neural networks) and the emergence of oligodendrocyte lineage cells.

      On the other hand, limitations of adherent cortical organoids compared to 2D neural network cultures are:

      (1) Culture times for organoids are much longer than for 2D cultures and the method can therefore be more laborious and more expensive.

      (2) Whole cell patch clamping is not easily feasible in the organoids because of the restrictive dimensions of the 384-well plates.

      (3) Reproducibility is prominently claimed throughout the manuscript. However, it is challenging to assess this claim based on the data presented, which mostly contain single frames of unquantified, high-resolution images. There are almost no systematic quantifications presented. The ones present (Figure S1D, Figure 4) show very large variability. However, the authors show sets of images across wells (Figure S1B, Figure S3) which hint that in some important aspects, the culture seems reproducible and robust. 

      We made considerable efforts to establish quantitative metrics to assess reproducibility. We applied a quantitative scoring system of single radial structures at different time points for multiple batches of all three lines as indicated in Figure S1D. This figure represents a comprehensive dataset in which each dot represents the average of a different batch of organoids containing 10-40 organoids per batch. To emphasize this, we will adapt the graph to better reflect the breadth of the dataset. Additional quantifications are given in Figure S2 for progenitor and layer markers for Line 1 and in Figure S5 for interneurons across all three lines, showing relatively low variability. That being said, we acknowledge the reviewer’s concerns and will modify the text to reduce the emphasis of this point, pending more extensive data addressing reproducibility across a wide range of parameters.

      (4) What is in the middle? All images show markers in cells present around the center. The center however seems to be a dense lump of cells based on DAPI staining. What is the identity of these cells? Do these cells persist throughout the protocol? Do they divide? Until when? Addressing this prominent cell population is currently lacking. 

      A more comprehensive characterization of the cells in the center remains a significant challenge due to the high cell density hindering antibody penetration. However, dye-based staining methods such as DAPI and the LIVE/DEAD panel confirm a predominance of intact nuclei with very minimal cell death. The limited available data suggest that a substantial proportion of the cells in the center are proliferative neural progenitors, indicated by immunolabeling for SOX2 and Ki67. We will add additional figures to support these findings. Furthermore, we are currently optimizing the conditions to perform single cell / nuclear RNA sequencing to further characterize the cellular composition of the organoids.

      (5) This manuscript proposes a new method of 2D neural culture. However, the description and representation of the method are currently insufficient.

      (a) The results section would benefit from a clear and concise, but step-by-step overview of the protocol. The current description refers to an earlier paper and appears to skip over some key steps. This section would benefit from being completely rewritten. This is not a replacement for a clear methods section, but a section that allows readers to clearly interpret results presented later.

      We will revise the manuscript to include a more detailed step-by-step overview of the protocol.

      (b) Along the same lines, the graphical abstract should be much more detailed. It should contain the time frames and the media used at the different stages of the protocol, seeding numbers, etc. 

      As suggested, we will also adapt the graphical abstract to include more detail.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, van der Kroeg et al have developed a method for creating 3D cortical organoids using iPSC-derived neural progenitor cells in 384-well plates, thus scaling down the neural organoids to adherent culture and a smaller format that is amenable to high throughput cultivation. These adherent cortical organoids, measuring 3 x 3 x 0.2 mm, self-organize over eight weeks and include multiple neuronal subtypes, astrocytes, and oligodendrocyte lineage cells.

      Strengths: 

      (1) The organoids can be cultured for up to 10 months, exhibiting mature dendritic spines, axonal myelination, and robust neuronal activity. 

      (2) Unlike free-floating organoids, these do not develop necrotic cores, making them ideal for high-throughput drug discovery, neurotoxicological screening, and brain disorder studies.

      (3) The method addresses the technical challenge of achieving higher-order neural complexity with reduced heterogeneity and the issue of necrosis in larger organoids. The method presents a technical advance in organoid culture.

      (4) The method has been demonstrated with multiple cell lines which is a strength. 

      (5) The manuscript provides high-quality immunostaining for multiple markers. 

      We appreciate the reviewer’s acknowledgement of the strengths of this novel platform as a technical advance in organoid culture that reduces heterogeneity and shows potential for higher throughput experiments.

      Weaknesses: 

      (1) Direct head-to-head comparison with standard organoid culture seems to be missing and may be valuable for benchmarking, ie what can be done with the new method that cannot be done with standard culture and vice versa, ie what are the aspects in which new method could be inferior to the standard.

      In our opinion, it would be extremely difficult to directly compare methods because of substantial differences. Most notably, whole brain organoids grow to large and irregular globular shapes, while adherent cortical organoids have a highly standardized shape confined by the limits of a 384-well. Moreover, it was not our intention to benchmark this model quantitatively against other culture systems. Rather, we have attempted to characterize the opportunities and limitations of this approach, with a qualitative contrast to other culture methods.

      (2) It would be important to further benchmark the throughput, ie what is the success rate in filling and successfully growing the organoids in the entire 384 well plate? 

      Figure S1D shows the success rate of organoid formation and stability of the organoid structures over time. In addition, we will add the number of wells that were filled per plate.

      (3) For each NPC line an optimal seeding density was estimated based on the proliferation rate of that NPC line and via visual observation after 6 weeks of culture. It would be important to delineate this protocol in more robust terms, in order to enable reproducibility with different cell lines and amongst the labs. 

      Figure S1C provides the relationship between proliferation rate and seeding density, allowing estimation of seeding densities based on the proliferation rate of the NPCs. However, we appreciate the reviewer’s feedback and will modify the methods to provide more detail.

      Reviewer #3 (Public Review): 

      Summary: 

      Kroeg et al. have introduced a novel method to produce 3D cortical layer formation in hiPSC-derived models, revealing a remarkably consistent topography within compact dimensions. This technique involves seeding frontal cortex-patterned iPSC-derived neural progenitor cells in 384-well plates, triggering the spontaneous assembly of adherent cortical organoids consisting of various neuronal subtypes, astrocytes, and oligodendrocyte lineage cells. 

      Strengths: 

      Compared to existing brain organoid models, these adherent cortical organoids demonstrate enhanced reproducibility and cell viability during prolonged culture, thereby providing versatile opportunities for high-throughput drug discovery, neurotoxicological screening, and the investigation of brain disorder pathophysiology. This is an important and timely issue that needs to be addressed to improve the current brain organoid systems. 

      We thank the reviewer for highlighting the strengths of our novel platform. We appreciate that all three reviewers agree that the adherent cortical organoids presented in this manuscript reliably demonstrate increased reproducibility and longevity. They also commend its potential for higher throughput drug discovery and neurotoxicological/phenotype screening purposes.

      Weaknesses: 

      While the authors have provided significant data supporting this claim, several aspects necessitate further characterization and clarification. Mainly, highlighting the consistency of differentiation across different cell lines and standardizing functional outputs are crucial elements to emphasize the future broad potential of this new organoid system for large-scale pharmacological screening.

      We appreciate the feedback and will add more detail on consistency and standardization of functional outputs.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      To the Senior Editor and the Reviewing Editor:

      We sincerely appreciate the valuable comments provided by the reviewers, the reviewing editor, and the senior editor. Based on our last response and revision, we are confused by the two limitations noted in the eLife assessment. 

      (1) benchmarking against comparable methods is limited.

      In our last revision, we added the comparison experiments with TNDM, as the reviewers requested. Additionally, it is crucial to emphasize that our evaluation of decoding capabilities of behaviorally relevant signals has been benchmarked against the performance of the ANN on raw signals, which, as Reviewer #1 previously noted, nearly represents the upper limit of performance. Consequently, we believe that our benchmarking methods are sufficiently strong.

      (2) some observations may be a byproduct of their method, and may not constitute new scientific observations.

      We believe that our experimental results are sufficient to demonstrate that our conclusions are not byproducts of d-VAE based on three reasons:

      (1) The d-VAE, as a latent variable model, adheres to the population doctrine, which posits that latent variables are responsible for generating the activities of individual neurons. The goal of such models is to maximize the explanation of the raw signals. At the signal level, the only criterion we can rely on is neural reconstruction performance, in which we have achieved unparalleled results. Thus, it is inappropriate to focus on the mixing process during the model's inference stage while overlooking the crucial de-mixing process during the generation stage and dismissing the significance of our neural reconstruction results. For more details, please refer to the first point in our response to Q4 from Reviewer #4.

      (2) The criterion that irrelevant signals should contain minimal information can effectively demonstrate that our conclusions are not by-products of d-VAE. Unfortunately, the reviewers seem to have overlooked this criterion. For more details, please refer to the third point in our response to Q4 from Reviewer #4.

      (3) Our synthetic experimental results also substantiate that our conclusions are not byproducts of d-VAE. However, it appears the reviewers did not give these results adequate consideration. For more details, please refer to the fourth point in our response to Q4 from Reviewer #4.

      Furthermore, our work presents not just "a useful method" but a comprehensive framework. Our study proposes, for the first time, a framework for defining, extracting, and validating behaviorally relevant signals. In our current revision, to clearly distinguish between d-VAE and other methods, we have formalized the extraction of behaviorally relevant signals into a mathematical optimization problem. To our knowledge, current methods have not explicitly proposed extracting behaviorally relevant signals, nor have they identified and addressed the key challenges of extracting relevant signals. Similarly, existing research has not yet defined and validated behaviorally relevant signals. For more details, please refer to our response to Q1 from Reviewer #4.

      Based on these considerations, we respectfully request that you reconsider the eLife assessment of our work. We greatly appreciate your time and attention to this matter.

      The main revisions made to the manuscript are as follows:

      (1) We have formalized the extraction of behaviorally relevant signals into a mathematical optimization problem, enabling a clearer distinction between d-VAE and other models.

      (2) We have moderated the assertion about linear readout to highlight its conjectural nature and have broadened the discussion regarding this conclusion. 

      (3) We have elaborated on the model details of d-VAE and have removed the identifiability claim.

      To Reviewer #1

      Q1: “As reviewer 3 also points out, I would, however, caution to interpret this as evidence for linear read-out of the motor system - your model performs a non-linear transformation, and while this is indeed linearly decodable, the motor system would need to do something similar first to achieve the same. In fact to me it seems to show the opposite, that behaviour-related information may not be generally accessible to linear decoders (including to down-stream brain areas).”

      Thank you for your comments. It's important to note that the conclusions we draw are speculative and not definitive. We use terms like "suggest" to reflect this uncertainty. To further emphasize the conjectural nature of our conclusions, we have deliberately moderated our tone.

      The question of whether behaviorally-relevant signals can be accessed by linear decoders or downstream brain regions hinges on the debate over whether the brain employs a strategy of filtering before decoding. If the brain employs such a strategy, the brain can probably access these signals. In our opinion, it is likely that the brain utilizes this strategy.

      Given the existence of behaviorally relevant signals, it is reasonable to assume that the brain has intrinsic mechanisms to differentiate between relevant and irrelevant signals. There is growing evidence suggesting that the brain utilizes various mechanisms, such as attention and specialized filtering, to suppress irrelevant signals and enhance relevant signals [1-3]. Therefore, it is plausible that the brain filters before decoding, thereby effectively accessing behaviorally relevant signals.

      Thank you for your valuable feedback.

      (1) Sreenivasan, Sameet, and Ila Fiete. "Grid cells generate an analog error-correcting code for singularly precise neural computation." Nature neuroscience 14.10 (2011): 1330-1337.

      (2) Schneider, David M., Janani Sundararajan, and Richard Mooney. "A cortical filter that learns to suppress the acoustic consequences of movement." Nature 561.7723 (2018): 391-395.

      (3) Nakajima, Miho, L. Ian Schmitt, and Michael M. Halassa. "Prefrontal cortex regulates sensory filtering through a basal ganglia-to-thalamus pathway." Neuron 103.3 (2019): 445-458.

      Q2: “As in my initial review, I would also caution against making strong claims about identifiability although this work and TNDM seem to show that in practise such methods work quite well. CEBRA, in contrast, offers some theoretical guarantees, but it is not a generative model, so would not allow the type of analysis done in this paper. In your model there is a parameter \alpha to balance between neural and behaviour reconstruction. This seems very similar to TNDM and has to be optimised - if this is correct, then there is manual intervention required to identify a good model.”

      Thank you for your comments. 

      Considering your concerns about our identifiability claims and the fact that identifiability is not directly relevant to the core of our paper, we have removed content related to identifiability.

      Firstly, our model is based on the pi-VAE, which also has theoretical guarantees. However, it is important to note that all such theoretical guarantees (including pi-VAE and CEBRA) are based on certain assumptions that cannot be validated as the true distribution of latent variables remains unknown.

      Secondly, it is important to clarify that the identifiability of latent variables does not impact the conclusions of this paper, nor does this paper make specific conclusions about the model's latent variables. Identifiability means that distinct latent variables correspond to distinct observations. If multiple latent variables can generate the same observation, it becomes impossible to determine which one is correct given the observation, which leads to the issue of nonidentifiability. Notably, our analysis focuses on the generated signals, not the latent variables themselves, and thus the identifiability of these variables does not affect our findings. 

      Our approach, dedicated to extracting these signals, distinctly differs from methods such as TNDM, which focuses on extracting behaviorally relevant latent dynamics. To clearly set apart d-VAE from other models, we have framed the extraction of behaviorally relevant signals as the following mathematical optimization problem:

      $$\min_{x_r} \; E(x_r, x) + R(x_r)$$

      where $x_r$ denotes generated behaviorally-relevant signals, $x$ denotes raw noisy signals, $E(\cdot,\cdot)$ denotes reconstruction loss, and $R(\cdot)$ denotes regularization loss. It is important to note that while both d-VAE and TNDM employ reconstruction loss, relying solely on this term is insufficient for determining the optimal degree of similarity between the generated and raw noisy signals. The key to accurately extracting behaviorally relevant signals lies in leveraging prior knowledge about these signals to determine the optimal similarity degree, encapsulated by $R(x_r)$. Other studies have not explicitly proposed extracting behaviorally-relevant signals, nor have they identified and addressed the key challenges involved in extracting relevant signals. Consequently, our approach is distinct from other methods.
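
      The trade-off between the reconstruction term and the regularization term can be illustrated with a toy scalar example. This is only a sketch under stated assumptions: the quadratic form of the regularizer, the value `behavior`, and the weight `alpha` are hypothetical and are not taken from the manuscript or from d-VAE itself. It shows that minimizing the reconstruction term alone simply returns the raw signal, whereas adding a regularizer moves the solution away from it.

```python
# Toy, scalar illustration of  min_{x_r} E(x_r, x) + R(x_r).
# E is a squared reconstruction error; R is a HYPOTHETICAL quadratic
# regularizer pulling x_r toward a behavior-predicted value b.
# Neither choice is taken from the manuscript.

def objective(x_r, x, b, alpha):
    reconstruction = (x_r - x) ** 2          # E(x_r, x)
    regularization = alpha * (x_r - b) ** 2  # R(x_r), assumed form
    return reconstruction + regularization

def minimizer(x, b, alpha):
    # Setting the derivative to zero gives a weighted average of x and b.
    return (x + alpha * b) / (1.0 + alpha)

raw = 2.0        # raw noisy signal x (made-up value)
behavior = 1.0   # hypothetical behavior-predicted value b

only_reconstruction = minimizer(raw, behavior, alpha=0.0)  # copies the raw signal
with_regularizer = minimizer(raw, behavior, alpha=1.0)     # pulled toward b
```

      With `alpha=0.0` the minimizer equals the raw signal, noise included; a non-zero weight is what allows prior knowledge to set the degree of similarity.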

      Thank you for your valuable feedback.

      Q3: “Somewhat related, I also found that the now comprehensive comparison with related models shows that using decoding performance (R2) as a metric for model comparison may be problematic: the R2 values reported in Figure 2 (e.g. the MC_RTT dataset) should be compared to the values reported in the neural latent benchmark, which represent well-tuned models (e.g. AutoLFADS). The numbers (difficult to see, a table with numbers in the appendix would be useful, see: https://eval.ai/web/challenges/challenge-page/1256/leaderboard) seem lower than what can be obtained with models without latent space disentanglement. While this does not necessarily invalidate the conclusions drawn here, it shows that decoding performance can depend on a variety of model choices, and may not be ideal to discriminate between models. I'm also surprised by the low neural R2 for LFADS (I assume this is condition-averaged) - LFADS tends to perform very well on this metric.”

      Thank you for your comments. The dataset we utilized is not from the same day as the neural latent benchmark dataset. Notably, there is considerable variation in the length of trials within the RTT paradigm, and the dataset lacks explicit trial information, rendering trial-averaging unsuitable. Furthermore, behaviorally relevant signals are not static averages devoid of variability; even behavioral data exhibits variability. We computed the neural R2 using individual trials rather than condition-averaged responses. 
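
      For readers less familiar with the metric, a single-trial R2 can be sketched as follows. This assumes the standard coefficient of determination; the exact formula used in the manuscript is not restated here, and the firing-rate values are made up for illustration.

```python
def r_squared(observed, predicted):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical single-trial firing rates for one neuron (made-up numbers).
trial = [1.0, 3.0, 2.0, 4.0]
reconstruction = [1.5, 2.5, 2.0, 4.0]
single_trial_r2 = r_squared(trial, reconstruction)
```

      Computing the metric on individual trials keeps trial-to-trial variability in the denominator, which is generally expected to give lower values than a comparison of condition-averaged responses.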

      Thank you for your valuable feedback.

      Q4: “One statement I still cannot follow is how the prior of the variational distribution is modelled. You say you depart from the usual Gaussian prior, but equation 7 seems to suggest there is a normal prior. Are the parameters of this distribution learned? As I pointed out earlier, I however suspect this may not matter much as you give the prior a very low weight. I also still am not sure how you generate a sample from the variational distribution, do you just draw one for each pass?”

      Thank you for your questions.

      The conditional distribution of prior latent variables $p_m(\mathbf{z}|\mathbf{y})$ is a Gaussian distribution, but the distribution of prior latent variables $p(\mathbf{z})$ is a Gaussian mixture distribution. The distribution of prior latent variables $p(\mathbf{z})$ is:

      $$p(\mathbf{z}) = \int \hat{p}(\mathbf{y})\, p_m(\mathbf{z}|\mathbf{y})\, d\mathbf{y} = \frac{1}{N} \sum_{i=1}^{N} p_m(\mathbf{z}|\mathbf{y}^{(i)})$$

      where $\hat{p}(\mathbf{y}) = \frac{1}{N} \sum_{i=1}^{N} \delta(\mathbf{y} - \mathbf{y}^{(i)})$ denotes the empirical distribution of behavioral variables $\mathbf{y}$, $N$ denotes the number of samples, $\mathbf{y}^{(i)}$ denotes the $i$th sample, $\delta(\cdot)$ denotes the Dirac delta function, and $p_m(\mathbf{z}|\mathbf{y})$ denotes the conditional distribution of prior latent variables given the behavioral variables, parameterized by network $m$. Based on the above equation, we can see that $p(\mathbf{z})$ is not a Gaussian distribution; it is a Gaussian mixture model with $N$ components, which is theoretically a universal approximator of continuous probability densities.
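
      The mixture structure of this prior can be sketched in a few lines. The function `conditional_prior` below is a hypothetical stand-in for network m (it simply returns the behavioral value as the mean and a fixed standard deviation), and the behavioral samples are made up; the sketch only illustrates that averaging Gaussian components over behavioral samples yields a non-Gaussian density.

```python
import math
import random

def conditional_prior(y):
    # HYPOTHETICAL stand-in for network m: maps a behavioral value y
    # to the mean and standard deviation of the Gaussian p_m(z | y).
    return y, 0.5

def normal_pdf(z, mean, std):
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def prior_density(z, ys):
    # p(z) = (1/N) * sum_i p_m(z | y_i): one mixture component per sample.
    return sum(normal_pdf(z, *conditional_prior(y)) for y in ys) / len(ys)

def sample_prior(ys, rng):
    # Pick a behavioral sample uniformly, then draw z from its component.
    mean, std = conditional_prior(rng.choice(ys))
    return rng.gauss(mean, std)

behavior = [-3.0, 3.0]   # made-up behavioral samples
rng = random.Random(0)
draws = [sample_prior(behavior, rng) for _ in range(1000)]
```

      With two well-separated behavioral samples the density is bimodal, with almost no mass at zero, which a single N(0,1) prior could not represent.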

      Learning this prior is important, as illustrated by our latent variable visualizations, which do not follow a Gaussian distribution. Hypothesis tests on both the latent variables and the behavioral variables (Lilliefors test and Kolmogorov-Smirnov test) show that neither conforms to a Gaussian distribution. Consequently, constraining the latent variables towards N(0,1) is expected to affect performance adversely.

      Regarding sampling, during the training process we draw only one sample from the approximate posterior distribution. It is worth noting that drawing multiple samples or one sample for each pass does not affect the experimental results. After training, we can generate a sample from the prior by providing input behavioral data $\mathbf{y}^{(i)}$ and then generating corresponding samples via the conditional prior $p_m(\mathbf{z}|\mathbf{y}^{(i)})$ and the generative (decoder) distribution. To extract behaviorally-relevant signals from raw signals, we use the approximate posterior (encoder) and the generative (decoder) distribution.
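
      Drawing a single sample per pass is commonly implemented with the reparameterization trick. The sketch below assumes a diagonal Gaussian approximate posterior, which is the usual VAE choice but is an assumption here rather than a detail confirmed in this response; the encoder outputs are made-up numbers.

```python
import random

def sample_posterior(mean, std, rng):
    # One reparameterized draw z = mean + std * eps with eps ~ N(0, 1),
    # assuming a diagonal Gaussian approximate posterior (an assumption here).
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]

rng = random.Random(0)
posterior_mean = [0.2, -1.0, 0.5]   # hypothetical encoder outputs
posterior_std = [0.1, 0.3, 0.0]
z = sample_posterior(posterior_mean, posterior_std, rng)  # single draw per pass
```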

      Thank you for your valuable feedback.

      Q5: “(1) I found the figures good and useful, but the text is, in places, not easy to follow. I think the manuscript could be shortened somewhat, and in some places more concise focussed explanations would improve readability.

      (2) I would not call the encoding "complex non-linear" - non-linear is a clear term, but complex can mean many things (e.g. is a quadratic function complex?) ”

      Thank you for your recommendation. We have revised the manuscript for enhanced clarity. We call the encoding “complex nonlinear” because neurons encode information with varying degrees of nonlinearity, as illustrated in Fig. 3b, f, and Fig. S3b.

      Thank you for your valuable feedback.

      To Reviewer #2

      Q1: “I still remain unconvinced that the core findings of the paper are "unexpected". In the response to my previous Specific Comment #1, they say "We use the term 'unexpected' due to the disparity between our findings and the prior understanding concerning neural encoding and decoding." However, they provide no citations or grounding for why they make those claims. What prior understanding makes it unexpected that encoding is more complex than decoding given the entropy, sparseness, and high dimensionality of neural signals (the "encoding") compared to the smoothness and low dimensionality of typical behavioural signals (the "decoding")?” 

      Thank you for your comments. We believe that both the complexity of neural encoding and the simplicity of neural decoding in motor cortex are unexpected.

      The Complexity of Neural Encoding: As noted in the Introduction, neurons with small R2 values were traditionally considered noise and consequently disregarded, as detailed in references [1-3]. However, after filtering out irrelevant signals, we discovered that these neurons actually contain substantial amounts of behavioral information, previously unrecognized. Similarly, in population-level analyses, neural signals composed of small principal components (PCs) are often dismissed as noise, with analyses typically utilizing only between 6 and 18 PCs [4-10]. Yet, the discarded PC signals nonlinearly encode significant amounts of information, with practically useful dimensions found to range between 30 and 40—far exceeding the usual number analyzed. These findings underscore the complexity of neural encoding and are unexpected.

      The Simplicity of Neural Decoding: In the motor cortex, nonlinear decoding of raw signals has been shown to significantly outperform linear decoding, as evidenced in references [11,12]. Interestingly, after separating behaviorally relevant and irrelevant signals, we observed that the linear decoding performance of behaviorally relevant signals is nearly equivalent to that of nonlinear decoding—a phenomenon previously undocumented in the motor cortex. This discovery is also unexpected.

      Thank you for your valuable feedback.

      (1) Georgopoulos, Apostolos P., Andrew B. Schwartz, and Ronald E. Kettner. "Neuronal population coding of movement direction." Science 233.4771 (1986): 1416-1419.

      (2) Hochberg, Leigh R., et al. "Reach and grasp by people with tetraplegia using a neurally controlled robotic arm." Nature 485.7398 (2012): 372-375. 

      (3) Inoue, Yoh, et al. "Decoding arm speed during reaching." Nature communications 9.1 (2018): 5243.

      (4) Churchland, Mark M., et al. "Neural population dynamics during reaching." Nature 487.7405 (2012): 51-56.

      (5) Kaufman, Matthew T., et al. "Cortical activity in the null space: permitting preparation without movement." Nature neuroscience 17.3 (2014): 440-448.

      (6) Elsayed, Gamaleldin F., et al. "Reorganization between preparatory and movement population responses in motor cortex." Nature communications 7.1 (2016): 13239.

      (7) Sadtler, Patrick T., et al. "Neural constraints on learning." Nature 512.7515 (2014): 423-426.

      (8) Golub, Matthew D., et al. "Learning by neural reassociation." Nature neuroscience 21.4 (2018): 607-616.

      (9) Gallego, Juan A., et al. "Cortical population activity within a preserved neural manifold underlies multiple motor behaviors." Nature communications 9.1 (2018): 4233.

      (10) Gallego, Juan A., et al. "Long-term stability of cortical population dynamics underlying consistent behavior." Nature neuroscience 23.2 (2020): 260-270.

      (11) Glaser, Joshua I., et al. "Machine learning for neural decoding." Eneuro 7.4 (2020).

      (12) Willsey, Matthew S., et al. "Real-time brain-machine interface in non-human primates achieves high-velocity prosthetic finger movements using a shallow feedforward neural network decoder." Nature Communications 13.1 (2022): 6899.

      Q2: “I still take issue with the premise that signals in the brain are "irrelevant" simply because they do not correlate with a fixed temporal lag with a particular behavioural feature hand-chosen by the experimenter. In the response to my previous review, the authors say "we employ terms like 'behaviorally-relevant' and 'behaviorally-irrelevant' only regarding behavioral variables of interest measured within a given task, such as arm kinematics during a motor control task.". This is just a restatement of their definition, not a response to my concern, and does not address my concern that the method requires a fixed temporal lag and continual decoding/encoding. My example of reward signals remains. There is a huge body of literature dating back to the 70s on the linear relationships between neural activity and arm kinematics; in a sense, the authors have chosen the "variable of interest" that proves their point. This all ties back to the previous comment: this is mostly expected, not unexpected, when relating apparently-stochastic, discrete action potential events to smoothly varying limb kinematics.”

      Thank you for your comments. 

      Regarding the experimenter's specification of behavioral variables of interest, we followed common practice in existing studies [1, 2]. Regarding the use of fixed temporal lags, we followed the same practice as papers related to the dataset we use, which assume fixed temporal lags [3-5]. Furthermore, many studies in the motor cortex similarly use fixed temporal lags [6-8].

      Concerning the issue of rewards, in the paper you mentioned [9], the impact of rewards occurs after the reaching phase. It's important to note that in our experiments, we analyze only the reaching phase, without any post-movement phase. 

      If the impact of rewards can be stably reflected in the signals in the reaching phase of the subsequent trial, and if the reward-induced signals do not interfere with decoding—since these signals are harmless for decoding and beneficial for reconstruction—our model is likely to capture these signals. If the signals induced by rewards during the reaching phase are randomly unstable, our model will likely be unable to capture them.

      If the goal is to extract post-movement neural activity from both rewarded and unrewarded trials, and if the neural patterns differ between these conditions, one could replace the d-VAE's regression loss, used for continuous kinematics decoding, with a classification loss tailored to distinguish between rewarded and unrewarded conditions.
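
      The loss substitution described above can be sketched as follows. This is not the d-VAE implementation: mean squared error and binary cross-entropy are assumed as representative regression and classification losses, and `behavior_loss` is a hypothetical helper introduced only for illustration.

```python
import math

def regression_loss(predicted, target):
    # Mean squared error, a common choice for continuous kinematics.
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def classification_loss(probabilities, labels):
    # Binary cross-entropy for rewarded (1) vs. unrewarded (0) trials.
    eps = 1e-12  # guards against log(0)
    return -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probabilities, labels)
    ) / len(labels)

def behavior_loss(outputs, targets, task):
    # Hypothetical switch between the two decoding objectives.
    if task == "kinematics":
        return regression_loss(outputs, targets)
    if task == "reward":
        return classification_loss(outputs, targets)
    raise ValueError(f"unknown task: {task}")
```

      Only the behavioral term changes between the two settings; the reconstruction term of the model would be left as it is.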

      To clarify the definition, we have revised it in the manuscript. Specifically, before a specific definition, we briefly introduce the relevant signals and irrelevant signals. Behaviorally irrelevant signals refer to those not directly associated with the behavioral variables of interest and may include noise or signals from variables of no interest. In contrast, behaviorally relevant signals refer to those directly related to the behavioral variables of interest. For instance, rewards in the post-movement phase are not directly related to behavioral variables (kinematics) in the reaching movement phase.

      It is important to note that our definition of behaviorally relevant signals covers not only decoding capability but also specific requirements at the signal level. It rests on two key requirements:

      (1) they should closely resemble raw signals to preserve the underlying neuronal properties, without becoming so similar that they include irrelevant signals (encoding requirement), and (2) they should contain as much behavioral information as possible (decoding requirement). Signals that meet both requirements are considered effective behaviorally relevant signals. In our study, we assume raw signals are additively composed of behaviorally-relevant and irrelevant signals. We define irrelevant signals as those remaining after subtracting relevant signals from raw signals. Therefore, we believe our definition is clearly articulated.

      Thank you for your valuable feedback.

      (1) Sani, Omid G., et al. "Modeling behaviorally relevant neural dynamics enabled by preferential subspace identification." Nature Neuroscience 24.1 (2021): 140-149.

      (2) Buetfering, Christina, et al. "Behaviorally relevant decision coding in primary somatosensory cortex neurons." Nature neuroscience 25.9 (2022): 1225-1236.

      (3) Wang, Fang, et al. "Quantized attention-gated kernel reinforcement learning for brain–machine interface decoding." IEEE transactions on neural networks and learning systems 28.4 (2015): 873-886.

      (4) Dyer, Eva L., et al. "A cryptography-based approach for movement decoding." Nature biomedical engineering 1.12 (2017): 967-976.

      (5) Ahmadi, Nur, Timothy G. Constandinou, and Christos-Savvas Bouganis. "Robust and accurate decoding of hand kinematics from entire spiking activity using deep learning." Journal of Neural Engineering 18.2 (2021): 026011.

      (6) Churchland, Mark M., et al. "Neural population dynamics during reaching." Nature 487.7405 (2012): 51-56.

      (7) Kaufman, Matthew T., et al. "Cortical activity in the null space: permitting preparation without movement." Nature neuroscience 17.3 (2014): 440-448.

      (8) Elsayed, Gamaleldin F., et al. "Reorganization between preparatory and movement population responses in motor cortex." Nature communications 7.1 (2016): 13239.

      (9) Ramkumar, Pavan, et al. "Premotor and motor cortices encode reward." PloS one 11.8 (2016): e0160851.

      Q3: “The authors seem to have missed the spirit of my critique: to say "linear readout is performed in motor cortex" is an over-interpretation of what their model can show.”

      Thank you for your comments. It's important to note that the conclusions we draw are speculative and not definitive. We use terms like "suggest" to reflect this uncertainty. To further emphasize the conjectural nature of our conclusions, we have deliberately moderated our tone.

      The question of whether behaviorally-relevant signals can be accessed by downstream brain regions hinges on the debate over whether the brain employs a strategy of filtering before decoding. If the brain employs such a strategy, the brain can probably access these signals. In our view, it is likely that the brain utilizes this strategy.

      Given the existence of behaviorally relevant signals, it is reasonable to assume that the brain has intrinsic mechanisms to differentiate between relevant and irrelevant signals. There is growing evidence suggesting that the brain utilizes various mechanisms, such as attention and specialized filtering, to suppress irrelevant signals and enhance relevant signals [1-3]. Therefore, it is plausible that the brain filters before decoding, thereby effectively accessing behaviorally relevant signals.

      As for whether the brain employs linear readout, this is difficult to ascertain given the limitations of current observational methods and our incomplete understanding of brain mechanisms. In many cortical areas, linear decoders have proven to be sufficiently accurate. Consequently, numerous studies [4, 5, 6], including the one you referenced [4], directly employ linear decoders to extract information and draw conclusions from the decoding results. In contrast to these approaches, we compared the performance of linear and nonlinear decoders on behaviorally relevant signals and found it to be comparable. Considering both decoding accuracy and model complexity, our results suggest that the motor cortex may utilize linear readout to decode information from relevant signals. Given current technological limitations, we consider it reasonable to analyze collected data to speculate on the potential workings of the brain, an approach that many studies have also embraced [7-10]. For instance, one study [7] deduces strategies the brain might employ to overcome noise by analyzing the structure of recorded data and decoding outcomes for new stimuli.

      Thank you for your valuable feedback.

      (1) Sreenivasan, Sameet, and Ila Fiete. "Grid cells generate an analog error-correcting code for singularly precise neural computation." Nature neuroscience 14.10 (2011): 1330-1337.

      (2) Schneider, David M., Janani Sundararajan, and Richard Mooney. "A cortical filter that learns to suppress the acoustic consequences of movement." Nature 561.7723 (2018): 391-395.

      (3) Nakajima, Miho, L. Ian Schmitt, and Michael M. Halassa. "Prefrontal cortex regulates sensory filtering through a basal ganglia-to-thalamus pathway." Neuron 103.3 (2019): 445-458.

      (4) Jurewicz, Katarzyna, et al. "Irrational choices via a curvilinear representational geometry for value." bioRxiv (2022): 2022-03.

      (5) Hong, Ha, et al. "Explicit information for category-orthogonal object properties increases along the ventral stream." Nature neuroscience 19.4 (2016): 613-622.

      (6) Chang, Le, and Doris Y. Tsao. "The code for facial identity in the primate brain." Cell 169.6 (2017): 1013-1028.

      (7) Ganmor, Elad, Ronen Segev, and Elad Schneidman. "A thesaurus for a neural population code." eLife 4 (2015): e06134.

      (8) Churchland, Mark M., et al. "Neural population dynamics during reaching." Nature 487.7405 (2012): 51-56.

      (9) Gallego, Juan A., et al. "Cortical population activity within a preserved neural manifold underlies multiple motor behaviors." Nature communications 9.1 (2018): 4233.

      (10) Gallego, Juan A., et al. "Long-term stability of cortical population dynamics underlying consistent behavior." Nature neuroscience 23.2 (2020): 260-270.

      Q4: “Agreeing with my critique is not sufficient; please provide the data or simulations that provides the context for the reference in the fano factor. I believe my critique is still valid.”

      Thank you for your comments. As we previously replied, Churchland's research examines the variability of neural signals across different stages, including the preparation and execution phases, as well as before and after the target appears. Our study, however, focuses exclusively on the movement execution phase. Consequently, we are unable to produce comparative displays similar to those in his research. Intuitively, one might expect the variability of behaviorally relevant signals to be lower; however, since no prior studies have accurately extracted such signals, their specific Fano factor (FF) values remain unknown. Presenting these values is therefore meaningful and can provide a reference for future research. While we cannot compare FF across different stages, we can compare the values numerically with those of a Poisson count process. An FF of 1 indicates a Poisson firing process, and our experimental data reveal that most neurons have an FF less than 1, indicating that the variance in firing counts is below the mean. Thank you for your valuable feedback.
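
      The comparison with a Poisson count process described above can be sketched as follows. This is a minimal illustration only, assuming spike counts are arranged as trials × neurons within a fixed window; the function name `fano_factor` and the synthetic counts are hypothetical and are not taken from the manuscript or its code.

```python
import numpy as np

def fano_factor(spike_counts):
    """Fano factor per neuron: variance of spike counts across trials / mean count.

    spike_counts: array of shape (n_trials, n_neurons), counts in a fixed window.
    Neurons with a zero mean count get NaN, since the ratio is undefined.
    """
    counts = np.asarray(spike_counts, dtype=float)
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)          # unbiased variance across trials
    ff = np.full(mean.shape, np.nan)
    np.divide(var, mean, out=ff, where=mean > 0)
    return ff

# Hypothetical data: neuron 0 is Poisson, neuron 1 is more regular than Poisson.
rng = np.random.default_rng(0)
poisson_counts = rng.poisson(lam=10.0, size=5000)
regular_counts = rng.binomial(n=20, p=0.5, size=5000)   # mean 10, variance 5
ff = fano_factor(np.column_stack([poisson_counts, regular_counts]))
```

      Under these assumptions, the Poisson neuron should yield an FF close to 1 and the sub-Poisson neuron an FF close to 0.5, which is the sense in which an FF below 1 indicates count variance below the mean.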

      To Reviewer #4

      Q1: “Overall, studying neural computations that are behaviorally relevant or not is an important problem, which several previous studies have explored (for example PSID in (Sani et al. 2021), TNDM in (Hurwitz et al. 2021), TAME-GP in (Balzani et al. 2023), pi-VAE in (Zhou and Wei 2020), and dPCA in (Kobak et al. 2016), etc). However, this manuscript does not properly put their work in the context of such prior works. For example, the abstract states "One solution is to accurately separate behaviorally-relevant and irrelevant signals, but this approach remains elusive", which is not the case given that these prior works have done that. The same is true for various claims in the main text, for example "Furthermore, we found that the dimensionality of primary subspace of raw signals (26, 64, and 45 for datasets A, B, and C) is significantly higher than that of behaviorally-relevant signals (7, 13, and 9), indicating that using raw signals to estimate the neural dimensionality of behaviors leads to an overestimation" (line 321). This finding was presented in (Sani et al. 2021) and (Hurwitz et al. 2021), which is not clarified here. This issue of putting the work in context has been brought up by other reviewers previously but seems to remain largely unaddressed. The introduction is inaccurate also in that it mixes up methods that were designed for separation of behaviorally relevant information with those that are unsupervised and do not aim to do so (e.g., LFADS). The introduction should be significantly revised to explicitly discuss prior models/works that specifically formulated this behavior separation and what these prior studies found, and how this study differs.”  

      Thank you for your comments. Our statement that “One solution is to accurately separate behaviorally-relevant and irrelevant signals, but this approach remains elusive” is accurate. To the best of our knowledge, no prior work has done this, namely accurately separating behaviorally relevant neural signals at both single-neuron and single-trial resolution. The works you mentioned have not explicitly proposed extracting behaviorally relevant signals, nor have they identified and addressed the key challenge of extracting relevant signals, namely determining the optimal degree of similarity between the generated relevant signals and the raw signals. Those works focus on latent neural dynamics rather than on the signal level.

      To clearly set apart d-VAE from other models, we have framed the extraction of behaviorally relevant signals as the following mathematical optimization problem:

      $$\min_{\boldsymbol{x}_r} \; E(\boldsymbol{x}, \boldsymbol{x}_r) + R(\boldsymbol{x}_r)$$

      where $\boldsymbol{x}_r$ denotes the generated behaviorally-relevant signals, $\boldsymbol{x}$ denotes the raw noisy signals, $E(\cdot,\cdot)$ denotes the reconstruction loss, and $R(\cdot)$ denotes the regularization loss. It is important to note that while both d-VAE and TNDM employ a reconstruction loss, relying solely on this term is insufficient for determining the optimal degree of similarity between the generated and raw noisy signals. The key to accurately extracting behaviorally relevant signals lies in leveraging prior knowledge about these signals to determine the optimal degree of similarity, encapsulated by $R(\boldsymbol{x}_r)$. None of the works you mentioned includes the key term $R(\boldsymbol{x}_r)$.

      Regarding the dimensionality estimation, the dimensionality of neural manifolds quantifies the degrees of freedom required to describe population activity without significant information loss.

      There are two differences between our work and PSID and TNDM. 

      First, the dimensions they refer to are fundamentally different from ours. The dimensionality we describe pertains to a linear subspace, in which a neural dimension (also called a neural mode or principal component basis) is a vector in $\mathbb{R}^{N \times 1}$, with N representing the number of neurons. The vector length of a neural mode differs between PSID and our approach: PSID requires concatenating multiple time steps T, essentially making its neural mode a vector in $\mathbb{R}^{NT \times 1}$. TNDM, on the other hand, involves nonlinear dimensionality reduction, which is different from linear dimensionality reduction.

      Second, we estimate neural dimensionality by explaining the variance of neural signals, whereas PSID and TNDM determine dimensionality through decoding performance saturation. It is important to note that the dimensionality at which decoding performance saturates may not accurately reflect the true dimensionality of neural manifolds, as some dimensions may contain redundant information that does not enhance decoding performance.
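
      A variance-based estimate of this kind can be sketched as below. This is a minimal illustration under stated assumptions: the 90% cumulative-variance threshold, the function name `dimensionality_by_variance`, and the synthetic signals are all hypothetical, and the exact criterion used in the manuscript is not specified in this response.

```python
import numpy as np

def dimensionality_by_variance(signals, threshold=0.9):
    """Number of principal components needed to reach `threshold` of the variance.

    signals: array of shape (n_samples, n_neurons).
    threshold: assumed cumulative explained-variance cutoff (hypothetical value).
    """
    x = np.asarray(signals, dtype=float)
    x = x - x.mean(axis=0)                       # center each neuron
    s = np.linalg.svd(x, compute_uv=False)       # singular values of centered data
    explained = s**2 / np.sum(s**2)              # explained-variance ratio per component
    cumulative = np.cumsum(explained)
    n_components = int(np.searchsorted(cumulative, threshold)) + 1
    return min(n_components, len(s))

rng = np.random.default_rng(0)
n_samples, n_neurons, n_latent = 2000, 20, 3
latents = rng.standard_normal((n_samples, n_latent))
# Orthonormal mixing so that each latent contributes roughly equal variance.
mixing, _ = np.linalg.qr(rng.standard_normal((n_neurons, n_latent)))
clean = latents @ mixing.T                                     # low-dimensional signals
noisy = clean + rng.standard_normal((n_samples, n_neurons))    # add isotropic noise

dim_clean = dimensionality_by_variance(clean)
dim_noisy = dimensionality_by_variance(noisy)
```

      In this toy setting the noise-free signals need only as many components as there are latents, whereas the noisy signals need more, which illustrates how a variance-based estimate on raw signals can exceed the dimensionality of the underlying signals.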

      We acknowledge that while LFADS can generate signals that contain some behavioral information, it was not specifically designed to do so. Following your suggestion, we have removed this reference from the Introduction.

      Thank you for your valuable feedback.

      Q2: “Claims about linearity of "motor cortex" readout are not supported by results yet stated even in the abstract. Instead, what the results support is that for decoding behavior from the output of the dVAE model -- that is trained specifically to have a linear behavior readout from its embedding -- a nonlinear readout does not help. This result can be biased by the very construction of the dVAE's loss that encourages a linear readout/decoding from embeddings, and thus does not imply a finding about motor cortex.”

      Thank you for your comments. We respectfully disagree with the notion that relevant signals can be linearly decoded merely because of constraints that make the embedding linearly decodable. Embedding involves reorganizing or transforming the structure of the original signals, and the fact that an embedding can be linearly decoded does not mean that the corresponding signals can be.

      Let's clarify this with three intuitive examples:

      Example 1: Image denoising is a well-established field. Whether employing supervised or blind denoising methods [1, 2], both can effectively recover the original image. This denoising process closely resembles the extraction of behaviorally relevant signals from raw signals. Consider if noisy images are not amenable to linear decoding (classification); would removing the noise enable linear decoding? The answer is no. Typically, the noise in images captured under normal conditions is minimal, yet even the clear images remain challenging to decode linearly.

      Example 2: Consider the task of face recognition, where face images are set against various backgrounds. In this context, the pixels representing the face correspond to relevant signals, while the background pixels are considered irrelevant. Suppose a network is capable of extracting the face pixels and the resulting embedding can be linearly decoded. Can the face pixels themselves be linearly decoded? The answer is no. If linear decoding of face pixels were feasible, the challenging task of face recognition could be easily resolved by merely extracting the face from the background and training a linear classifier.

      Example 3: In the MNIST dataset, the background is uniformly black, and its impact is minimal. However, linear SVM classifiers used directly on the original pixels significantly underperform compared to non-linear SVMs.

      In summary, embedding involves reorganizing the structure of the original signals through a feature transformation function. However, the reconstruction process can recover the structure of the original signals from the embedding. The fact that the structure of the embedding can be linearly decoded does not imply that the structure of the original signals can be linearly decoded in the same way. It is inappropriate to focus on the compression process without equally considering the reconstruction process.

      Thank you for your valuable feedback.

      (1) Mao, Xiao-Jiao, Chunhua Shen, and Yu-Bin Yang. "Image restoration using convolutional auto-encoders with symmetric skip connections." arXiv preprint arXiv:1606.08921 (2016).

      (2) Lehtinen, Jaakko, et al. "Noise2Noise: Learning image restoration without clean data." International Conference on Machine Learning. International Machine Learning Society, 2018.

      Q3: “Related to the above, it is unclear what the manuscript means by readout from motor cortex. A clearer definition of "readout" (a mapping from what to what?) in general is needed. The mapping that the linearity/nonlinearity claims refer to is from the *inferred* behaviorally relevant neural signals, which themselves are inferred nonlinearly using the VAE. This should be explicitly clarified in all claims, i.e., that only the mapping from distilled signals to behavior is linear, not the whole mapping from neural data to behavior. Again, to say the readout from motor cortex is linear is not supported, including in the abstract.” 

      Thank you for your comments. We have revised the manuscript to make this clearer. Thank you for your valuable feedback.

      Q4: “Claims about individual neurons are also confounded. The d-VAE distilling processing is a population level embedding so the individual distilled neurons are not obtainable on their own without using the population data. This population level approach also raises the possibility that information can leak from one neuron to another during distillation, which is indeed what the authors hope would recover true information about individual neurons that wasn't there in the recording (the pixel denoising example). The authors acknowledge the possibility that information could leak to a neuron that didn't truly have that information and try to rule it out to some extent with some simulations and by comparing the distilled behaviorally relevant signals to the original neural signals. But ultimately, the distilled signals are different enough from the original signals to substantially improve decoding of low information neurons, and one cannot be sure if all of the information in distilled signals from any individual neuron truly belongs to that neuron. It is still quite likely that some of the improved behavior prediction of the distilled version of low-information neurons is due to leakage of behaviorally relevant information from other neurons, not the former's inherent behavioral information. This should be explicitly acknowledged in the manuscript.”

      Thank you for your comments. We value your insights regarding the mixing process. However, we are confident in the robustness of our conclusions. We respectfully disagree with the notion that the small R2 values containing significant information are primarily due to leakage, and we base our disagreement on four key reasons.

      (1) Neural reconstruction performance is a reliable and valid criterion.

      The purpose of latent variable models is to explain neuronal activity as much as possible. Given that the ground truth of the behaviorally-relevant signals, the latent variables, and the generative model is unknown, the only reliable reference at the signal level is the raw signals. A crucial criterion for evaluating the reliability of latent variable models (including latent variables and generated relevant signals) is their capability to effectively explain the raw signals [1]. Consequently, we maintain that if the generated signals resemble the raw signals as closely as possible, then, in accordance with an equivalence principle, we can claim that the obtained signals faithfully retain the inherent properties of single neurons.

      Reviewer #4 appears to focus on the compression (mixing) process without giving equal consideration to the reconstruction (de-mixing) process. Numerous studies have demonstrated that deep autoencoders can reconstruct the original signal very effectively. For example, in the field of image denoising, autoencoders are capable of accurately restoring the original image [2, 3]. If one persistently focuses on the mixing and ignores the reconstruction (de-mixing) process, one will not accept the result even when the only criterion we can rely on at the signal level, namely reconstruction performance, is high. If this were the case, many problems would become unsolvable. For instance, a fundamental criterion for latent variable models is their ability to explain the original data. If the ground truth of the latent variables remains unknown and the reconstruction criterion is disregarded, how can we validate the effectiveness of the model, the validity of the latent variables, or ensure that findings related to latent variables are not merely by-products of the model? Therefore, we disagree with the aforementioned notion. We believe that as long as the reconstruction performance is satisfactory, the extracted signals have successfully retained the characteristics of individual neurons.

      In our paper, we have shown in various ways that our generated signals sufficiently resemble the raw signals, including visualizing neuronal activity (Fig. 2m, Fig. 3i, and Fig. S5), achieving the highest performance among competitors (Fig. 2d, h, l), and conducting control analyses. Therefore, we believe our results are reliable. 

      (1) Cunningham, J.P. and Yu, B.M., 2014. Dimensionality reduction for large-scale neural recordings. Nature neuroscience, 17(11), pp.1500-1509.

      (2) Mao, Xiao-Jiao, Chunhua Shen, and Yu-Bin Yang. "Image restoration using convolutional auto-encoders with symmetric skip connections." arXiv preprint arXiv:1606.08921 (2016).

      (3) Lehtinen, Jaakko, et al. "Noise2Noise: Learning image restoration without clean data." International Conference on Machine Learning. International Machine Learning Society, 2018.

      (2) There is no reason for d-VAE to add signals that do not exist in the original signals.

      (1) Adding signals that do not exist in the small R2 neurons would decrease the reconstruction performance. This is because if the added signals contain significant information, they will not resemble the irrelevant signals, which contain no information, and thus the generated signals will not resemble the raw signals. The model optimizes towards reducing the reconstruction loss, and this scenario deviates from the model's optimization direction. It is worth mentioning that when the model has only a reconstruction loss, without the interference of a decoding loss, we believe that information leakage does not happen, because the model can only be optimized in the direction of similarity to the raw signals; adding non-existent signals to the generated signals would increase the reconstruction loss, which is contrary to the objective of optimization.

      (2) The information carried by these additional signals is redundant with that of the larger R2 neurons, so it adds no new information that could enhance the decoding performance of the neural population and therefore does not reduce the decoding loss.

      Based on these two points, we believe the model would not perform such counterproductive and harmful operations.

      (3) The criterion that irrelevant signals should contain minimal information can effectively rule out the leakage scenario.

      The criterion that irrelevant signals should contain minimal information is very important, but it seems that Reviewer #4 has continuously overlooked its significance. If the model's reconstruction is insufficient, or if additional information is added (which we do not believe will happen), a large amount of information could be decoded from the residuals, and this criterion would exclude such signals from selection. To clarify, let x, y, and z denote the raw, relevant, and irrelevant signals of a smaller R2 neuron, with x=y+z. If the extracted relevant signals become y+m, then the irrelevant signals become z-m. Consequently, the irrelevant signals would contain a significant amount of information.
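
      The algebra of this argument can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of real data: the one-dimensional behavior, the gains 0.2 and 0.8, the in-sample linear fit, and the helper name `decoding_r2` are all hypothetical choices made for illustration.

```python
import numpy as np

def decoding_r2(signal, behavior):
    """In-sample R2 of a least-squares linear readout of `behavior` from a 1-D `signal`."""
    design = np.column_stack([signal, np.ones_like(signal)])   # signal plus intercept
    coef, *_ = np.linalg.lstsq(design, behavior, rcond=None)
    residual = behavior - design @ coef
    return 1.0 - residual.var() / behavior.var()

rng = np.random.default_rng(0)
n = 5000
behavior = rng.standard_normal(n)
y = 0.2 * behavior              # true relevant part of a small-R2 neuron
z = rng.standard_normal(n)      # true irrelevant part, independent of behavior
x = y + z                       # raw signal
m = 0.8 * behavior              # hypothetical leaked information

residual_clean = x - y          # extracted signal is y, so the residual equals z
residual_leaky = x - (y + m)    # extracted signal is y + m, so the residual equals z - m

r2_clean = decoding_r2(residual_clean, behavior)
r2_leaky = decoding_r2(residual_leaky, behavior)
```

      In this toy case the residual without leakage carries essentially no behavioral information, whereas the residual with leakage (z-m) can be decoded well above zero, which is the signature the minimal-information criterion is meant to detect.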

      We presented the decoding R2 for irrelevant signals in real datasets under three distillation scenarios: a bias towards reconstruction (alpha=0, an extreme case where the model has only a reconstruction loss without a decoding loss), a balanced trade-off, and a bias towards decoding (alpha=0.9), as detailed in Author response table 1. If significant information in small R2 neurons had leaked from large R2 neurons, the irrelevant signals should contain a large amount of information. However, our results indicate that the irrelevant signals contain only minimal information, and their performance closely resembles that of the model trained solely with reconstruction loss, showing no significant differences (P > 0.05, Wilcoxon rank-sum test). When the model leans towards decoding, some useful information will be left in the residuals, and irrelevant signals will contain a substantial amount of information, as observed in Author response table 1 for alpha=0.9. Therefore, we will not choose these signals for analysis.

      In conclusion, the criterion that irrelevant signals should contain minimal information is a very effective measure to exclude undesirable signals.

      Author response table 1.

      Decoding R2 of irrelevant signals

      (4) Synthetic experiments can effectively rule out the leakage scenario.

      In the absence of ground truth data, synthetic experiments serve as an effective method for validating models and are commonly employed [1-3]. 

      Our experimental results demonstrate that d-VAE can effectively extract neural signals that more closely resemble actual behaviorally relevant signals (Fig. S2g). Information leakage would decrease the similarity to the ground truth signals, so we have ruled out this possibility. Moreover, in synthetic experiments with small R2 neurons (Fig. S10), the results also demonstrate that our model makes these neurons more closely resemble the ground truth relevant signals and recovers their information.

      In summary, synthetic experiments strongly demonstrate that our model can recover obscured neuronal information, rather than adding signals that do not exist.

      (1) Pnevmatikakis, Eftychios A., et al. "Simultaneous denoising, deconvolution, and demixing of calcium imaging data." Neuron 89.2 (2016): 285-299.

      (2) Schneider, Steffen, Jin Hwa Lee, and Mackenzie Weygandt Mathis. "Learnable latent embeddings for joint behavioural and neural analysis." Nature 617.7960 (2023): 360-368.

      (3) Zhou, Ding, and Xue-Xin Wei. "Learning identifiable and interpretable latent models of high-dimensional neural activity using pi-VAE." Advances in Neural Information Processing Systems 33 (2020): 7234-7247.

      Based on these four points, we are confident in the reliability of our results. If Reviewer #4 considers these points insufficient, we would highly appreciate it if specific concerns regarding any of these aspects could be detailed.

      Thank you for your valuable feedback.

      Q5: “Given the nuances involved in appropriate comparisons across methods and since two of the datasets are public, the authors should provide their complete code (not just the dVAE method code), including the code for data loading, data preprocessing, model fitting and model evaluation for all methods and public datasets. This will alleviate concerns and allow readers to confirm conclusions (e.g., figure 2) for themselves down the line.”

      Thanks for your suggestion.

      Our code is now available on GitHub at https://github.com/eric0li/d-VAE. Thank you for your valuable feedback.

      Q6: “Related to 1) above, the authors should explore the results if the affine network h(.) (from embedding to behavior) was replaced with a nonlinear ANN. Perhaps linear decoders would no longer be as close to nonlinear decoders. Regardless, the claim of linearity should be revised as described in 1) and 2) above, and all caveats should be discussed.”

      Thank you for your suggestion. We appreciate this feasible proposal, which can be tested empirically. Following your suggestion, we have replaced the decoder from the latent variable z to behavior y with a nonlinear neural network, specifically a neural network with a single hidden layer. The modified model is termed d-VAE2. We applied d-VAE2 to the real data and selected the optimal alpha using the validation set. As shown in Author response table 2, the performance of KF and ANN remains comparable. Therefore, the capacity to linearly decode behaviorally relevant signals does not stem from the linear decoding of embeddings.
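
      The difference between the two readouts from the latent variable z to behavior y can be sketched as follows. This is only a schematic forward pass under stated assumptions: the layer sizes, the tanh activation, the random weights, and the function names are hypothetical and do not reproduce the actual d-VAE or d-VAE2 implementation.

```python
import numpy as np

def affine_readout(z, weight, bias):
    """Affine map from latent z (n_samples, d_latent) to behavior (n_samples, d_behavior)."""
    return z @ weight + bias

def one_hidden_layer_readout(z, w1, b1, w2, b2):
    """Nonlinear map from z to behavior with a single hidden layer (tanh is an assumed activation)."""
    hidden = np.tanh(z @ w1 + b1)
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
n_samples, d_latent, d_hidden, d_behavior = 8, 4, 16, 2   # hypothetical sizes
z = rng.standard_normal((n_samples, d_latent))

w_affine = rng.standard_normal((d_latent, d_behavior))
b_affine = np.zeros(d_behavior)
w1 = rng.standard_normal((d_latent, d_hidden))
b1 = np.zeros(d_hidden)
w2 = rng.standard_normal((d_hidden, d_behavior))
b2 = np.zeros(d_behavior)

y_affine = affine_readout(z, w_affine, b_affine)
y_nonlinear = one_hidden_layer_readout(z, w1, b1, w2, b2)
```

      Both heads map the same latent variable to the same behavioral dimensions; only the second can represent a nonlinear relationship between z and y.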

      Author response table 2.

      Decoding R2 of behaviorally relevant signals obtained by d-VAE2

      Additionally, it is worth noting that this approach is uncommon and is considered somewhat inappropriate according to the Information Bottleneck theory [1]. According to this theory, information is progressively compressed in multilayer neural networks, discarding what is irrelevant to the output and retaining what is relevant. This means that as the number of layers increases, the mutual information between each layer's embedding and the model input gradually decreases, while the mutual information between each layer's embedding and the model output gradually increases. For the decoding part, if embeddings that are not closest to the output (behaviors) are used, these embeddings might contain behaviorally irrelevant signals. Using them to generate behaviorally relevant signals could lead to the inclusion of irrelevant signals in the behaviorally relevant signals.

      To support the above statement, we conducted experiments on the synthetic data. In Author response table 3, we present the performance (neural R2 between the generated signals and the ground truth signals) of both models at several alpha values around the optimal alpha of d-VAE (alpha=0.9) selected using the validation set. The results show that, at the same alpha value, the performance of d-VAE2 is consistently inferior to that of d-VAE. Moreover, d-VAE2 requires a higher alpha value to achieve performance comparable to d-VAE, and its best performance remains inferior to that of d-VAE.
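
      A neural R2 between generated and ground truth signals can be computed along the following lines. This is a minimal sketch that assumes the standard coefficient of determination computed per neuron and averaged across neurons; the averaging scheme, the function name `neural_r2`, and the synthetic arrays are assumptions, since the exact computation is not described in this response.

```python
import numpy as np

def neural_r2(generated, ground_truth):
    """Mean over neurons of the coefficient of determination.

    Both inputs have shape (n_samples, n_neurons). For each neuron,
    R2 = 1 - SS_res / SS_tot, with SS_tot taken around the neuron's mean.
    """
    gen = np.asarray(generated, dtype=float)
    gt = np.asarray(ground_truth, dtype=float)
    ss_res = np.sum((gt - gen) ** 2, axis=0)
    ss_tot = np.sum((gt - gt.mean(axis=0)) ** 2, axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))

rng = np.random.default_rng(0)
truth = rng.standard_normal((1000, 5))                         # hypothetical ground truth signals
close = truth + 0.1 * rng.standard_normal(truth.shape)         # good reconstruction
mean_only = np.tile(truth.mean(axis=0), (truth.shape[0], 1))   # predicts each neuron's mean

r2_perfect = neural_r2(truth, truth)
r2_close = neural_r2(close, truth)
r2_mean = neural_r2(mean_only, truth)
```

      With this definition a perfect reconstruction scores 1 and a constant prediction at each neuron's mean scores 0, so higher values indicate generated signals that are closer to the ground truth.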

      Author response table 3.

      Neural R2 between generated signals and real behaviorally relevant signals

      Thank you for your valuable feedback.

      (1) Shwartz-Ziv, Ravid, and Naftali Tishby. "Opening the black box of deep neural networks via information." arXiv preprint arXiv:1703.00810 (2017).

      Q7: “The beginning of the section on the "smaller R2 neurons" should clearly define what R2 is being discussed. Based on the response to previous reviewers, this R2 "signifies the proportion of neuronal activity variance explained by the linear encoding model, calculated using raw signals". This should be mentioned and made clear in the main text whenever this R2 is referred to.”

      Thank you for your suggestion. We have made the modifications in the main text. Thank you for your valuable feedback.

      Q8: “Various terms require clear definitions. The authors sometimes use vague terminology (e.g., "useless") without a clear definition. Similarly, discussions regarding dimensionality could benefit from more precise definitions. How is neural dimensionality defined? For example, how is "neural dimensionality of specific behaviors" (line 590) defined? Related to this, I agree with Reviewer 2 that a clear definition of irrelevant should be mentioned that clarifies that relevance is roughly taken as "correlated or predictive with a fixed time lag". The analyses do not explore relevance with arbitrary time lags between neural and behavior data.”

      Thanks for your suggestion. We have removed the “useless” statements and have revised the statement of “the neural dimensionality of specific behaviors” in our revised manuscript.

      Regarding the use of fixed temporal lags, we followed the same practice as papers related to the dataset we use, which assume fixed temporal lags [1-3]. Furthermore, many studies in the motor cortex similarly use fixed temporal lags [4-6]. To clarify the definition, we have revised the definition in our manuscript. For details, please refer to the response to Q2 of reviewer #2 and our revised manuscript. We believe our definition is clearly articulated.

      Thank you for your valuable feedback.

      (1) Wang, Fang, et al. "Quantized attention-gated kernel reinforcement learning for brain– machine interface decoding." IEEE transactions on neural networks and learning systems 28.4 (2015): 873-886.

      (2) Dyer, Eva L., et al. "A cryptography-based approach for movement decoding." Nature biomedical engineering 1.12 (2017): 967-976.

      (3) Ahmadi, Nur, Timothy G. Constandinou, and Christos-Savvas Bouganis. "Robust and accurate decoding of hand kinematics from entire spiking activity using deep learning." Journal of Neural Engineering 18.2 (2021): 026011.

      (4) Churchland, Mark M., et al. "Neural population dynamics during reaching." Nature 487.7405 (2012): 51-56.

      (5) Kaufman, Matthew T., et al. "Cortical activity in the null space: permitting preparation without movement." Nature neuroscience 17.3 (2014): 440-448.

      (6) Elsayed, Gamaleldin F., et al. "Reorganization between preparatory and movement population responses in motor cortex." Nature communications 7.1 (2016): 13239. 

      Q9: “CEBRA itself doesn't provide a neural reconstruction from its embeddings, but one could obtain one via a regression from extracted CEBRA embeddings to neural data. In addition to decoding results of CEBRA (figure S3), the neural reconstruction of CEBRA should be computed and CEBRA should be added to Figure 2 to see how the behaviorally relevant and irrelevant signals from CEBRA compare to other methods.”

      Thank you for your question. Modifying CEBRA is beyond the scope of our work. As CEBRA is not a generative model, it cannot obtain behaviorally relevant and irrelevant signals, and therefore it lacks the results presented in Fig. 2. To avoid the same confusion encountered by reviewers #3 and #4 among our readers, we have opted to exclude the comparison with CEBRA. It is crucial to note, as previously stated, that our assessment of decoding capabilities has been benchmarked against the performance of the ANN on raw signals, which almost represents the upper limit of performance. Consequently, omitting CEBRA does not affect our conclusions.

      Thank you for your valuable feedback.

      Q10: “Line 923: "The optimal hyperparameter is selected based on the lowest averaged loss of five-fold training data." => why is this explained specifically under CEBRA? Isn't the same criteria used for hyperparameters of other methods? If so, clarify.”

      Thank you for your question. The hyperparameter selection for CEBRA follows the practice of the original CEBRA paper. The hyperparameter selection for generative models is detailed in the Section “The strategy for selecting effective behaviorally-relevant signals”.  Thank you for your valuable feedback.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors): 

      Major points: 

      (1) A central question regarding VGCC differences at Is vs Ib active zones is why is calcium influx higher at Is active zones compared to Ib. Ideally, the authors would have started this study by showing correlations between Cac abundance, presynaptic calcium influx, and Pr at Is vs Ib active zones. If they had, they would likely find that Cac abundance scales with calcium influx and Pr within Is vs Ib, but that calcium influx is over two-fold enhanced at Is over Ib when normalized to the same Cac abundance. This is more than sufficient to explain the Pr differences, so the rest of the study should have focused on revealing why influx is different at Is over Ib despite an apparently similar level of Cac abundance. Then the examination of CaBeta, Stj, etc could have been used to help explain this conundrum. 

      A lesson might be gleaned in how to structure this narrative from the Rebola 2019 study, which the authors cite and discuss at length. Similar to the current study, that paper started with two synapses ("strong" vs "weak") and sought to explain why they were so different in synaptic strength. First, they examined presynaptic calcium influx, and surprisingly found that the strong synapse had reduced calcium influx compared to the weak. Then the rest of the paper sought to explain why synaptic strength (Pr) was higher at the strong synapse despite reduced calcium influx. The authors do not use this logical flow and narrative in the present study, despite the focus being on how Cav2 channels contribute to strong vs weak synapses - and the primary function of Cav2 channels is to pass calcium at active zones to drive vesicle fusion. 

      Although the authors did not show that presynaptic calcium influx is higher at Is vs Ib active zones in the current manuscript, other studies have previously established that calcium influx is two-fold higher at Is active zones vs Ib (as the authors cite). Rather than focusing so much on Pr at Is vs Ib active zones, which as the authors know can be influenced by myriad differences, it seems the more relevant parameter to study is simply to address presynaptic calcium influx at Is vs Ib, which is the primary function of Cac. Put more simply, if Cac levels are the same at Is vs Ib active zones, why is calcium influx at least two-fold higher at Is? 

      It would therefore seem crucial for the authors to determine presynaptic calcium influx levels (ideally at individual AZs) to really understand how Cac intensity levels correlate with calcium influx. The authors instead map Pr at individual AZs, but as the authors know there are many variables that influence whether a SV releases in addition to calcium influx. There are a number of options for this kind of imaging in Drosophila, including genetically encoded calcium indicators targeted to active zones. But since several studies have previously established that influx is higher at Is active zones over Ib, this may not be necessary. That being said, there is a lot of value in quantitatively analyzing Cac/Stj/CaBeta abundance, calcium influx, and Pr together at individual active zones.

      We appreciate the perspective that we could have focused on why Ca2+ influx is 2x greater at type Is active zones, which we agree is an important and interesting question. However, growing evidence indicates that Ca2+ influx alone, like Ca2+ channel abundance, does not reliably predict synaptic strength between inputs. So, here we focused instead on how other differences between synapses influence Pr and contribute to synaptic heterogeneity between and/or among synapses formed by strong and weak inputs. We have changed our title and framing to better reflect this focus. 

      As Reviewer 1 notes, Rebola et al. (2019) found that lower Pr granule synapses exhibit higher Ca2+ influx (and Ca2+ channel abundance). In another example, Aldahabi et al. (2022) demonstrated that even when Ca2+ influx is greater at high-Pr synapses, it does not necessarily explain differences in synaptic strength, as raising Ca2+ entry at low-Pr synapses to high-Pr synapse levels was not sufficient to increase synaptic strength to high-Pr input levels. Similar findings have been reported at tonic and phasic synapses of the crayfish NMJ (Msghina, 1999).

      Several lines of evidence argue that factors beyond Ca2+ influx also play important roles in establishing distinct release properties at the Drosophila NMJ. A recent study using a botulinum transgene to isolate type Ib and Is synapses for electrophysiological analysis found that increasing external [Ca2+] from physiological levels (1.8 mM) to 3 mM or even 6 mM does not result in a 3-fold increase in EPSCs or quantal content at type Ib synapses, despite the prediction that the increase would be even greater given the power dependence of release on Ca2+ concentration (He et al., 2023). The authors further found that type Ib synapses are more sensitive than type Is synapses to the slow Ca2+ chelator EGTA, indicating looser Ca2+ channel-SV coupling.

      Consistently, we find that although VGCC levels are similar at the two inputs, their density is greater at type Is active zones (Figs. 1 and 2). Our findings also reveal additional molecular differences that may contribute to the observed differences in neurotransmitter release properties between the two inputs, including lower levels of the active zone protein Brp (Fig. 3) and the auxiliary subunit α2δ-3/Stj (Fig. 6) at high-Pr type Is inputs. In contrast, levels of each of these proteins positively correlate with synaptic strength among active zones of a single input, whether low- or high-Pr (Figs. 1, 3, 6). Similarly, levels of each of these proteins increase during homeostatic potentiation of neurotransmitter release (Figs. 4 and 7). Thus, we propose that two broad mechanisms contribute to synaptic diversity in the nervous system: (1) spatial organization and relative molecular content establish distinct average basal release probabilities that differ between inputs and (2) among individual synapses of distinct inputs, coordinated modulation of Ca2+ channel and active zone protein abundance independently tunes Pr. These intersecting mechanisms provide a framework for understanding the extensive and dynamic synaptic diversity observed across nervous systems.

      (2) In addition to key points made above, it seems the authors should at least consider (if not experimentally test) what other differences might contribute to the higher calcium influx at Is over Ib:  

      - Distinct splice isoforms of Cac (and/or Stj/Cabeta): The recent RNAseq analysis of gene expression at Is vs Ib motor neurons from Troy Littleton's group may inform this consideration? 

      - Stj reduction at Is: Do channel studies in heterologous systems give any insight into VGCC channel function with and without a2d-3? Do Cav2 channels without a2d pass more calcium? This would then offer an obvious solution to the key conundrum underlying this study. 

      These are excellent questions that we are actively pursuing. While there is no evidence of differentially expressed splice isoforms of Stj or Ca-β in the recent RNA-seq data from Jetti et al., 2023, subtle changes in Cac isoform usage were observed that may contribute to differences in Ca2+ influx. In heterologous systems, α2δ expression generally increases Ca2+ channel membrane insertion and Ca2+ currents. However, in vivo, α2δ subunits can also mediate extracellular interactions that may modulate channel function. We address these points in greater detail in the revised discussion.

      (3) Assess Stj and CaBeta levels at AZs after PhTx: The successful generation of endogenously tagged Stj and CaBeta enables some relatively easy experiments that would be of interest, similar to what the authors present for Cac. Does Brp similarly control Stj and CaBeta at Is vs Ib compared to what they show for Cac? In addition, does homeostatic plasticity similarly change Stj and CaBeta at Is vs Ib compared to what the authors have shown for Cac? i.e., do they both similarly increase in intensity, by the same amount, as Cac? 

      We agree and have included an analysis of α2δ-3/Stj levels following PhTx exposure (Fig. 7A-C). We have also investigated the regulation of Stj during chronic presynaptic homeostatic potentiation (Fig. 7D-F). In both cases, StjV5-N levels significantly increase at type Ib and Is active zones, consistent with our finding that among AZs of either type Ib or Is inputs, Stj levels correlate with Cac abundance and, thus, Pr. Together with our and others’ findings, this suggests that coordinated increases in Ca2+ channel, auxiliary subunit, and active zone protein abundance positively tune synaptic strength at diverse synaptic subtypes.

      Minor points: 

      (1) Including line numbers would make reviewing/commenting easier. 

      We apologize for this oversight and have added line numbers to the revised manuscript.

      (2) Fig. 2I: It is not apparent what the mean cluster density is between Ib vs Is (as it is in Fig. 2F-H graphs). The mean and error bars should be included in 2I as it is in 2G. Same with Fig. 3C. 

      Thank you for pointing this out. We have added error bars to the paired analysis in 2I as well as in 3C and 1C.

      (3) Fig. 4 - it might make more sense to normalize Brp and Cac intensity as a percentage of baseline (PhTx at Is or Ib) rather than normalizing everything to control Ib. 

      We have revised the graphs as suggested in Figure 4 and throughout.

      (4) Page 5 bottom - REFS missing after Fig. 1E. 

      Thank you for catching this. We have fixed it.

      Reviewer #2 (Recommendations For The Authors): 

      This reader found differentiating between low Pr sites (deep purple) and cac measurements (black) difficult in Fig 1B. You may consider depicting this differently. 

      Thank you for this feedback. We have changed the color scheme to improve readability.

      I found it difficult to discern the difference between experiments Fig 1E and Fig 1J. Why are individual dots distributed differently? 

      The individual data points are the same as in 1E and 1F, but they are no longer separated by individual NMJ: all Is and Ib data points are pooled and plotted with best-fit lines so that their slopes can be compared. We have added text to the revised manuscript to clarify this.

      Results section, second paragraph, add references, remove 'REF': We next investigated the correlation between Pr and VGCC levels and found that at type Is inputs, single-AZ Cac intensity positively correlates with Pr (Fig. 1E; REFS). 

      Thank you. We have corrected this error.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Although the manuscript is well organized and written, it could be largely improved and therefore made more plausible and easier to read. See my point-by-point comments listed below:

      (1) The introduction section is a bit overloaded with some unnecessary information. For example, the authors discussed the relationship between neurotransmitters in the prefrontal and striatum and substance use/sustained attention. However, the results are related to neither the neurotransmitters nor the striatum. In addition, there is a contradictory description about neurotransmitters there, Nicotine/THC leads to increased neurotransmitters, and decreased neurotransmitters is related to poor sustained attention. Does that mean that the use of Nicotine/THC could increase sustained attention?

      Thanks for this insightful question. We understand your concern regarding the seemingly contradictory statements about neurotransmitters and sustained attention. Previous studies have shown that acute administration of nicotine can improve sustained attention (Lawrence et al., 2002; Potter and Newhouse, 2008; Valentine and Sofuoglu, 2018; Young et al., 2004). On the other hand, the acute effects of smoking cannabis on sustained attention are mixed and depend on factors such as dosage and individual differences (Crean et al., 2011). For instance, a previous study (Hart et al., 2001) found that performance on a tracking task, which requires sustained attention, improved significantly after smoking cannabis with a high dose of THC, albeit in experienced cannabis users. However, chronic substance use, including nicotine and cannabis, has been associated with impaired sustained attention (Chamberlain et al., 2012; Dougherty et al., 2013).

      To address your concerns and improve clarity and succinctness of the Introduction, we have removed the description of neurotransmitters from the Introduction. This revision should make the introduction more concise and focus on the direct relationships pertinent to our study.

      (2) It is a bit hard to follow the story for the readers because the Results section went straight into detail. For example, the authors directly introduced that they used the ICV from the Go trials to index sustained attention without basic knowledge about the task. Why use the ICV of Go trials instead of other trials (i.e., successful stop trials) as an index of sustained attention? I suggest presenting the subjects and task details about the data before the detailed behavioral results. The results section should include enough information to understand the presenting results for the readers, rather than forcing the reader to find the answer in the later Methods section.

      We appreciate your suggestion to provide more context about the task and ICV before diving into the detailed behavioural results.

      We used the ICV derived from Go trials, rather than Successful stop trials, as an index of sustained attention because of the nature of the stop-signal task and the specific data it generates. Previous studies have indicated that reaction time (RT) variability is a straightforward measure of sustained attention, with increasing variability thought to reflect poorer ability to sustain attention (Esterman and Rothlein, 2019). RT variability is quantified as ICV, calculated as the standard deviation of Go RT divided by the mean Go RT across Go trials (O'Halloran et al., 2018). The stop signal task includes both Go trials and stop trials. During Go trials, participants are required to respond as quickly and accurately as possible to a Go signal, allowing RT to be recorded for calculating ICV. In contrast, stop trials are designed to measure inhibitory control, and successful response inhibition produces no response, so no RT is recorded. Therefore, Go trials are specifically used to assess sustained attention, while Stop trials primarily assess inhibitory control (Verbruggen et al., 2019).

      We acknowledge the importance of providing this contextual information within the Results section to enhance reader understanding. We have added this information before presenting the behavioural results on Page 6.

      Results

      (1) Behavioural changes over time

      Reaction time (RT) variability is a straightforward measure of sustained attention, with increasing variability thought to reflect poor sustained attention. RT variability is defined as the intra-individual coefficient of variation (ICV), calculated as the standard deviation of Go RT divided by the mean Go RT across Go trials in the stop signal task. Lower ICV indicates better sustained attention.
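
      The ICV definition above can be illustrated with a minimal sketch. The function name and the reaction times are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def intra_individual_cv(go_rts):
    """Return ICV = SD(Go RT) / mean(Go RT) for one participant.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    if len(go_rts) < 2:
        raise ValueError("need at least two Go-trial reaction times")
    return stdev(go_rts) / mean(go_rts)


# Hypothetical Go-trial reaction times in milliseconds (illustration only).
steady = [400, 410, 390, 405, 395]
variable = [300, 520, 380, 610, 340]

icv_steady = intra_individual_cv(steady)
icv_variable = intra_individual_cv(variable)
```

      Under this definition, the participant with more consistent reaction times receives the lower ICV, which the text interprets as better sustained attention.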

      (3) The same problem for section 2 in the Results. What are the predictive networks? Are the predictive networks the same as the networks constructed based on the correlation with ICV? My intuitive feeling is that they are the circular analyses here. The positive/negative/combined networks are calculated based on the correlation between the edges and ICV. Then the author used the network to predict the ICV again. The manipulation from the raw networks (I think they are based on PPI) to the predictive network, and the calculation of the predicted ICV are all missing. The direct exposure of the results to the readers without enough detailed knowledge made everything hard to digest.

      We thank the Reviewer for the insightful comment. We agree with the need for more clarity regarding the predictive networks and the CPM analysis before presenting results. CPM, a data-driven neuroscience approach, is applied to predict individual behaviour from brain functional connectivity (Rosenberg et al., 2016; Shen et al., 2017). The CPM analysis used the strength of the predictive network to predict the individual difference in traits and behaviours. CPM includes several steps: feature selection, feature summarization, model building, and assessment of prediction significance (see Fig. S1).

      During feature selection, we assessed whether connections between brain areas (i.e., edges) in a task-related functional connectivity matrix (derived from generalized psychophysiological interaction analysis) were positively or negatively correlated with ICV using a significance threshold of P < 0.01. These positively or negatively correlated connections are regarded as the positive or negative network, respectively. The network strength of the positive network (or negative network) was determined in each individual by summing the connection strength of each positively (or negatively) correlated edge. The combined network was determined by subtracting the strength of the negative network from the positive network. Next, CPM built a linear model between the network strength of the predictive network and ICV. This model was initially developed using the training set. The predictive networks were then applied to the test set, where network strength was calculated again, and the linear model was used to predict ICV using k-fold cross-validation. Following your advice, we have updated the Results section to include these details on Page 7.

      Results

      (2) Cross-sectional brain connectivity

      This study employed CPM, a data-driven neuroscience approach, to identify three predictive networks (positive, negative, and combined) that predict ICV from brain functional connectivity. CPM typically uses the strength of the predictive networks to predict individual differences in traits and behaviors. The predictive networks were obtained based on connectivity analyses of the whole brain. Specifically, we assessed whether connections between brain areas (i.e., edges) in a task-related functional connectivity matrix derived from generalized psychophysiological interaction analysis were positively or negatively correlated with ICV using a significance threshold of P < 0.01. These positively or negatively correlated connections were regarded as the positive or negative network, respectively. The network strength of the positive network (or negative network) was determined for each individual by summing the connection strength of each positively (or negatively) correlated edge. The combined network was determined by subtracting the strength of the negative network from the positive network. We then built a linear model between network strength and ICV in the training set; in the test set, the same predictive networks were used to compute network strength, and the linear model was applied to calculate predicted ICV using k-fold cross-validation.
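
      The CPM steps described above (feature selection, feature summarization, model building, and cross-validated prediction) can be sketched as follows. This is a simplified illustration rather than the authors' pipeline: the function names and toy data are hypothetical, edges are selected with an |r| cutoff as a stand-in for the P < 0.01 threshold, and folds are assigned by interleaving subjects.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0, my
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def cpm_predict(edges, icv, k=5, r_thresh=0.5):
    """Predict ICV from combined-network strength with k-fold CV.

    edges: one list of edge strengths per subject.
    r_thresh: assumed |r| cutoff standing in for the P < 0.01 threshold.
    """
    n = len(icv)
    predicted = [None] * n
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        y_train = [icv[i] for i in train]

        # Feature selection uses training subjects only.
        pos, neg = [], []
        for e in range(len(edges[0])):
            r = pearson_r([edges[i][e] for i in train], y_train)
            if r > r_thresh:
                pos.append(e)
            elif r < -r_thresh:
                neg.append(e)

        # Feature summarization: combined strength = positive - negative.
        def strength(i):
            return sum(edges[i][e] for e in pos) - sum(edges[i][e] for e in neg)

        # Model building on the training set, then prediction on the test set.
        slope, intercept = fit_line([strength(i) for i in train], y_train)
        for i in test:
            predicted[i] = intercept + slope * strength(i)
    return predicted


# Toy data: edge 0 rises with ICV, edge 1 falls with ICV, edge 2 is unrelated.
icv = [0.1 + 0.01 * i for i in range(20)]
edges = [[y, -y, ((i * 7) % 5) / 10] for i, y in enumerate(icv)]
predicted_icv = cpm_predict(edges, icv)
```

      Because edges are selected inside each fold from training subjects only, the held-out subjects never influence which edges enter the network, which is the point of the cross-validation step described above.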

      (4) The authors showed the positive/negative/combined networks from both Go trials and successful stop trials can predict the ICV. I am wondering how the author could validate the specificity of the prediction of these positive/negative/combined networks. For example, how about the networks from the failed stop trials?

      We appreciate the opportunity to clarify the specificity of the predictive networks identified in our study. Here is a more detailed explanation of our findings and their implications.

      To validate the specificity of the sustained attention network identified from CPM analysis, we calculated correlations between the network strength of positive and negative networks and performances from a neuropsychology battery (CANTAB) at each timepoint separately. CANTAB includes several tasks that measure various cognitive functions, such as sustained attention, inhibitory control, impulsivity, and working memory. We found that all positive and negative networks derived from Go and Successful stop trials significantly correlated with a behavioural assay of sustained attention – the rapid visual information processing (RVP) task – at ages 14 and 19 (all P values < 0.028). Age 23 had no RVP task data in the IMAGEN study. There were sporadic significant correlations between constructs such as delay aversion/impulsivity and negative network strength, for example, but the correlations with the RVP were always significant. This demonstrates that the strength of the sustained attention brain network was specifically and robustly correlated with a typical sustained attention task, rather than other cognitive measures. The results are described in the main text on Page 8 and shown in Supplementary materials (Pages 1 and 3) and Table S12.

      In addition, we conducted a CPM analysis to predict ICV using gPPI under Failed stop trials. Our findings showed that positive, negative, and combined networks derived from Failed stop trials significantly predicted ICV: at age 14 (r = 0.10, P = 0.033; r = 0.19, P < 0.001; and r = 0.17, P < 0.001, respectively), at age 19 (r = 0.21; r = 0.18; and r = 0.21, all P < 0.001, respectively), and at age 23 (r = 0.33, r = 0.35, and r = 0.36, respectively, all P < 0.001). Similar results were obtained using a 5-fold CV and leave-site-out CV.

      Our analysis further showed that task-related functional connectivity derived from Go trials, Successful Stop trials, and Failed Stop trials could predict sustained attention across three timepoints. However, the predictive performances of networks derived from Go trials were higher than those from Successful Stop and Failed Stop trials. This suggests that sustained attention is particularly crucial during Go trials when participants need to respond to the Go signal. In contrast, although Successful Stop and Failed Stop trials also require sustained attention, these tasks primarily involve inhibitory control along with sustained attention.

      Taken together, these findings underscore the specificity of the predictive networks of sustained attention. We have updated these results in the Supplementary Materials (Pages 3-5 and Page 7):

      Method

      CPM analysis using Failed stop trials

      We performed another CPM analysis on Failed stop trials, using the gPPI matrix obtained from the second GLM described in the main text. The CPM analysis was conducted using 10-fold CV, 5-fold CV and leave-site-out CV.

      Results

      CPM predictive performance under Failed stop trials

      Positive, negative, and combined networks derived from Failed stop trials significantly predicted ICV: at age 14 (r = 0.10, P = 0.033; r = 0.19, P < 0.001; and r = 0.17, P < 0.001, respectively), at age 19 (r = 0.21; r = 0.18; and r = 0.21, all P < 0.001, respectively), and at age 23 (r = 0.33, r = 0.35, and r = 0.36, respectively, all P < 0.001). We obtained similar results using a 5-fold CV and leave-site-out CV (Table S6).

      Discussion

      Specificity of the prediction of predictive networks

      We found that task-related functional connectivity derived from Go trials, Successful stop trials, and Failed stop trials successfully predicted sustained attention across three timepoints. However, the predictive performance of networks derived from Go trials was higher than that of networks derived from Successful stop trials and Failed stop trials. These results suggest that sustained attention is particularly crucial during Go trials when participants need to respond to the Go signal. In contrast, although Successful Stop and Failed Stop trials also require sustained attention, these tasks primarily involve inhibitory control along with sustained attention.

      (5) The author used PPI to define the connectivity of the network. I am not sure why the author used two GLMs for the PPI analysis separately. In the second GLM, Go trials were treated as an implicit baseline. What does this exactly mean? And the gPPI analysis across the entire brain using the Shen atlas is not clear. Normally, as I understand, the PPI/gPPI is conducted to test the task-modulated connectivity between one seed region and the voxels of the whole rest brain. Did the author perform the PPI for each ROI from Shen atlas? More details about how to use PPI to construct the network are required.

      Thank you for your insightful questions. Here, we’d like to clarify how we applied generalized PPI across the whole brain using the Shen atlas and why we used two separate GLMs for the gPPI analysis.

      Yes, PPI is conducted to test the task-modulated connectivity between one seed region and other brain areas. This method can be either voxel-based or ROI-based. In our study, we performed ROI-based gPPI analysis using the Shen atlas with 268 regions. Specifically, we performed the PPI on each seed region of interest (ROI) to estimate the task-related FC between this ROI and the remaining 267 ROIs under a specific task condition. By performing this analysis across each ROI in the Shen atlas, we generated a 268 × 268 gPPI matrix for each task condition. Each matrix was then averaged with its transpose to yield a symmetrical matrix, which was subsequently used for CPM analysis.
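
      The symmetrization step described above (averaging each gPPI matrix with its transpose) can be sketched as follows. The function name and the 3 × 3 values are made up for illustration; the study's matrices are 268 × 268.

```python
def symmetrize(matrix):
    """Average a square matrix with its transpose: S[i][j] = (M[i][j] + M[j][i]) / 2."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    return [[(matrix[i][j] + matrix[j][i]) / 2 for j in range(n)] for i in range(n)]


# Toy 3-ROI gPPI matrix with hypothetical values. Row i holds the estimates
# obtained with ROI i as the seed, so M[i][j] and M[j][i] generally differ.
gppi = [
    [0.0, 0.2, 0.6],
    [0.4, 0.0, -0.2],
    [0.0, 0.1, 0.0],
]
sym = symmetrize(gppi)
```

      Averaging is needed because PPI estimates are directional: the value for seed i and target j is estimated in a different model from the value for seed j and target i, so the raw matrix is not symmetric.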

      Regarding the use of two separate GLMs for the gPPI analysis, our study aimed to define the task-related FC under two conditions: Go trials and Successful stop trials. The first GLM, which included Go trials, was built to estimate the gPPI during Go trials. However, due to the high frequency of Go trials in the stop signal task, it is common to regard the Go trials as an implicit baseline, as in previous IMAGEN studies (D'Alberto et al., 2018; Whelan et al., 2012). Therefore, to achieve a more accurate estimation of FC during Successful stop trials, we built a second GLM specifically for these trials. Accordingly, we have updated the Method section in the main text on Page 16.

      Method

      2.5 Generalized psychophysiological interaction (gPPI) analysis

      In this study, we adopted gPPI analysis to generate task-related FC matrices and applied CPM analysis to investigate predictive brain networks from adolescence to young adulthood. PPI analysis describes task-dependent FC between brain regions, traditionally examining connectivity between a seed region of interest (ROI) and the voxels of the rest of the brain. However, this study conducted a generalized PPI analysis on an ROI-to-ROI basis (Di et al., 2021) to yield a gPPI matrix across the whole brain instead of just a single seed region.

      Given the high frequency of Go trials in the SST, Go trials are commonly treated as an implicit baseline, as in previous IMAGEN studies (D'Alberto et al., 2018; Whelan et al., 2012). Hence, we built a separate GLM for Successful stop trials, which included two task regressors (Failed and Successful stop trials) and 36 nuisance regressors.

      (6) Why did the author use PPI to construct the network, rather than the other similar methods, for example, beta series correlation (BSC)?

      Thanks for your question. PPI is an approach used to calculate the functional connectivity (FC) under a specific task (i.e., task-related FC). Although most brain connectomic research has utilized resting-state FC (e.g., beta series correlation), FC during task performance has demonstrated superiority in predicting individual behaviours and traits, due to its potential to capture more behaviourally relevant information (Dhamala et al., 2022; Greene et al., 2018; Yoo et al., 2018). Specifically, Zhao et al. (2023) suggested that task-related FC outperforms both typical task-based and resting-state FC in predicting individual differences. Therefore, we chose to use task-related FC to predict sustained attention over time. We have updated the Introduction on Page 5 accordingly.

      Introduction

      Although most brain connectomic research has utilized resting-state fMRI data, functional connectivity (FC) during task performance has demonstrated superiority in predicting individual behaviours and traits, due to its potential to capture more behaviourally relevant information (Dhamala et al., 2022; Greene et al., 2018; Yoo et al., 2018). Specifically, Zhao et al. (2023) suggested that task-related FC outperforms both typical task-based and resting-state FC in predicting individual differences. Hence, we applied task-related FC to predict sustained attention over time.

      (7) In the section of 'Correlation analysis between the network strength and substance use', the author just described that 'the correlations between xx and xx are shown in Fig5X', and repeated it three times for three correlation results. What exactly are the results? The author should describe the results in detail. And I am wondering whether there are scatter plots for these correlation analyses?

      We’d like to clarify the results in Fig. 5, which illustrates the significant correlations, after FDR correction, between cigarette and cannabis use (Cig+CB) and both behaviour and brain activity associated with sustained attention. Panel A shows the significant correlations between the behavioural measure of sustained attention (ICV) and Cig+CB. Panels B and C show the correlations between brain activity associated with sustained attention and Cig+CB: Panel B presents brain activity derived from Go trials, and Panel C presents brain activity derived from Successful stop trials. In response to your suggestion, we have described these results in detail on Page 9. We have also included scatter plots for the significant correlations shown in Fig. 5 in the Supplementary materials (Fig. S10).

      Results

      (6) Correlation between behaviour and brain to cannabis and cigarette use

      Figs. 5A-C summarize the results showing the correlation between ICV/brain activity and Cig+CB per timepoint and across timepoints. Fig. 5A shows correlations between ICV and Cig+CB (Tables S14-15). ICV was correlated with Cig+CB at ages 19 (Rho = 0.13, P < 0.001) and 23 (Rho = 0.17, P < 0.001). ICV at ages 14 (Rho = 0.13, P = 0.007) and 19 (Rho = 0.13, P = 0.0003) were correlated with Cig+CB at age 23. Cig+CB at age 19 was correlated with ICV at age 23 (Rho = 0.13, P = 9.38E-05). Fig. 5B shows correlations between brain activity derived from Go trials and Cig+CB (Tables S18-19). Brain activities of positive and negative networks derived from Go trials were correlated with Cig+CB at age 23 (positive network: Rhop = 0.12, P < 0.001; negative network: Rhon = -0.11, P < 0.001). Brain activity of the negative network derived from Go trials at age 14 was correlated with Cig+CB at age 23 (Rhon = -0.16, P = 0.001). Cig+CB at age 19 was correlated with brain activity of the positive network derived from Go trials at age 23 (Rhop = 0.10, P = 0.002). Fig. 5C shows the correlations between brain activity derived from Successful stop trials and Cig+CB (Tables S18-19). Brain activities of positive and negative networks derived from Successful stop trials were correlated with Cig+CB at ages 19 (positive network: Rhop = 0.10, P = 0.001; negative network: Rhon = -0.08, P = 0.013) and 23 (positive network: Rhop = 0.13, P < 0.001; negative network: Rhon = -0.11, P = 0.001).
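
      The correlations in Fig. 5 are reported after FDR correction, but the procedure is not named here. As an illustration only, the sketch below assumes the Benjamini-Hochberg step-up procedure at q = 0.05; the function name and the p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per test: True if it survives BH-FDR at level q.

    Assumption: Benjamini-Hochberg step-up; the source does not name the method.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) whose p-value satisfies p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank

    # Every test ranked at or below max_rank is declared significant.
    survives = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            survives[idx] = True
    return survives


# Hypothetical p-values for five correlations (illustration only).
example_p = [0.001, 0.008, 0.039, 0.041, 0.6]
example_result = benjamini_hochberg(example_p)
```

      In this toy example only the two smallest p-values survive, because the third (0.039) exceeds its rank-based threshold of 0.03 and no larger p-value meets its own threshold.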

      (8) Lastly, the labels of (A), (B) ... in the figure captions are unclear. The authors should find a better way to place the labels in the caption and keep them consistent throughout all figures.

      Thank you for this valuable comment. We have revised the figure captions in the main text to ensure the labels (A), (B), etc., are placed more clearly and consistently across all figures.

      Reviewer #2 (Public Review):

      While the study largely achieves its aims, several points merit further clarification:

      (1) Regarding connectome-based predictive modeling, an assumption is that connections associated with sustained attention remain consistent across age groups. However, this assumption might be challenged by observed differences in the sustained attention network profile (i.e., connections and related connection strength) across age groups (Figures 2 G-I, Figures 3 G-I). It's unclear how such differences might impact the prediction results.

      Thank you for your insightful comment. We’d like to clarify that we did not assume that connections associated with sustained attention remain completely consistent across age groups. Indeed, we expected that connections would change across age groups, due to the developmental changes in brain function and structure from adolescence to adulthood. Our focus was on the consistency of individual differences in sustained attention networks over time, recognising that the actual connections within those networks may change. However, we did show that there is some consistency in the specific connections associated with sustained attention over time. Notably, this consistency markedly increases when comparing ages 19 and 23, when developmental factors are less relevant. We support our reasoning above with the following analyses:

      (1) Supplementary materials (Pages 2 and 5), relevant sections highlighted here for emphasis.

      Method

      Comparison of predictive networks identified at one timepoint versus another

      Steiger’s Z value was employed to compare predictive performances of networks identified at different timepoints. This analysis involved comparing the r values derived from networks defined at distinct ages to predict ICV at the same age. For example, we compared the r values of brain networks defined at age 14 when predicting ICV at 19 (i.e., positive network: r = 0.25, negative network: r = 0.25, combined network: r = 0.28) with the r values of brain networks defined at age 19 itself (i.e., positive network: r = 0.16, negative network: r = 0.14, combined network: r = 0.16) derived from Go trials using Steiger's Z test (age 14 → age 19 vs. age 19 → age 19). Similarly, comparisons were made between networks defined at age 14 predicting ICV at age 23 and those at age 23 predicting ICV at age 23 (age 14 → age 23 vs. age 23 → age 23), as well as between networks defined at age 19 predicting ICV at age 23 and those at age 23 predicting ICV at age 23 (age 19 → age 23 vs. age 23 → age 23). These comparisons were performed separately for Go trials and Successful Stop trials.
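      For readers unfamiliar with the test, the comparison described above can be sketched as follows. This is a minimal illustration of one common formulation of Steiger's (1980) Z for two dependent correlations that share a variable (here, observed ICV at the target age), and it is not taken from the authors' scripts. The function name, the argument names, and the example numbers are hypothetical; in particular, the correlation between the two sets of predictions (r_kh) and the sample size are placeholders that the response does not report.

```python
import math


def steiger_z(r_jk, r_jh, r_kh, n):
    """Steiger's (1980) Z for two dependent correlations sharing variable j.

    r_jk: correlation between observed ICV (j) and predictions from network k
    r_jh: correlation between observed ICV (j) and predictions from network h
    r_kh: correlation between the two sets of predictions (assumed known)
    n:    number of participants contributing to all three correlations
    Returns (Z, two-sided P from the standard normal distribution).
    """
    r_mean = (r_jk + r_jh) / 2.0
    # Estimated covariance between the two Fisher-transformed correlations
    cov_num = (r_kh * (1 - 2 * r_mean ** 2)
               - 0.5 * r_mean ** 2 * (1 - 2 * r_mean ** 2 - r_kh ** 2))
    cov = cov_num / (1 - r_mean ** 2) ** 2
    z = (math.sqrt(n - 3) * (math.atanh(r_jk) - math.atanh(r_jh))
         / math.sqrt(2 - 2 * cov))
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Hypothetical inputs, for illustration only
z_value, p_value = steiger_z(r_jk=0.30, r_jh=0.20, r_kh=0.50, n=500)
print(f"Z = {z_value:.2f}, P = {p_value:.4f}")
```

      The sign of Z follows the order of the first two arguments, so swapping them flips the sign without changing the P value.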

      Results

      Comparison of predictive performance at different timepoints

      For positive, negative, and combined networks predicting ICV derived from Go trials at age 19, the R values were higher when using predictive networks defined at 19 than those defined at 14 (Z = 3.79, Z = 3.39, Z = 3.99, all P < 0.00071). Similarly, the R values for positive, negative, and combined networks predicting ICV derived from Go trials at age 23 were higher when using predictive networks defined at age 23 compared to those defined at ages 14 (Z = 6.00, Z = 5.96, Z = 6.67, all P < 3.47e-9) or 19 (Z = 2.80, Z = 2.36, Z = 2.57, all P < 0.005).

      At age 19, the R value for the positive network predicting ICV derived from Successful stop trials was higher when using predictive networks defined at 19 compared to those defined at 14 (Z = 1.54, P = 0.022), while the negative and combined networks did not show a significant difference (Z = 0.85, P = 0.398; Z = 2.29, P = 0.123). At age 23, R values for the positive and combined networks predicting ICV derived from Successful stop trials were higher when using predictive networks defined at 23 compared to those defined at 14 (Z = 3.00, Z = 2.48, all P < 3.47e-9) or 19 (Z = 2.52, Z = 1.99, all P < 0.005). However, the R value for the negative network at age 23 did not significantly differ when using predictive networks defined at 14 (Z = 1.80, P = 0.072) or 19 (Z = 1.48, P = 0.138).

      These results indicate that some specific pairwise connections associated with sustained attention at earlier ages, such as 14 and 19, are still relevant as individuals grow older. However, some connections are not optimal for good sustained attention at older ages. That is, the brain reorganizes its connection patterns to maintain optimal functionality for sustained attention as it matures.

      (2) Consistency of Individual Differences:

      We found that individual differences in ICV were significantly correlated between the three timepoints (Fig. 1B). In addition, we calculated the correlations between each pair of timepoints for the network strength of the predictive networks predicting sustained attention derived from Go trials and Successful stop trials. We found that the correlations of network strength for these predictive networks were also significant (all P < 0.003). We have updated these results in the main text (Pages 7-8) and Supplementary Materials (Table S7).

      (2) Cross-sectional brain connectivity

      In addition, we found that network strength of positive, negative, and combined networks derived from Go trials was significantly correlated between the three timepoints (Table S7, all P < 0.003).

      In addition, we found that network strength of positive, negative, and combined networks derived from Successful stop trials was significantly correlated between the three timepoints (Table S7, all P < 0.001).

      (3) Predictive networks across timepoints: Predictive networks defined at age 14 were successfully applied to predict ICV at ages 19 and 23. Similarly, predictive networks defined at age 19 were successfully applied to predict ICV at age 23 (Fig. 4). These results reflect the robustness of the brain network associated with sustained attention over time.

      (4) Dice coefficient analysis: We calculated the Dice coefficient to quantify the similarity of predictive networks across the three timepoints. Connections in the sustained attention networks were significantly similar from ages 14 to 23 (Table S13), despite relatively few overlapping edges over time (as discussed in Supplementary Materials on Page 6).

      (5) Global brain activation: Based on these findings, we indicate that sustained attention relies on global brain activation (i.e., network strength) rather than specific regions or networks (see also (Zhao et al., 2021)).

      In summary, brain network connections change and are not completely consistent across time. However, individual differences in sustained attention and its network are consistent across time, as we found that: 1) the brain reorganizes its connection patterns to maintain optimal functionality for sustained attention as it matures; 2) ICV and the network strength of the sustained attention networks were significantly correlated between timepoints; 3) sustained attention networks identified at earlier timepoints could predict ICV at subsequent timepoints; 4) Dice coefficient analysis indicated that the edges in the sustained attention networks were significantly similar from ages 14 to 23; and 5) sustained attention relies on global activation rather than on specific regions or networks.
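      As a concrete illustration of the Dice coefficient used in point (4), a minimal sketch is given below. It assumes each predictive network is represented as a collection of undirected edges (pairs of node indices), which is a representation chosen here for illustration; the significance test against a null distribution mentioned in the response is not reproduced. The function name and the toy edges are hypothetical.

```python
def dice_coefficient(edges_a, edges_b):
    """Dice similarity between two sets of undirected edges.

    Each edge is a pair of node indices; (i, j) and (j, i) are treated
    as the same edge. Returns 2 * |A & B| / (|A| + |B|).
    """
    a = {tuple(sorted(edge)) for edge in edges_a}
    b = {tuple(sorted(edge)) for edge in edges_b}
    if not a and not b:
        return 0.0  # convention chosen here for two empty networks
    return 2 * len(a & b) / (len(a) + len(b))


# Toy networks: one shared edge out of two edges in each network
network_age14 = [(1, 2), (2, 3)]
network_age23 = [(3, 2), (3, 4)]
print(dice_coefficient(network_age14, network_age23))
```

      A value of 1 means the two networks contain exactly the same edges, and 0 means they share none.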

      (2) Another assumption of the connectome-based predictive modeling is that the relationship between sustained attention network and substance use is linear and remains linear over development. Such linear evidence from either the literature or their data would be of help.

      Thanks for your valuable suggestion. We'd like to clarify that while CPM assumes a linear relationship between brain and behaviour (Shen et al., 2017), it does not assume that the relationship between the sustained attention network and substance use remains linear over development.

      Our approach in applying CPM to predict sustained attention across different timepoints was based on previous neuroimaging studies (Rosenberg et al., 2016; Rosenberg et al., 2020), which indicated linear associations between brain connectivity patterns and sustained attention using CPM analysis. These findings support the notion of a linear relationship between brain connectivity and sustained attention. In this study, we performed CPM analysis to identify networks predicting sustained attention, not substance use, and used the network strength of these predictive networks to represent sustained attention activity.

      To examine the relationship between substance use and sustained attention, as well as its associated brain activity, we conducted correlation analyses and utilized a latent change score model instead of CPM analysis. This decision was informed by cross-sectional studies (Broyd et al., 2016; Lisdahl and Price, 2012) that consistently reported linear associations between substance use and impairments in sustained attention. Additionally, longitudinal research by Harakeh et al. (2012) indicated a linear relationship between poorer sustained attention and the initiation and escalation of substance use over time.

      Given these previous findings, we assumed a linear relationship between sustained attention and substance use. Our analyses included calculating correlations between substance use and sustained attention, as well as its associated brain activity, at each timepoint and across timepoints (Fig. 5). Furthermore, we employed a three-wave bivariable latent change score model, a longitudinal approach, to assess the relationship between substance use and behaviour and brain activity associated with sustained attention (Figs. 6-7). We have added more information to the Introduction (Page 6) to make this clearer.

      Introduction

      Additionally, previous cross-sectional and longitudinal studies (Broyd et al., 2016; Harakeh et al., 2012; Lisdahl and Price, 2012) have shown that there are linear relationships between substance use and sustained attention over time. We therefore employed correlation analyses and a latent change score model to estimate the relationship between substance use and both behaviours and brain activity associated with sustained attention.

      (3) Heterogeneity in results suggests individual variability that is not fully captured by group-level analyses. For instance, Figure 1A shows decreasing ICV (better-sustained attention) with age on the group level, while there are both increasing and decreasing patterns on the individual level via visual inspection. Figure 7 demonstrates another example in which the group with a high level of sustained attention has a lower risk of substance use at a later age compared to that in the group with a low level of sustained attention. However, there are individuals in the high sustained attention group who have substance use scores as high as those in the low sustained attention group. This is important to take into consideration and could be a potential future direction for research.

      Thanks for this valuable comment. We appreciate your observation regarding the individual variability that is not fully captured by group-level analyses. Fig. 1A shows the results from a linear mixed model, which explains group-level changes over time while accounting for the random effect within subjects. Similarly, Fig. 7 shows the group-level association between substance use and sustained attention. We agree that future research could consider individual variability. For example, participants could be categorized based on their consistent trajectories of ICV or substance use (i.e., continually decreasing or increasing) over multiple timepoints. Incorporating individual-level analyses could provide valuable insights, and we are grateful for your suggestion, which will inform our future research directions.

      The above-mentioned points might partly explain the significant but low correlations between the observed and predicted ICV as shown in Figure 4. Addressing these limitations would help enhance the study's conclusions and guide future research efforts.

      We have updated the text in the Discussion on Page 13:

      Discussion

      However, there is still some individual variability not captured in this study, which could be attributed to the diversity in genetic, environmental, and developmental factors influencing sustained attention and substance use. Future research should aim to explore this variability in greater depth to gain a better understanding of the relationship between sustained attention and substance use.

      Reviewer #3 (Public Review):

      Weaknesses: It's questionable whether the prediction approach (i.e., CPM), even when combined with longitudinal data, can establish causality. I recommend removing the term 'consequence' in the abstract and replacing it with 'predict'. Additionally, the paper could benefit from enhanced rigor through additional analyses, such as testing various thresholds and conducting lagged effect analyses with covariate regression.

      Thank you for your comment. We have replaced “consequence” with “predict” in the abstract.

      Abstract

      Previous studies were predominantly cross-sectional or under-powered and could not indicate if impairment in sustained attention was a predictor of substance-use or a marker of the inclination to engage in such behaviour.

      Reviewer #3 (Recommendations For The Authors):

      (1) The connectivity analysis predicts both baseline and longitudinal attention measures. However, given the high correlation in attention abilities across the three time-points, it's unclear whether the connectivity predicts shared variations of attention across three time points. It would be insightful to assess if predictions at the 2nd and 3rd-time points remained significant after controlling for attention abilities at the initial time point.

      Thanks for your comments. We performed the CPM analysis to predict ICV at the 2nd and 3rd timepoints, controlling for ICV at age 14 as a covariate. After controlling for ICV at age 14, positive, negative, and combined networks derived from Successful stop trials defined at age 14 still predicted ICV at ages 19 and 23, and those defined at age 19 still predicted ICV at age 23. Positive, negative, and combined networks derived from Go trials defined at age 19 also still predicted ICV at age 23. However, positive, negative, and combined networks derived from Go trials defined at age 14 had lower predictive performance in predicting ICV at ages 19 and 23 after controlling for ICV at age 14. Notably, controlling for ICV at the initial timepoint did not significantly impact the performance of predictive networks derived from Successful stop trials. Accordingly, we have added this analysis and the results to the Supplementary Materials (Pages 3 and 5).

      Method

      Prediction across timepoints controlling for ICV at age 14

      To examine whether connectivity predicts variation in sustained attention that is shared across timepoints, we applied predictive models developed at ages 14 and 19 to predict ICV at subsequent timepoints, controlling for ICV at age 14. Specifically, we used predictive models (including parameters and selected edges) developed at age 14 to predict ICV at ages 19 and 23 separately. First, we calculated the network strength using the gPPI matrix at ages 19 and 23 based on the selected edges identified from CPM analysis at age 14. We then estimated the predicted ICV at ages 19 and 23 by applying the linear model parameters (slope and intercept) obtained from CPM analysis at age 14 to the network strength. Finally, we evaluated the predictive performance by calculating the partial correlation between the predicted and observed values at ages 19 and 23, controlling for ICV at age 14. Similarly, we applied models developed at age 19 to predict ICV at age 23, also controlling for ICV at age 14. To assess the significance of the predictive performance, we used a permutation test, shuffling the predicted ICV values and calculating the partial correlation to generate a null distribution over 1,000 iterations.
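      The final evaluation step described above (a partial correlation between predicted and observed ICV, controlling for baseline ICV, with a permutation test) can be sketched as follows. This is a minimal illustration with a single covariate, not the authors' code. All function and variable names are hypothetical, and two details are assumptions made here because the text does not specify them: the P value is one-sided, and it uses the (count + 1) / (n_perm + 1) correction.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation between x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def permutation_test(predicted, observed, baseline, n_perm=1000, seed=0):
    """Partial correlation plus a one-sided permutation P value.

    The null distribution is built by shuffling the predicted values
    while leaving the observed values and the covariate untouched.
    """
    rng = random.Random(seed)
    r_observed = partial_corr(predicted, observed, baseline)
    shuffled = list(predicted)
    n_at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if partial_corr(shuffled, observed, baseline) >= r_observed:
            n_at_least_as_large += 1
    return r_observed, (n_at_least_as_large + 1) / (n_perm + 1)
```

      In this sketch, `predicted` would hold the model-predicted ICV at age 19 or 23, `observed` the measured ICV at that age, and `baseline` the ICV at age 14.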

      Results

      Predictions across timepoints controlling for ICV at age 14

      Positive and combined networks derived from Go trials defined at age 14 predicted ICV at age 19 (r = 0.10, P = 0.028; r = 0.08, P = 0.047), but the negative network did not (r = 0.06, P = 0.119). The positive network derived from Go trials defined at age 14 predicted ICV at age 23 (r = 0.11, P = 0.013), but the negative and combined networks did not (r = 0.04, P = 0.187; r = 0.08, P = 0.056). Positive, negative, and combined networks derived from Go trials defined at age 19 predicted ICV at age 23 (r = 0.22, r = 0.19, and r = 0.22, respectively, all P < 0.001).

      Positive, negative, and combined networks derived from Successful stop trials defined at age 14 predicted ICV at age 19 (r = 0.08, P = 0.036; r = 0.10, P = 0.012; r = 0.11, P = 0.009) and 23 (r = 0.11, P = 0.005; r = 0.13, P = 0.005; r = 0.13, P = 0.017) respectively. Positive, negative, and combined networks derived from Successful stop trials defined at age 19 predicted ICV at age 23 (r = 0.18, r = 0.18, and r = 0.17, respectively, all P < 0.001).

      (2) In the Results section, a significance threshold of p = 0.01 was used for the CPM analysis. It would be beneficial to test the stability of these findings using alternative thresholds such as p = 0.05 or p = 0.005.

      We appreciate this insightful comment and the suggestion to test the stability of our findings using alternative significance thresholds. We had already conducted CPM analyses using a range of thresholds, including 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, and 0.0001 (see Table S8 in Supplementary Materials). The results were similar across thresholds. Following prior studies (Feng et al., 2024; Ren et al., 2021; Yoo et al., 2018) that used P < 0.01 for feature selection, we chose to focus on the threshold of P < 0.01 for our main analysis. Following your suggestion, we have highlighted this in the Method section on Pages 17-18.

      Method

      2.6.1 ICV prediction

      The r value with an associated P value for each edge was obtained, and a threshold P = 0.01 (Feng et al., 2024; Ren et al., 2021; Yoo et al., 2018) was set to select edges.

      2.6.2 Three cross-validation schemes

      In addition, we conducted the CPM analysis using a range of thresholds for feature selection and observed similar results across different thresholds (See Supplementary Materials Table S8).
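      The thresholding step referred to above can be sketched as follows. This is a simplified illustration of CPM feature selection, assuming the per-edge r and P values relating each edge to ICV have already been computed; it is not the authors' pipeline. The function names and the toy numbers are hypothetical.

```python
def select_edges(edge_stats, p_threshold=0.01):
    """Split edges into positive and negative networks at a P threshold.

    edge_stats: list of (r, p) tuples, one per edge, from correlating
    each edge's connectivity with behaviour across participants.
    Returns (positive_edge_indices, negative_edge_indices).
    """
    positive = [i for i, (r, p) in enumerate(edge_stats)
                if p < p_threshold and r > 0]
    negative = [i for i, (r, p) in enumerate(edge_stats)
                if p < p_threshold and r < 0]
    return positive, negative


def network_strength(connectivity, edges):
    """Sum one participant's connectivity values over the selected edges."""
    return sum(connectivity[i] for i in edges)


# Toy per-edge statistics (hypothetical values)
toy_stats = [(0.30, 0.001), (-0.25, 0.008), (0.10, 0.200),
             (-0.05, 0.600), (0.20, 0.030)]

# Re-run the selection over several thresholds to check stability
for threshold in (0.05, 0.01, 0.005):
    pos, neg = select_edges(toy_stats, threshold)
    print(threshold, pos, neg)
```

      Looser thresholds admit more edges into each network, so comparing prediction performance across thresholds indicates how much the results depend on that choice.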

      (3) Could you clarify if you used one sub-sample to extract connectivity related to sustained attention and then used another sub-sample to predict substance use with attention-related connectivity?

      Thank you very much for the question. We used the same sample to extract the brain network strength and estimated the correlation with substance use using both the Spearman correlation and latent change score model across three timepoints. We controlled for covariates including sex, age, and scan site at the same time. Accordingly, we have clarified this in the Method section on Page 20. We note that the CPM analyses were conducted using cross-validation, plus a leave-site-out analysis.

      Method

      2.7.3 Correlation between network strength and substance use

      It is worth noting that all the correlations between substance use and sustained attention were conducted using the same sample across three timepoints.

      (4) Could you clarify whether you have regressed covariates in the lagged effects analysis of part 7?

      Thanks for this question. Yes, we confirm that we controlled for covariates, including age, sex, and scan site, in the latent change score model. We have now described them more clearly in the Method section (Page 18).

      Method

      2.7.3 Correlation between network strength and substance use

      Additionally, cross-lagged dynamic coupling (i.e., bidirectionality) was employed to explore individual differences in the relationships between substance use and linear changes in ICV/brain activity, as well as the relationship between ICV/brain activity and linear change in substance use. The model accounted for covariates such as age, sex and scan sites.

      References:

      Broyd, S.J., van Hell, H.H., Beale, C., Yucel, M., Solowij, N., 2016. Acute and Chronic Effects of Cannabinoids on Human Cognition-A Systematic Review. Biol Psychiatry 79, 557-567.

      Chamberlain, S.R., Odlaug, B.L., Schreiber, L.R.N., Grant, J.E., 2012. Association between Tobacco Smoking and Cognitive Functioning in Young Adults. The American Journal on Addictions 21, S14-S19.

      Crean, R.D., Crane, N.A., Mason, B.J., 2011. An evidence based review of acute and long-term effects of cannabis use on executive cognitive functions. J Addict Med 5, 1-8.

      D'Alberto, N., Chaarani, B., Orr, C.A., Spechler, P.A., Albaugh, M.D., Allgaier, N., Wonnell, A., Banaschewski, T., Bokde, A.L.W., Bromberg, U., Buchel, C., Quinlan, E.B., Conrod, P.J., Desrivieres, S., Flor, H., Frohner, J.H., Frouin, V., Gowland, P., Heinz, A., Itterman, B., Martinot, J.L., Paillere Martinot, M.L., Artiges, E., Nees, F., Papadopoulos Orfanos, D., Poustka, L., Robbins, T.W., Smolka, M.N., Walter, H., Whelan, R., Schumann, G., Potter, A.S., Garavan, H., 2018. Individual differences in stop-related activity are inflated by the adaptive algorithm in the stop signal task. Hum Brain Mapp 39, 3263-3276.

      Dhamala, E., Yeo, B.T.T., Holmes, A.J., 2022. Methodological Considerations for Brain-Based Predictive Modelling in Psychiatry. Biological Psychiatry.

      Di, X., Zhang, Z.G., Biswal, B.B., 2021. Understanding psychophysiological interaction and its relations to beta series correlation. Brain Imaging and Behavior 15, 958-973.

      Dougherty, D.M., Mathias, C.W., Dawes, M.A., Furr, R.M., Charles, N.E., Liguori, A., Shannon, E.E., Acheson, A., 2013. Impulsivity, attention, memory, and decision-making among adolescent marijuana users. Psychopharmacology (Berl) 226, 307-319.

      Esterman, M., Rothlein, D., 2019. Models of sustained attention. Curr Opin Psychol 29, 174-180.

      Feng, Q., Ren, Z., Wei, D., Liu, C., Wang, X., Li, X., Tie, B., Tang, S., Qiu, J., 2024. Connectome-based predictive modeling of Internet addiction symptomatology. Soc Cogn Affect Neurosci 19.

      Greene, A.S., Gao, S., Scheinost, D., Constable, R.T., 2018. Task-induced brain state manipulation improves prediction of individual traits. Nature Communications 9, 2807.

      Harakeh, Z., de Sonneville, L., van den Eijnden, R.J., Huizink, A.C., Reijneveld, S.A., Ormel, J., Verhulst, F.C., Monshouwer, K., Vollebergh, W.A., 2012. The association between neurocognitive functioning and smoking in adolescence: the TRAILS study. Neuropsychology 26, 541-550.

      Hart, C.L., van Gorp, W., Haney, M., Foltin, R.W., Fischman, M.W., 2001. =. Neuropsychopharmacology 25, 757-765.

      Lawrence, N.S., Ross, T.J., Stein, E.A., 2002. Cognitive mechanisms of nicotine on visual attention. Neuron 36, 539-548.

      Lisdahl, K.M., Price, J.S., 2012. Increased marijuana use and gender predict poorer cognitive functioning in adolescents and emerging adults. J Int Neuropsychol Soc 18, 678-688.

      O'Halloran, L., Cao, Z.P., Ruddy, K., Jollans, L., Albaugh, M.D., Aleni, A., Potter, A.S., Vahey, N., Banaschewski, T., Hohmann, S., Bokde, A.L.W., Bromberg, U., Buchel, C., Quinlan, E.B., Desrivieres, S., Flor, H., Frouin, V., Gowland, P., Heinz, A., Ittermann, B., Nees, F., Orfanos, D.P., Paus, T., Smolka, M.N., Walter, H., Schumann, G., Garavan, H., Kelly, C., Whelan, R., 2018. Neural circuitry underlying sustained attention in healthy adolescents and in ADHD symptomatology. Neuroimage 169, 395-406.

      Potter, A.S., Newhouse, P.A., 2008. Acute nicotine improves cognitive deficits in young adults with attention-deficit/hyperactivity disorder. Pharmacol Biochem Behav 88, 407-417.

      Ren, Z., Daker, R.J., Shi, L., Sun, J., Beaty, R.E., Wu, X., Chen, Q., Yang, W., Lyons, I.M., Green, A.E., Qiu, J., 2021. Connectome-Based Predictive Modeling of Creativity Anxiety. Neuroimage 225, 117469.

      Rosenberg, M.D., Finn, E.S., Scheinost, D., Papademetris, X., Shen, X., Constable, R.T., Chun, M.M., 2016. A neuromarker of sustained attention from whole-brain functional connectivity. Nat Neurosci 19, 165-171.

      Rosenberg, M.D., Scheinost, D., Greene, A.S., Avery, E.W., Kwon, Y.H., Finn, E.S., Ramani, R., Qiu, M., Constable, R.T., Chun, M.M., 2020. Functional connectivity predicts changes in attention observed across minutes, days, and months. Proc Natl Acad Sci U S A 117, 3797-3807.

      Shen, X., Finn, E.S., Scheinost, D., Rosenberg, M.D., Chun, M.M., Papademetris, X., Constable, R.T., 2017. Using connectome-based predictive modeling to predict individual behavior from brain connectivity. Nat Protoc 12, 506-518.

      Valentine, G., Sofuoglu, M., 2018. Cognitive Effects of Nicotine: Recent Progress. Curr Neuropharmacol 16, 403-414.

      Verbruggen, F., Aron, A.R., Band, G.P.H., Beste, C., Bissett, P.G., Brockett, A.T., Brown, J.W., Chamberlain, S.R., Chambers, C.D., Colonius, H., Colzato, L.S., Corneil, B.D., Coxon, J.P., Dupuis, A., Eagle, D.M., Garavan, H., Greenhouse, I., Heathcote, A., Huster, R.J., Jahfari, S., Kenemans, J.L., Leunissen, I., Li, C.S.R., Logan, G.D., Matzke, D., Morein-Zamir, S., Murthy, A., Pare, M., Poldrack, R.A., Ridderinkhof, K.R., Robbins, T.W., Roesch, M.R., Rubia, K., Schachar, R.J., Schall, J.D., Stock, A.K., Swann, N.C., Thakkar, K.N., van der Molen, M.W., Vermeylen, L., Vink, M., Wessel, J.R., Whelan, R., Zandbelt, B.B., Boehler, C.N., 2019. A consensus guide to capturing the ability to inhibit actions and impulsive behaviors in the stop-signal task. Elife 8.

      Whelan, R., Conrod, P.J., Poline, J.B., Lourdusamy, A., Banaschewski, T., Barker, G.J., Bellgrove, M.A., Buchel, C., Byrne, M., Cummins, T.D., Fauth-Buhler, M., Flor, H., Gallinat, J., Heinz, A., Ittermann, B., Mann, K., Martinot, J.L., Lalor, E.C., Lathrop, M., Loth, E., Nees, F., Paus, T., Rietschel, M., Smolka, M.N., Spanagel, R., Stephens, D.N., Struve, M., Thyreau, B., Vollstaedt-Klein, S., Robbins, T.W., Schumann, G., Garavan, H., Consortium, I., 2012. Adolescent impulsivity phenotypes characterized by distinct brain networks. Nat Neurosci 15, 920-925.

      Yoo, K., Rosenberg, M.D., Hsu, W.T., Zhang, S., Li, C.R., Scheinost, D., Constable, R.T., Chun, M.M., 2018. Connectome-based predictive modeling of attention: Comparing different functional connectivity features and prediction methods across datasets. Neuroimage 167, 11-22.

      Young, J.W., Finlayson, K., Spratt, C., Marston, H.M., Crawford, N., Kelly, J.S., Sharkey, J., 2004. Nicotine improves sustained attention in mice: evidence for involvement of the alpha7 nicotinic acetylcholine receptor. Neuropsychopharmacology 29, 891-900.

      Zhao, W., Makowski, C., Hagler, D.J., Garavan, H.P., Thompson, W.K., Greene, D.J., Jernigan, T.L., Dale, A.M., 2023. Task fMRI paradigms may capture more behaviorally relevant information than resting-state functional connectivity. Neuroimage, 119946.

      Zhao, W., Palmer, C.E., Thompson, W.K., Chaarani, B., Garavan, H.P., Casey, B.J., Jernigan, T.L., Dale, A.M., Fan, C.C., 2021. Individual Differences in Cognitive Performance Are Better Predicted by Global Rather Than Localized BOLD Activity Patterns Across the Cortex. Cereb Cortex 31, 1478-1488.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviews):

      Summary:

      This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, parahippocampal gyrus, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though I have substantive concerns about how this analysis was performed and as such will not summarize the results. Broadly, the behavioural and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and takes a step towards characterizing how.

      Strengths:

      The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.

      Weaknesses:

      As noted above, the pattern similarity analysis for both item and category-level reinstatement was performed in a way that is not interpretable given concerns about temporal autocorrelation within the scanning run. Below, I focus my review on this analytic issue, though I also outline additional concerns.

      We thank the reviewer for both the positive and critical appraisal of our paper.

      (1) The pattern similarity analyses were not done correctly, rendering the results uninterpretable (assuming my understanding of the authors' approach is correct).

      a. First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within the scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, but I don't believe this is possible unfortunately given the authors' design; I believe the target (presumably reinstated) scene only appears once during scanning, so there is no separate neural pattern during the presentation of this picture that they can use. For these reasons, any evidence for "significant scene-specific reinstatement" and the like is completely uninterpretable and would need to be removed from the paper.

      We thank the reviewer for this important input. We acknowledge that our study design leads to temporal autocorrelation in the BOLD signal when calculating RSA between fixation and scene time windows. We also recognize that we cannot interpret the significance of scene-specific reinstatement compared to zero and have accordingly removed this information. Nevertheless, our primary objective was to investigate changes in scene-specific reinstatement in relation to the different time delays of retrieval. Given that the retrieval procedure is the same over time and presumably similarly influenced by temporal autocorrelations, we argue that our results must be attributed to the relative differences in reinstatement across recent and remote trials. Bearing this in mind, we argue that our results can be interpreted in terms of delay-related changes in reinstatement. This information is discussed in pp. 21, 40 of the manuscript.

      We agree with the reviewer that cross-run comparisons would be extremely interesting. This could be achieved by presenting the same items repeatedly across different runs, which was not possible in our current setup because we were interested in single-exposure retrieval and because of practical time restrictions when scanning children. We have introduced this idea in the Limitations and Discussion sections (pp. 40, 44) of the manuscript to inform future studies.

      Finally, thanks to the reviewer’s comment, we identified a bug in the final steps of our RSA calculation. Fisher’s z-transformation was incorrectly applied to r-1 values, resulting in abnormally high values. We apologize for this error. We have revised the scripts and rectified the bug by correctly applying Fisher’s z-transformation to the r similarity values. We also adjusted the methods description figure accordingly (Figure 5, p. 22). This adjustment led to slightly altered reinstatement indices. Nevertheless, the overall pattern of delay-related attenuation in the scene-specific reinstatement index, observed in both children and adults, remains consistent. Similarly, we observed gist-like reinstatement uniquely in children.
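
      To illustrate why the input to the transformation matters, the following minimal sketch applies Fisher’s z-transformation, z = arctanh(r), to a few made-up correlation values and, for comparison, to the same values shifted by −1. The function name `fisher_z`, the example values, and the literal reading of “r-1” as r − 1 are assumptions for illustration only; this is not the analysis script itself.

```python
import numpy as np

def fisher_z(r, eps=1e-7):
    """Fisher z-transform of Pearson correlations: z = arctanh(r).

    Values are clipped just inside (-1, 1) so the transform stays finite.
    """
    r = np.clip(np.asarray(r, dtype=float), -1 + eps, 1 - eps)
    return np.arctanh(r)

# Hypothetical similarity values (not data from the study)
r = np.array([0.10, 0.20, 0.30])

z_correct = fisher_z(r)      # transform applied to the similarity values r
z_shifted = fisher_z(r - 1)  # transform applied to r - 1 (assumed form of the bug)

print(z_correct)
print(z_shifted)
```

      In this toy case the shifted input lies close to −1, where arctanh grows steeply, so the resulting magnitudes are much larger than those obtained from r itself.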

      b. From a theoretical standpoint, I believe the way this analysis was performed considering the fixation and the immediately following scene also means that the differences between recent and remote could have to do with either the reactivation (processes happening during the fixation, presumably) or differences in the processing of the stimulus itself (happening during the scene presentation). For example, people might be more engaged with the more novel scenes (recent) and therefore process those scenes more; such a difference would be interpreted in this analysis as having to do with reinstatement, but in fact could be just related to the differential scene processing/recognition, etc.

      Thank you for your insightful comments. We acknowledge the theoretical concerns raised about distinguishing between the effects of reactivation processes occurring during fixation and differential processing of the stimulus itself during scene presentation. Specifically, we recognize that higher engagement with recent scenes could result in enhanced processing, which might be misattributed to memory reinstatement mechanisms.

      We argue, however, that during scene presentation, scenes are processed more “memory-wise” rather than “perception-wise”, since both recent and remote memories are well-learned, as we included only correctly recalled memories in the analysis.

      We concur that scene presentations entail perceptual processing; however, such processing would be consistent across all items, given that they were presented with the same repeated learning procedure, rendering them equally familiar to participants. In addition, we would argue that distinct activation patterns elicited during varying delays are more likely attributable to memory-related processing, since participants actively engaged in a memory-based decision-making task during these intervals. We have incorporated this rationale into the discussion section of our manuscript (p. 40).

      With this in mind, we hypothesized that in the case of “memory-wise” processing, neural engagement during the scene time window should be higher for remote compared to recent items, and that this difference should increase with the passage of time, since more control and effort should be required during retrieval due to the reorganized and distributed nature of memories. If the scenes are processed more “perception-wise”, we would expect higher neural engagement during the retrieval of recent compared to remote items. Our exploratory analysis (detailed overview in supplementary materials, Figure S3, Table S9) revealed higher neural activation for remote compared to recent items in medial temporal, prefrontal, occipital and cerebellar brain regions, supporting the notion of “memory-wise” processes during the scene time window. However, this exploratory analysis cannot provide a direct solution to the reviewer’s concern, as our paradigm per se cannot arbitrate between the “memory-wise” and “perception-wise” nature of retrieval. We added this point to the discussion (see p. 40).

      c. For the category-based neural reinstatement:

      (1) This suffers from the same issue of correlations being performed within the run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). With this restriction, it may or may not be possible to perform this analysis, depending upon how the same-category scenes are distributed across runs. However, there are other issues with this analysis, as well.

      (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. The authors do not motivate the reason for this switch. Please provide reasoning as to why fixation-fixation is more appropriate than fixation-scene similarity for category-level reinstatement, particularly given the opposite was used for item-level reinstatement. Even if the analyses were done properly, it would remain hard to compare them given this difference in approach.

      (3) I believe the fixation cross with itself is included in the "within category" score. Is this not a single neural pattern correlated with itself, which will yield maximal similarity (Pearson r=1) or minimal dissimilarity (1-Pearson r=0)? Including these comparisons in the averages for the within-category score will inflate the difference between the "within-category" and "between-category" comparisons. These (e.g., forest1-forest1) should not be included in the within-category comparisons considered; rather, they should be excluded, so the fixations are always different but sometimes the comparisons are two retrievals of the same scene type (forest1-forest2), and other times different scene types (forest1-field1).

      (4) It is troubling that the results from the category reinstatement metric do not seem to conceptually align with past work; for example, a lot of work has shown category-level reinstatement in adults. Here the authors do not show any category-level reinstatement in adults (yet they do in children), which generally seems extremely unexpected given past work and I would guess has to do with the operationalization of the metric.

      Thank you for this important input regarding category-based reinstatement.

      (1) The distribution of within-category items across runs was approximately balanced. Additionally, within runs, they were presented randomly without close temporal proximity. Based on this arrangement, we believe that the issue of close temporal autocorrelation, as pointed out by the reviewer in the context of scene-specific reinstatement, may not apply to the same extent here. Again, our focus is not on the absolute level of category-based reinstatement, but on the relative difference across conditions (recent vs. remote short delay vs. remote long delay), which are equally impacted by the autocorrelations.

      (2) We apologize for not motivating this analysis further. Whereas the scene-reinstatement index (i.e., fixation to scene correlation) gives us a measure of the pre-activation of a concrete scene (e.g., a yellow forest in autumn), the gist-like reinstatement gives us a measure of the pre-activation of a whole category of scenes (e.g., forests). Critically, our window of interest is the fixation period for both sets of analyses (in the absence of any significant visual input). The scene-specific reinstatement uses the scene window as a neural template against which the fixation period can be compared, while the gist-like reinstatement compares the similarity of reactivation patterns across trials that come from the same category but differ in their exact memory content. The reinstatement of more generic, gist-like memory (e.g., forest) across multiple trials should yield more similar neural activation patterns. Significant gist-like reinstatement would suggest that neural patterns for scenes within the same category are more generic, as indicated by higher similarity among them. On the other hand, a more detailed reinstatement of specific types of forests (e.g., a yellow forest in autumn, green pine trees, a bare-leaved forest in spring, etc.) that differ in various dimensions could result in neural activation patterns that are as dissimilar as those seen in the reinstatement of scenes from entirely different categories. Through this methodology, we could distinguish between more generic, gist-like reinstatement and more specific, detailed reinstatement. This is now clarified in the manuscript, see p. 25.

      (3) We apologize for the confusion caused by the figure and analysis description. In our analysis, we indeed excluded the correlation of the fixation cross with itself. Consequently, the diagonal in the figure should be blank to indicate this. This is now revised in the manuscript (Figure 7B and in Methods).

      (4) We appreciate your concern and recognize that the terminology we used might not align perfectly with the conventional understanding of category-based reinstatement. Typically, category-level neural representations (as discussed in Polyn et al., 2005; Jafarpour et al., 2014; among others) are investigated to identify specific brain areas associated with encoding/perception of scenes or faces. Our aim, however, was to explore the mnemonic reinstatement of highly detailed scenes that were elaborately encoded, with the hypothesis that substantial representational transformations would occur over time and vary with age. This hypothesis is based on the memory literature, including the Fuzzy-Trace Theory, the Contextual Binding Theory, and the Trace Transformation Theory (Brainerd & Reyna, 1998; Yonelinas, 2019; Moscovitch & Gilboa, 2023). Therefore, we renamed 'category-based' reinstatement to 'gist-like' reinstatement, which clarifies our concept and better aligns it with existing literature.

      We anticipated that young adults, having the ability to retain detailed narratives post-encoding, would demonstrate a reinstatement of scenes with distinct details, making these scenes dissimilar from each other (see similar findings in Sommer et al., 2021). In contrast, given the anticipated lesser strategic elaboration during learning in children, we hypothesized that they would demonstrate a shallower, more gist-like reinstatement (for instance, children recalling a forest or a field in a general sense without specific details or vivid imagery). This could result in higher category-based similarity, as children might reinstate a more generic forest concept.

      We did not gather additional data on the verbal quality of reinstatement due to the limited scanning time available for children, so these assumptions remain unverified. However, anecdotal observations post-retrieval indicated that adults often reported very vivid scenes associated with clear narrative recall. In contrast, children frequently described more vague memories (e.g., “I know it was a forest”) without specific details. Future studies should include measures to assess the quality of reinstatement, potentially outside the scanning environment.
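
      The logic of the gist-like comparison described in points (2) and (3) above can be sketched as follows: fixation-period patterns are correlated with one another, correlations of a pattern with itself are excluded, and the mean Fisher-z similarity of same-category pairs is contrasted with that of different-category pairs. This is a minimal sketch under stated assumptions: the function name `gist_like_index`, the within-minus-between difference score, and the simulated data are hypothetical and are not taken from our analysis scripts.

```python
import numpy as np

def gist_like_index(patterns, categories):
    """Within-category minus between-category similarity of fixation patterns.

    patterns:   (n_trials, n_voxels) array, one pattern per fixation period
    categories: length n_trials sequence of category labels (e.g., 'forest')
    """
    patterns = np.asarray(patterns, dtype=float)
    categories = np.asarray(categories)

    r = np.corrcoef(patterns)                       # trial-by-trial Pearson r
    z = np.arctanh(np.clip(r, -0.999999, 0.999999)) # Fisher z of r

    same = categories[:, None] == categories[None, :]
    # Upper triangle with k=1: each pair counted once, diagonal
    # (a pattern correlated with itself) excluded.
    pairs = np.triu(np.ones_like(same, dtype=bool), k=1)

    within = z[pairs & same].mean()
    between = z[pairs & ~same].mean()
    return within - between

# Simulated example (hypothetical data): two categories, each with a shared
# "gist" prototype plus trial-specific noise.
rng = np.random.default_rng(0)
labels = np.array(["forest"] * 5 + ["field"] * 5)
prototypes = {"forest": rng.normal(size=50), "field": rng.normal(size=50)}
sim_patterns = np.stack(
    [prototypes[c] + 0.5 * rng.normal(size=50) for c in labels]
)
index = gist_like_index(sim_patterns, labels)
print(index)
```

      A positive index in this toy example reflects the shared prototype within each category; with highly detailed, trial-unique patterns the index would be expected to approach zero.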

      (2) I did not see any compelling statistical evidence for the claim of less robust consolidation in children.

      Specifically in terms of the behavioral results of retention of the remote items at 1 vs 14 days, shown in Figure 2B, the authors conclude that memory consolidation is less robust in children (line 246). Yet they do not report statistical evidence for this point, as there was no interaction of this effect with the age group. Children had worse memory than adults overall (in terms of a main effect - i.e. across recent and remote items). If it were consolidation-specific, one would expect that the age differences are bigger for the remote items, and perhaps even most exaggerated for the 14-day-old memories. Yet this does not appear to be the case based on the data the authors report. Therefore, the behavioral differences in retention do not seem to be consolidation specific, and therefore might have more to do with differences in encoding fidelity or retrieval processes more generally across the groups. This should be considered when interpreting the findings.

      Thank you for highlighting this important issue. We acknowledge that our initial description and depiction of our behavioral findings may not have effectively conveyed the main message about memory consolidation. Therefore, we have revised the behavioral results section (see pp. 12-14) to communicate our message more clearly.

      As detailed in the methods section, we reported retention rates only for those items that were correctly (100%) learned on day 0, day 1, and day 14. This approach meant that different participants had varying numbers of items learned correctly. However, this strategy allowed us to address our primary question: whether memory consolidation, based on all items initially encoded successfully, is comparably robust between groups.

      To illustrate more clearly the change in retention rate slopes over time for recently learned items (i.e., 30 minutes after learning), short delay remote, and long delay remote items, relative to the initially correctly learned items, we conducted the following analysis: after observing no differences between sessions in both age groups for recent items on days 1 and 14, we combined the recent items. This approach enabled us to investigate how the slope of memory retention for initially correctly learned items (with a baseline of 100%) changes over time. We observed a significant interaction between item type (recent, short delay remote, long delay remote) and group (F(3,250) = 17.35, p < .001, ω² = .16). The follow-up of this interaction revealed significantly less robust memory consolidation across all delay times in children compared to young adults. This information is added in the manuscript on pp. 12-14. We have also updated the figures, incorporating the baseline of 100% correct performance.

      (3) Please clarify which analyses were restricted to correct retrievals only. The univariate analyses states that correct and incorrect trials were modelled separately but does not say which were considered in the main contrast (I assume correct only?). The item specific reinstatement analysis states that only correct trials were considered, but the category-level reinstatement analysis does not say. Please include this detail.

      Thank you for bringing this to our attention. We indeed limited our analysis – including univariate, specific reinstatement, and gist-like analyses – to only correctly remembered items. This decision was made because our goal was to observe delay-related changes in the neural correlates of correct memories, which are potentially stronger. We have incorporated this information into the manuscript.

      (4) To what extent could performance differences be impacting the differences observed across age groups? I think (see prior comment) that the analyses were probably limited to correct trials, which is helpful, but still yields pretty big differences across groups in terms of the amount of data going into each analysis. In general, children showed more attenuated neural effects (e.g., recent/remote or session effects); could this be explained by their weaker memory? Specifically, if only correct trials are considered that means that fewer trials would be going into the analysis for kids, especially for the 14-day remote memories, and perhaps pushing the remote > recent difference for this condition towards 0. The authors might be able to address this analytically; for example, does the remote > recent difference in the univariate data at day 14 correlate with day 14 memory?

      Thank you for pointing this out. Indeed, there was a significant relationship between the remote > recent difference in the univariate data and memory performance at day 14 across both age groups (see Figure 4C-D). The performance of all participants, including children, was above chance level for remote trials on day 14. In addition, although the number of remote trials was lower in children (18 trials on average) than in adults (22 trials on average), we believe that the number of remote trials was neither too low nor too different across groups for the contrast.

      (5) Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example, in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). This difference from zero or lack thereof seems important to the message - is that correct? If so, can the authors incorporate descriptions of these findings?

      Thank you for this valuable input. When examining recent and remote retrieval separately, both the anterior and posterior regions of the hippocampus indeed exhibited activation significantly different from zero in adults (all p < .0003, FDR-corrected) and children (all p < .014, FDR-corrected, except for recent posterior hippocampus) during all delays. We have included this information in the manuscript (see p. 17) and added it to the supplementary materials (Figure S2, Table S7).

      (6) Please provide more details about the choices available for locations in the 3AFC task. (1) Were they different each time, or always the same? If they are always the same, could this be a motor or stimulus/response learning task? (2) Do the options in the 3AFC always come from the same area - in which case the participant is given a clue as to the gist of the location/memory? Or are they sometimes randomly scattered across the image (in which case gist memory, like at a delay, would be sufficient for picking the right option)? Please clarify these points and discuss the logic/impact of these choices on the interpretation of the results.

      Thank you for pointing this out. During learning and retrieval, we employed the 3AFC (Three-Alternative Forced Choice) task.

      The choices for locations varied across scenes but remained the same across time within individuals. There were 18 different key locations for the objects, distributed across the stimulus set. This means the locations of the objects were quite heterogeneous and differed between objects. The location of the object within the task was presented once during encoding and remained consistent throughout learning. Given the location heterogeneity, we believe our task cannot be reduced to a mere “stimulus/response learning task” but is more accurately described as an object-location association task.

      As described above, the options for the 3AFC task did not originate from the same area, as there were 18 different areas in total. The position of the correct answer was distributed equally across the three choice options: sometimes the “correct” answer was the left option, sometimes the middle option, and sometimes the right option. Therefore, we believe that the 3AFC task did not provide clues to the location but required detailed and precise memory of the location. Moreover, the options were not randomly scattered but rather presented close together in the scene, demanding a high level of differentiation between choices.

      Taking all the above into consideration, we assert that precise object-location associative memory is necessary for a correct answer. We have added this information to the manuscript (p. 9).

      (7) Often p values are provided but test statistics, effect sizes, etc. are not - please include this information. It is at times hard to tell whether the authors are reporting main effects, interactions, pairwise comparisons, etc.

      Thank you for bringing this to our attention. We realize that including this information in the Tables may not be the most straightforward approach. Therefore, we have incorporated the test statistics, effect sizes, and related details into the text of the results section for clarity.

      (8) There are not enough methodological details in the main paper to make sense of the results. For example, it is not clear from reading the text that there are new object-location pairs learned each day.

      Thank you for pointing this out. We have added this information to the main manuscript. Additionally, we have emphasized this information in the text referring to Figure 1B.

      (9) The retrieval task does not seem to require retrieval of the scene itself, and as such it would be helpful for the authors to explain their reasoning for using this task to measure reinstatement. Strictly speaking, participants could just remember the location of the object on the screen. Was it verified that children and adults were recalling the actual scene rather than just the location (e.g. via self-report)? It's possible that there may be developmental differences in the tendency to reinstate the scene depending on e.g., their strategy.

      Thank you for highlighting this important point. Indeed, the retrieval task included explicit instructions for participants to recall and visualize the scene associated with the object presented during the fixation time window. Participants were also instructed to recollect the location of the object within the scene. Since the location was contextually bound to the scene and each object had a unique location in each scene, the location of the object was always embedded in the specific scene context. We have added this information to both the Methods and Results sections.

      In their self-reports (which unfortunately were not systematically collected on all occasions), participants indicated that recalling the stories they had created during strategic encoding aided their memory for the scene and location immensely. We also concur with your observation that children and young adults may differ in their ability to reinstate scenes, depending on the success of their employed recall strategies. This task was conducted with an awareness of potential developmental differences in the ability to form complex contextual memories. Our elaborative learning procedure was designed to minimize these differences. It is important to note, though, that we did not expect children to achieve performance levels fully comparable to adults. There may indeed be developmental differences in reinstatement, for example due to differences in knowledge availability and accessibility (Brod, Werkle-Bergner, & Shing, 2013). We think that these differences may underlie our findings of neural reinstatement. This is now discussed on pp. 34-35, 39-43 of the manuscript.

      (10) In general I found the Introduction a bit difficult to follow. Below are a few specific questions I had.

      a. At points findings are presented but the broader picture or take-home point is not expressed directly. For example, lines 112-127, these findings can all be conceptualized within many theories of consolidation, and yet those overarching frameworks are not directly discussed (e.g., that memory traces go from being more reliant on the hippocampus to more on the neocortex). Making these connections directly would likely be helpful for many readers.

      Thank you for bringing this to our attention. We have incorporated a summary of the general frameworks of memory consolidation into the introduction. This addition outlines how our summarized findings, particularly those related to memory consolidation for repeatedly learned information, align with these frameworks (see lines 126-138, 146-150).

      b. Lines 143-153 - The comparison of the Tompary & Davachi (2017) paper with the Oedekoven et al. (2017) reads like the two analyses are directly comparable, but the authors were looking at different things. The Tompary paper is looking at organization (not reinstatement); while the Oedekoven et al. paper is measuring reinstatement (not organization). The authors should clarify how to reconcile these findings.

      Thank you for highlighting this aspect. We have revised how we present the results from Tompary & Davachi (2017). This study examined memory reorganization for memories both with and without overlapping features, and it observed higher neural similarity for memories with overlapping features over time. The authors also explored item-specific reinstatement for recent and remote memories by assessing encoding-retrieval similarity. Since Oedekoven et al. (2017) utilized a similar approach, their results are comparable in terms of reinstatement. We have updated and expanded our manuscript to clarify the parallels between these studies (see lines 157-162).

      c. Line 195-6: I was confused by the prediction of "stable involvement of HC over time" given the work reviewed in the Introduction that HC contribution to memory tends to decrease with consolidation. Please clarify or rephrase.

      Drawing on the Contextual Binding Theory (Yonelinas et al., 2019), as well as the Multiple Trace Theory (Nadel et al., 2000) and supported for instance by evidence from Sekeres et al. (2018), we hypothesized that detailed contextual memories formed through repeated and strategic learning would strengthen the specificity of these memories, resulting in consistent hippocampal involvement for successfully recalled contextualized detailed memories. We have included additional explanatory information in the manuscript to clarify this hypothesis (see lines 217-219).

      d. Lines 200-202: I was a bit confused about this prediction. Firstly, please clarify whether immediate reinstatement has been characterized in this way for kids versus adults. Secondly, don't adults retain gist more over long delays (with specific information getting lost), at least behaviourally? This prediction seems to go against that; please clarify.

      Thank you for raising this important point. Indeed, there are no prior studies that examined memory reinstatement over extended durations in children. The primary existing evidence suggests that neural specificity or patterns of neural representations in children can be robustly observed, while neural selectivity or univariate activation in response to the same stimuli tends to mature later (i.e., Fandakova et al., 2019). Bearing this in mind and recognizing that such neural patterns can be observed in both children and adults, we hypothesized that adults may form stronger detailed contextual memories compared to children. By employing strategies such as creating stories, adults might more easily recall scenes without the need to resort to forming generic or gist-like memories (for example, 'a red fox was near the second left pine tree in a spring green forest'). This assumption aligns with the Fuzzy Trace Theory (Reyna & Brainerd, 1995), which posits that verbatim memories can be created without the extraction of a gist.

      Conversely, we hypothesized that children, due to the ongoing maturation of associative and strategic memory components (as discussed in Shing et al., 2008 and 2010), which are dependent respectively on the hippocampus (HC) and the prefrontal cortex (PFC), would be less adept at creating, retaining, and extracting stories to aid their retrieval process. This could result in them remembering more generic integrated information, like the relationship between a fox and some generic image of a forest. We have added explanatory information to the manuscript to elucidate these points (see lines 225-230).

      Reviewer #1 (Recommendations For The Authors):

      (1) For Figure 3, I would highly recommend changing the aesthetics for the univariate data - at least on my screen they appear to be open boxes with solid vs. dashed lines, and as such look identical to the recent vs. remote distinction in Figure 2B. It also doesn't match the legend for me, which shows the age groups having purple vs. yellow coloring.

      Thank you for this observation. We have adjusted Figure 2 (now Figure 3) (please refer to p. 14) accordingly, now utilizing purple and yellow colors to distinguish between the age groups.

      (2) Lines 329-330, it is not true that "all" indices were significant from zero but this is only apparent if you read the next sentence. Please rephrase to clarify. e.g., "All ... indices with a few exceptions ... were significantly..."?

      Based on the above suggestions and considering our primary focus on time-related changes in scene-specific reinstatement, we will refrain from further interpreting the relative expression of individual scene-specific indices against 0. Consequently, we have removed this information from our analysis.

      (3) It is challenging to interpret some of the significance markers, such as those in Figure 3. For example what effects are being denoted by the asterisks and bars above vs. below the data on panel D? Please clarify and/or note in the legend.

      We have included a note in the legend to clarify the meaning of all significance markers. In addition, we decided to state any significant main and interaction effects in the figure rather than using significance markers.

      (4) For Figures 2 and 3, only the meaning of error bars is described in the caption. It is not explained in the caption what the boxes, lines, and points denote. Please clarify.

      Thank you for highlighting this. We have added explanations to the figure annotations for clarity. Please note that, in response to other reviewers’ suggestions, some figure plots have been adjusted or changed, and the explanations in the figure annotations have been adjusted accordingly.

      (5) How were recent and remote interspersed relative to one another? The text says that each run had 10 recent and 10 remote pairs, presented in a "pseudo-random order" - not clear what that (pseudo) means in this case. Please clarify.

      Thank you for raising this point. We provide this information in the Methods section “Materials and Procedure”: 'The jitters and the order of presentation for recent and remote items were determined using OptimizeXGUI (Spunt, 2016), following an exponential distribution (Dale, 1999). Ten unique recently learned pairs (from the same testing day) and ten unique remotely learned items (from Day 0) were distributed within each run (in total three runs) in the order as suggested by the software as the most optimal. There were three runs with unique sets of stimuli each resulting in thirty unique recent and thirty unique remote stimuli overall.'

      (6) Figure 1A, second to last screen on the learning cycles row - what would be presented to participants here, one of these three emojis? What does the sleepy face represent? I see some of these points were mentioned in the methods, but additional clarification in the caption would be helpful.

      Thank you for highlighting this. We have included this information in the figure caption. Specifically, the sleepy face symbol in the figure denotes a 'missed response'.

      (7) Not clear how the jittered fixation time between object presentation and scene test is dealt with in representational similarity analyses.

      Thank you for pointing this out. Beta estimates were obtained from a Least Squares Separate (LSS) regression model. Each event was modeled with its respective onset and duration, so one beta value was estimated per event (with the lags between events differing from trial to trial). We have edited the corresponding section (see p. 53).
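
      To illustrate the LSS idea described above, the following is a minimal sketch and not the pipeline used in the study. It assumes a single voxel or ROI time series and a matrix of per-event regressors that have already been built from each event's onset and duration (and, in practice, convolved with a hemodynamic response function). The function name `lss_betas` and the toy design are hypothetical. For each event, a separate model is fit in which that event has its own regressor and all other events are collapsed into one nuisance regressor, so variable lags between events are carried by the regressors themselves.

```python
import numpy as np


def lss_betas(signal, trial_regressors):
    """Estimate one beta per event with Least Squares Separate (LSS).

    signal: array of shape (n_timepoints,), one voxel or ROI time series.
    trial_regressors: array of shape (n_timepoints, n_trials); column j is
        the regressor for event j (assumed to be built from that event's
        onset and duration beforehand).
    """
    n_timepoints, n_trials = trial_regressors.shape
    betas = np.empty(n_trials)
    for i in range(n_trials):
        target = trial_regressors[:, i]
        # All other events are summed into a single nuisance regressor.
        others = np.delete(trial_regressors, i, axis=1).sum(axis=1)
        design = np.column_stack([target, others, np.ones(n_timepoints)])
        coef, *_ = np.linalg.lstsq(design, signal, rcond=None)
        betas[i] = coef[0]  # keep only the beta of the target event
    return betas


if __name__ == "__main__":
    # Toy design: three events with jittered onsets (hypothetical values).
    X = np.zeros((30, 3))
    X[2:4, 0] = 1.0
    X[10:12, 1] = 1.0
    X[21:23, 2] = 1.0
    y = X @ np.array([1.0, -0.5, 2.0]) + 3.0
    print(lss_betas(y, X))
```

      Fitting one model per event costs more computation than a single model with all events, but it is commonly preferred when events are closely spaced, because the single-event regressor is less collinear with the collapsed nuisance regressor than with many individual neighbors.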

      (8) It was a little bit strange to have used anterior vs posterior HPC ROIs separately in univariate analysis but then combined them for multivariate. There are many empirical and theoretical motivations for looking at item-specific and category reinstatement in anterior and posterior HPC separately, so I was surprised not to see this. Please explain this reasoning.

      Thank you for pointing this out. We agree with the reviewer and have included the anterior and posterior HC ROIs in the multivariate analysis. Please see the revised results section (pp. 13-15).

      (9) The term "neural specificity" is introduced (line 164) without explanation; please clarify.

      Thank you for bringing this to our attention. The term ‘neural specificity,’ as defined by Fandakova et al. (2019), refers to the distinctiveness of neural representations in the regions that process the corresponding sensory input. We decided, however, to refrain from using this term and instead to use ‘neural representational distinctiveness’, which is more self-explanatory and was also introduced in the manuscript.

      (10) Age range is specified as 5-7 years initially (line 187) and then 6-7 years (line 188).

      We have corrected the age range in line 188 to '5 to 7 years.'

      Reviewer #2 (Public Reviews):

      Schommartz et al. present a manuscript characterizing neural signatures of reinstatement during cued retrieval of middle-aged children compared to adults. The authors utilize a paradigm where participants learn the spatial location of semantically related item-scene memoranda which they retrieve after short or long delays. The paradigm is especially strong as the authors include novel memoranda at each delayed time point to make comparisons across new and old learning. In brief, the authors find that children show more forgetting than adults, and adults show greater engagement of cortical networks after longer delays as well as stronger item-specific reinstatement. Interestingly, children show more category-based reinstatement, however, evidence supports that this marker may be maladaptive for retrieving episodic details. The question is extremely timely both given the boom in neurocognitive research on the neural development of memory, and the dearth of research on consolidation in this age group. Also, the results provide novel insights into why consolidation processes may be disrupted in children. Despite these strengths, there are quite a few important design and analytical choices that derail my enthusiasm for the paper. If the authors could address these concerns, this manuscript would provide a solid foundation to better understand memory consolidation in children.

      We thank the reviewer for both the positive and critical appraisal of our paper.

      Reviewer #2 (Recommendations For The Authors):

      (1) My greatest concern is the difference in memory accuracy that emerges as soon as immediate learning, which undermines the interpretation of any consolidation-related differences. This concern is two-fold. The authors utilize an adaptive learning approach in which participants learn to criteria or stop after 4 repetitions. This type of approach leads to children seeing the stimuli more often during learning compared to adults, which on its own could have consequences for consolidation-related neural markers. Specifically, within adults theoretical and empirical work this shows that repeating information can actually lead to more gist-like representations, which is the exact profile the children are showing. While there could be a strength to this approach because it allows for equivocal memory, the decision to stop repetitions before criteria means that memory performance is significantly lower in the children, which again could have consequences to consolidation-related neural markers. First, the authors do not show any of the learning-related data which would be critical to assess the impact of this design choice. Second, there are likely differences in memory strength at the delay, making it extremely difficult to determine if the neural markers reflect development, worse memory strength, or both. This issue is compounded by the use of a 3-AFC paradigm, wherein "correct responses" included in the analysis could contain a significant amount of guessing responses. I think a partial solution to this problem is to analyze the RT data and include them in the analyses or use a drift-diffusion modeling approach to get more precise estimates of memory strength to control for this feature. An alternative is to sub-select participants in each group to have a sample matched on performance (including # of repetitions) and re-run all the analyses in this sub-sample. Without addressing these concerns it is near impossible to interpret the presented data.

      Thank you for highlighting this point.

      Firstly, we believe that our approach, involving strategic and repeated learning coupled with feedback, enhances the formation of detailed contextual memories. The retrieval procedure also emphasized the need for detailed memory for location. These are critical differences in experimental procedure from previous studies, which enhanced the importance of detailed representations and likely reduced the likelihood of forming gist-like memories.

      Indeed, we ceased further learning after the fourth repetition. Extensive piloting, where we initially stopped after the seventh repetition, showed no improvement beyond the fourth repetition. In fact, performance tended to decline due to fatigue. Therefore, we limited the number of repetition cycles to the point where an improvement in performance was still feasible. Even though children exhibited lower final learning performance overall, we believe our procedure enabled them to reach their maximal performance within the experimental setup.

      To address the reviewer’s concern, we included learning data to illustrate the progression of learning (see Fig. 1C, pp. 9-10 in Results).

      When interpreting the retention rates, it is important to note that we reported retention rates only for items that were correctly learned (100%) on day 0, day 1, and day 14. This approach meant that different participants had varying numbers of items learned correctly. However, this method enabled us to address our primary question: whether memory consolidation, based on all items initially encoded successfully, is comparably robust between the groups. To simultaneously examine the change in retention rate slopes over time for recent (30 minutes after learning), short delay (one night after) remote, and long delay (two weeks after) remote items, we conducted a separate analysis of retention rates for recent items on days 1 and 14. After observing no differences between sessions in both age groups, we combined the data for recent items. This allowed us to investigate how the slope of memory retention for initially correctly learned items (with a baseline of 100%) changes over time. We observed a significant interaction between item type (recent, short delay remote, long delay remote) and group. Analysis of this interaction revealed significantly less robust memory consolidation across all delay times for children compared to young adults. The figures have been adjusted accordingly to incorporate the baseline of 100% correct performance.

      Following your suggestion, we also employed the drift diffusion model approach to characterize memory strength, calculating drift rate, boundary and non-decision time parameters. We added the results to the Supplementary Materials (section S2.1, Figure S1).

      Generally, our findings indicate a lower overall drift rate in children when considering all items that had to be learned. We also observed that adults show a steeper decline in drift rate across short and long delays, although their memory strength remains higher than that of children. Both age groups required a similar amount of evidence to make a decision, and this amount declined with delay, which may indicate an adaptation to weaker memory. Further, we observed shorter non-decision time in children compared to adults, potentially suggesting less error checking or less thorough processing and strategy-based memory access in children.

      Overall, these results indicate weaker memory strength in children as a quantitative measure. It may nevertheless stem from qualitatively different memory representations that children form, as our RSA findings suggest. We believe that our neural effect reflects the effect of interest (i.e., worse memory due to lower memory strength in children). Controlling for memory strength would therefore remove variance of interest from the neural data, so we will refrain from including it in the model. However, we will include mean RT as an indicator of general response tendencies.

      Given that the paper is already very complex and long, we opted to add the diffusion model results to the Supplementary Materials (section S2.1, Fig. S1), while discussing the results in the discussion (p. 35).
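
      As a purely illustrative aside on the three diffusion parameters named above (drift rate, boundary, and non-decision time), the sketch below uses the closed-form EZ-diffusion equations of Wagenmakers et al. (2007). This is a swapped-in technique chosen for brevity and is not necessarily the fitting procedure used for section S2.1. EZ-diffusion is defined for two-choice decisions, so applying it to a 3-AFC task by collapsing responses into correct vs. incorrect is a simplifying assumption. The function name `ez_diffusion` and the scaling parameter default `s=0.1` (the conventional value) are choices made for this example.

```python
import math


def ez_diffusion(prop_correct, rt_variance, rt_mean, s=0.1):
    """Closed-form EZ-diffusion estimates.

    prop_correct: proportion of correct responses (not 0, 0.5, or 1).
    rt_variance, rt_mean: variance and mean of correct RTs, in seconds.
    s: scaling parameter of the diffusion process.
    Returns (drift_rate, boundary_separation, non_decision_time).
    """
    if prop_correct in (0.0, 0.5, 1.0):
        raise ValueError("apply an edge correction to prop_correct first")
    logit = math.log(prop_correct / (1.0 - prop_correct))
    x = logit * (
        logit * prop_correct**2 - logit * prop_correct + prop_correct - 0.5
    ) / rt_variance
    # Drift rate: quality of the evidence (here, memory strength).
    drift = math.copysign(1.0, prop_correct - 0.5) * s * x**0.25
    # Boundary separation: amount of evidence required for a decision.
    boundary = s**2 * logit / drift
    y = -drift * boundary / s**2
    mean_decision_time = (
        (boundary / (2.0 * drift)) * (1.0 - math.exp(y)) / (1.0 + math.exp(y))
    )
    # Non-decision time: what remains of the mean RT after decision time.
    return drift, boundary, rt_mean - mean_decision_time
```

      In this parameterization, lower accuracy at a given RT variance yields a lower drift rate, while the boundary and non-decision time separate response caution and peripheral processes from memory strength itself.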

      (2) More discussion of the behavioral task should be included in the results, in particular the nature of the adaptive learning paradigm including the behavioral results as well as the categorical nature of the memoranda. Without this information, it is difficult for the reader to understand what category-level versus item-level reinstatement reflects.

      Thank you for this valuable input. We have incorporated this information into the results section. Please refer to pp. 9-10, 12, 14, 21, 25-26 for the added details.

      (3) Some of the methods for the reinstatement analysis were unclear to me or warranted further adjustment. I believe the authors compared the scene against all other scenes. I believe it would be more appropriate to only compare this against scenes drawn from the same category as opposed to all scenes. Secondly, from my reading, it seems like the reinstatement was done during the scene presentation, rather than the object presentation in which they would retrieve the scene. I believe the reinstatement results would be much stronger if it was captured during the object presentation rather than the re-presentation of the scene. Or perhaps both sets of analyses should be included.

      We apologize for the confusion regarding the analysis method.

      During the review process we have improved the description of this analysis and hope it is easier to follow now. In short, we used both approaches (within and between categories) to suit different goals (i.e., measuring scene-reinstatement and gist-like reinstatement).

      Both types of reinstatement were assessed during the fixation cross to avoid confounds with the object itself being on the screen. We only used the scene window in one analysis (scene-reinstatement index) as a neural template to track its pre-activation during the fixation. So, as the reviewer suggests, our rationale is that the reinstatement indeed starts taking place during the short object presentation window but, importantly, extends into the fixation window. We added this clarifying information to the results section (see pp. 21-27).

      (4) For the univariate results, it was unclear to me when reading the results whether they were focusing on the object presentation portion of the trial or the scene presentation portion of the trial. Again, I think the claims of reinstatement related activity would be stronger if they accounted for the object presentation period.

      Thank you for pointing this out. Indeed, the univariate results were based on the object presentation time window. We added this information to the results section (Fig. 3, pp. 14, 16).

      (5) Further, given the univariate differences shown across age groups, the authors should re-run all analyses for the RSA controlling for mean activation within the ROI.

      Thank you for highlighting this. We re-ran all RSA analyses controlling for the mean activation within the ROI. The results remained unchanged. We have added this information to the results section, with further details in Tables S8 and S11 in the Supplementary Materials.

      (6) The authors should include explicit tests across groups for their brain-behavior analyses if they want to make any developmentally relevant interpretations of the data. Also, It would be helpful to include similar analyses to those using the univariate signals, and not just the RSA results.

      Following the reviewer’s suggestion, we included brain-behavior analyses for univariate data as well as RSA data with explicit tests across groups. These can be found in the Results section (pp. 18-20, 28-32). Due to the interdependence of predefined ROIs and to avoid running a high number of correlation tests, we employed partial least squares correlation analysis for this purpose. This approach focuses on multivariate links between specified Regions of Interest (ROIs) and fluctuations in memory performance over short and long delays across different age cohorts. We argue that this multivariate strategy offers a more comprehensive understanding of the relationships between brain metrics across various ROIs and memory performance, given their mutual dependence and connectivity (refer to Genon et al. (2022) for similar discussions).
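
      For readers unfamiliar with partial least squares correlation, the core computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: it assumes one row per participant, a matrix of brain metrics (one column per ROI), and a matrix of memory measures, and the function name `plsc` is hypothetical. Both matrices are z-scored, their cross-correlation matrix is decomposed with a singular value decomposition, and each resulting latent variable pairs a profile of ROI weights with a profile of behavioral weights.

```python
import numpy as np


def plsc(brain, behavior):
    """Behavioral PLS correlation via SVD of the cross-correlation matrix.

    brain: array of shape (n_subjects, n_rois).
    behavior: array of shape (n_subjects, n_measures).
    Returns singular values, ROI weights, behavior weights, and the
    per-subject latent brain and behavior scores.
    """
    def zscore(a):
        return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

    zx, zy = zscore(brain), zscore(behavior)
    n_subjects = brain.shape[0]
    # Pearson correlations between every behavioral measure and every ROI.
    cross_corr = zy.T @ zx / (n_subjects - 1)
    u, singular_values, vt = np.linalg.svd(cross_corr, full_matrices=False)
    roi_weights = vt.T          # one column per latent variable
    behavior_weights = u        # one column per latent variable
    brain_scores = zx @ roi_weights
    behavior_scores = zy @ behavior_weights
    return singular_values, roi_weights, behavior_weights, brain_scores, behavior_scores
```

      Because all ROIs enter a single decomposition, one test per latent variable replaces many pairwise correlation tests. In practice, the significance of each latent variable is typically assessed with permutation tests and the stability of the weights with bootstrapping; neither step is shown here.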

      (7) There could be dramatic differences in memory processing across 5-7 year olds. I know the sample is a little small for this, but I would like to see regressions done within the middle childhood group in addition to the across-group comparisons.

      We have included information detailing the relationship between memory retention rate and age within the child group (refer to p. 13). In the child group, both recent and short delay remote memory improved with age. However, the retention rate for long-delayed memory did not show a significant improvement with increasing age in children.

      (8) I am concerned that the authors used global-signal as a regressor in their first-level analyses, given that there could be large changes in the amount of univariate activation that occurs across groups. This approach can lead to false positives and negatives that obscure localized differences. The authors should remove this term, and perhaps use the mean sum of the white matter or CSF to achieve the noise regressor they wanted to include.

      We understand the reviewer's concerns. However, we believe that our approach is recommended for the pediatric population. Specifically, Graff et al. (2021) found that global signal regression is a highly efficacious denoising technique in their study of 4- to 8-year-old children. This technique was previously suggested for adults by Ciric et al. (2017), and the benefits in terms of motion and physiological noise removal outweigh the potential costs of removing some signal of interest, as indicated by Behzadi et al. (2007). Additionally, we incorporated six anatomical component-based noise correction (CompCor) components to account for WM and CSF signals, as recommended in the pediatric literature.

      (9) The authors discuss the relationship between hippocampal reactivation and worse memory through the lens of Schapiro et al., but a new paper by Tanriverdi et al came out in JOCN recently that is more similar to the authors' findings.

      Thank you for highlighting the recent paper by Tanriverdi et al. in JOCN, which aligns closely with our findings. We appreciate the suggestion and agree that exploring this alignment could further enrich our discussion on the relationship between hippocampal reactivation and memory retention. We incorporated this work in our revised manuscript.

      Minor Comments

      - I was surprised that the authors did not see any differences in univariate signals for memory retrieval as a function of development, as much of the prior work has shown differences (for example work by Tracy Riggins). I believe this contrast should be highlighted in the discussion.

      - Given the robust differences in sleep patterns across childhood and the role of sleep in systems consolidation framework, I think this feature should be highlighted in either the introduction or discussion.

      - Could the authors report on differences (or lack of differences) in head motion across the groups, and if they are different whether they could include them as a confounding variable.

      I believe we included six motion parameters and their derivatives into the model

      Thank you for your comments.

      First, prior work on univariate signals of memory retrieval focused mostly on remembered vs. forgotten contrasts, while in our study we focused on remote vs. recent items at short and long delays, only for correctly remembered items. This can partially explain the results. We highlighted this information in the discussion section.

      Second, we agree with the reviewer that sleep patterns across childhood should be addressed. We therefore incorporated this point into the discussion section.

      Third, head motion parameters were indeed included in the analysis as confounding variables, as adding them is highly recommended for developmental populations (e.g., Graff et al., 2021). As an example, we observed higher framewise displacement in children compared to adults, t(218) = -16, p < .001, as well as in translation along y, t(288) = -2.33, p = .02.

      Reviewer #3 (Public Reviews):

      Summary:

      This study aimed to understand the neural correlates of memory recall over short (1-day) and long (14-days) intervals in children (5-7 years old) relative to young adults. The results show that children recall less than young adults and that this is accompanied by less activation (relative to young adults) in brain networks associated with memory retrieval.

      Strengths:

      This paper is one of few investigating long-term memory (multiple days) in a developmental population, an important gap in the field. Also, the authors apply a representational similarity analysis to understand how specific memories evolve over time. This analysis shows how the specificity of memories decreases over time in children relative to adults. This is an interesting finding.

      We thank the reviewer for the appraisal of our manuscript.

      Weaknesses:

      Overall, these results are consistent with what we already know: recall is worse in children relative to adults (e.g., Cycowicz et al., 2001) and children activate memory retrieval networks to a lesser extent than adults (Bauer et al, 2017).

      It seems that the reduced activation in memory recall networks is likely associated with less depth of memory encoding in children due to inattentiveness, reduced motivation, and documented differences in memory strategies. In regard to this, there was consideration of IQ, sex, and handedness but these were not included as covariates as they were not significant although I note p<.16 suggests there was some level of association nonetheless. Also, IQ is measured differently for the children and adults so it's not clear these can be directly contrasted. The authors suggest the instructed elaborative encoding strategy is effective for children and adults but the reference in support of this (Craik & Tulving, 1975) does not seem to support this point.

      Thank you for your review, and we appreciate your valuable feedback. Here are our responses and clarifications:

      Regarding the novelty of the results relative to the existing literature mentioned, in contrast to Cycowicz et al. (2001), Bauer et al. (2017), and others, we do not only assess immediate memory after encoding with semantic judgement of abstract associations. We add to these findings by investigating consolidation-related changes in complex associative and contextual information in a much under-investigated sample of 5-to-7-year-old preschoolers. This also allows us to infer how children's neural representations change over time, providing valuable insights into knowledge formation in this developmental cohort.

      Accordingly, the observed age differences are of less primary importance than the time-related changes in mnemonic representations observed in children.

      Regarding the assumption of inattentiveness in children, we want to emphasize that the experimenter was present throughout the learning process, closely supervising the children. We observed prompt responses to every trial in children and noted an increase in accuracy over the encoding-learning cycles, leading us to conclude that the children were indeed attentive to the task. The observed accuracy improvement across learning cycles indicates an increase in remembered information. Furthermore, we took measures to ensure their engagement, including extensive training in both verbal and computerized versions to ensure that they understood and actively created stories to support their learning.

      We collected motivation data after each task execution in children, and the results indicated that they scored high in motivation. Children not only completed the tasks but also expressed their willingness to participate in subsequent appointments, highlighting their active involvement in the study.

      The observed differences in the efficiency of strategy utilization were expected, given developmental differences in the associative and strategic components of memory in children, as noted in prior research (Shing, 2008, 2010).

      We appreciate your point about IQ, sex, and handedness. These variables were indeed included in the behavioral models, and mean brain activation was also included in the brain data models, addressing the potential influence of these factors on our results.

      While it's true that we applied different tests to measure IQ in children and adults, these tests targeted comparable subtests that addressed similar cognitive constructs. As the final IQ values are standardized, we believe it is appropriate to compare them between the two groups.

      Lastly, we agree that Craik & Tulving (1975) supports the effectiveness of instructed elaborative learning only in adults, not in children. We therefore added relevant literature for the child cohort (i.e., Pressley, 1982; Pressley et al., 1981; Shing et al., 2008).

      Reviewer #3 (Recommendations For The Authors):

      An additional point for the authors to consider is that the hypotheses were uncertain. The first is that prefrontal, parietal, cerebellar, occipital, and PHG brain regions would have greater activation over time in adults and not children - which is very imprecise as this is basically the whole brain. Moreover, brain imaging data may be in opposition to this prediction: e.g., the hippocampus has a delayed maturational pattern beyond 5-yrs (e.g., Canada 2019; Uematsu 2012) and some cortical data predicts earlier development in these regions.

      Thank you for your feedback, and we appreciate your insights regarding our hypotheses.

      The selection of our regions of interest (ROIs) was guided by prior literature that has demonstrated the interactive involvement of multiple brain areas in memory retrieval and consolidation processes. Additionally, our recent work utilizing multivariate partial least squares correlation analysis (Schommartz, 2022, Developmental Cognitive Neuroscience) has indicated that unique profiles derived from the structural integrity of multiple brain regions are differentially related to short and long-delay memory consolidation.

      Indeed, the literature suggests that the hippocampus may exhibit a more delayed maturational pattern extending into adolescence, as supported by studies such as Canada (2019) and Uematsu (2012). We added this information, as well as findings from the literature on cortical development, to make our review of the literature more balanced.

      Given this complexity, we believe it is important to emphasize in our discussion that both the medial temporal lobe, including the hippocampus, and cortical structures, as well as the cerebellum, undergo profound neural maturation. We highlight these nuances in our revised manuscript to provide a more comprehensive perspective on the developmental differences in memory retention over time.

      The writing was challenging to follow - consider as an example on page 9 the sentence that spans 10 lines of text.

      Thank you for bringing this to our attention. We have carefully reviewed the manuscript and have made efforts to streamline the text, ensuring that sentences are not overly long or complex to improve readability and comprehension.

      I found the analysis (and accompanying figures) a bit of a data mine - there are so many results that are hard to digest and in other cases highly redundant one from the other. This may be resolved in part by moving redundant findings to the supplemental. Some were hard to follow - so when there is a line between recent and recent data, that seems confusing to connect data that, I believe, are different sets of items. Later scatterplots (Fig 7) have pale yellow dots that I had a hard time seeing.

      Thank you for bringing up your concerns regarding the analysis and figures in our manuscript. We have carefully considered your feedback and made several improvements to address these issues.

      To alleviate the challenge of digesting numerous results, we have taken steps to enhance clarity and reduce redundancy. Specifically, we have moved some of the redundant findings to the supplementary sections, which should help streamline the main manuscript and make it more reader friendly.

      Regarding the line between 'recent' and 'recent data,' the figures were revised to a clearer version. Furthermore, we have improved the visibility of certain elements, such as the pale-yellow dots in the scatterplots (Figs. 1, 2, 4, etc.), to ensure that readers can better discern the data points.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      […] 

      Weaknesses: 

      The question of the physiological relevance of short bouts of ischemia remains.

      The chemical ischemia protocol induces a duration-dependent ATP depletion in acute slices on a time scale of minutes (Pape and Rose 2023). This is about the same time scale as the peri-infarct depolarisation (Lauritzen et al. 2011) that the protocol attempts to model. Of course, such models do not completely replicate the complex situation in vivo. However, the presented analyses of synapse function cannot be performed in vivo. We discuss this now in the manuscript.

      The precise mechanisms underlying the shift between ischemia-induced long-term potentiation and long-term failure of synaptic responses were not addressed. Could this be cell death?

      Thank you for the comment. Yes, we indeed believe that the persistent failure of synaptic transmission is because of neuronal cell death (i.e., of CA1 pyramidal cells) or at least persistent depolarisation. We did not explicitly state that in the original submission but do so in the revised manuscript. It is supported by the unquantified observation of swelling and/or loss of integrity of CA1 pyramidal cell bodies in parallel to postsynaptic failure. It is also in line with many reports from the literature, of which we now cite two (lines 186-198).

      Sex differences are not addressed or considered.

      We have performed all experiments on male mice, as indicated in Material and Methods. We have indeed not addressed sex differences of the observed effects. We consider this, and many other important factors, to be interesting topics for follow-up studies. This is now discussed (lines 413-424).

      Reviewer #2 (Public Review): 

      […]

      Weaknesses: 

      The weaknesses are minor and only relate to the interpretation of some of the data regarding the presynaptic mechanisms causing the potentiation of release. The authors measured the fiber volley, which reflects the extracellular voltage of the compound action potential of the fiber bundle. The half-duration of the fiber volley was increased, which could be due to the action potential broadening of the individual axons but could also be due to differences in conduction velocity. We are therefore skeptical whether the conclusion of action broadening is justified.

      These are excellent points. We have added an analysis demonstrating that axonal conduction velocity is unlikely to be affected. Nonetheless, the fiber volley is indeed an indirect measure of what happens in individual axons. We have adjusted our interpretation accordingly and now also discuss alternative explanations of our findings (lines 363-379).

      Reviewer #3 (Public Review): 

      […]

      Weaknesses: 

      The data on fiber volley duration should be supported by more direct measurements to prove that chemical ischemia increases presynaptic Ca2+ influx due to a presynaptic broadening of action potentials. Given the influence that positioning of the stimulating and recording electrode can have on the fiber volley properties, I found this data insufficient to support the assumption of a relationship between increased iGluSnFR fluorescence, action potential broadening, and increased presynaptic Ca2+ levels.

      We have added a new analysis showing that the latency of the fiber volley is unaffected and relatively constant, which strengthens our conclusion. But the fiber volley is indeed an indirect measure of action potential firing in individual axons. The suggested experiment, which would require simultaneous recording of Ca2+ and action potentials in single axons in combination with chemical ischemia, is extremely difficult, if possible at all. Instead, we have extended the discussion and now include further alternative mechanistic explanations (lines 363-379).

      The results are obtained in an ex-vivo preparation, it would be interesting to assess if they could be replicated in vivo models of cerebral ischemia. 

      This would certainly be very interesting but also extremely challenging technically. For a detailed analysis of synaptic changes as presented here, the main difficulty will be to stimulate and visualise glutamate release exclusively in an isolated population of synapses while recording postsynaptic responses in a stroke model.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      […]

      Labelling of experimental groups of 2-minute and 5-minute chemical ischemia is more accurate than "metabolic stress" and "with postsynaptic failure". The critical difference between these two conditions is lost with this nomenclature. The reader could be misled to believe that the two groups form a heterogenous population of responses from the same experimental manipulation which is incorrect.

      We had stated in the manuscript that we ‘ … grouped combined iGluSnFR and electrophysiological recordings according to the effect of chemical ischemia on the synaptic response: ‘chemical ischemia with postsynaptic failure’ if the postsynaptic response did not recover to above 50% of the baseline level and ‘chemical ischemia’ when it did (as indicated in Fig. 1H). …’. The recordings were not grouped according to chemical stress duration but according to the effect on the postsynaptic response. We have revised the text explaining this (lines 125-135) and illustrate that now also in Fig. 1H. We hope this is easier to follow now.

      More details on the long-term impact of 5-minute ischemia on cell viability would be enlightening regarding the specific mechanism separating these two conditions. With 2 minutes it would appear that cells remain alive (i.e. intact post-synaptic responses), whereas 5 minutes appears to induce cell death.

      Yes, our observations, although not quantified, are in line with cell death as CA1 pyramidal cell bodies appeared swollen and/or lost their integrity when chemical ischemia was followed by postsynaptic failure. This is also in line with reports from the literature. We have revised the results section accordingly (lines 186-201).

      In the paragraph titled "glutamate uptake is unaffected after acute chemical ischemia", there are two erroneous citations of Figure S3 that should be Figure S4.

      Thank you. We corrected this mistake.

      The sex of animals is not given. This is essential information. 

      We used male mice as indicated in the initial version of the manuscript (Material and Methods). We have added a statement regarding the role of sex to the final section of the Discussion.

      Reviewer #2 (Recommendations For The Authors):

      We propose addressing the weaknesses mentioned in the public review. As said, the fibre volley is a very indirect measure of action potential broadening. Based on the iGluSnFR data, the authors predict that the potentiation is mediated by depolarization, action potential broadening, and increased presynaptic calcium influx. The latter could be tested experimentally, but this does not seem necessary if the data are interpreted more cautiously. For example, other explanations for the broadened fiber volley could be mentioned, such as a slowing and/or dispersion of the action potential propagation speed. Furthermore, depolarization could cause elevated resting calcium concentrations, which could potentiate release independently of action potential broadening. Finally, classical forms of presynaptic potentiation of the release machinery that occur during homeostatic plasticity or Hebbian plasticity may operate independently of calcium dynamics.

      Thank you for this comment. The discussion of the mechanism was indeed too short. We have added an analysis of the fiber volley delay after stimulation, which was not affected. Presynaptic action potential broadening is, in our opinion, a very likely explanation for our observations but we did not perform direct experiments. Directly recording presynaptic action potentials and Ca2+ transients in the chemical ischemia model over extended periods of time is a major technical challenge and certainly of interest in the future. As suggested, we have expanded the discussion section and now mention various alternative explanations (lines 363-379).

      There are the following minor suggestions:

      Add line numbers.

      We have added line numbers.

      We would suggest providing exact P values instead of asterisks in the figures. 

      We agree that having exact P values in the figure panels can be very helpful. However, in the present figures they are hard to integrate without overcrowding the already complex panels and thereby obscuring other important details. All p-values are included in the figure legends and/or main text.

      Abstract: "We also observed an unexpected hierarchy of vulnerability of the involved mechanisms and cell types." This sentence is hard to understand and cell types were not directly compared (i.e. axons of CA3 and axons of CA1 neurons were not compared).

      We have revised this statement and removed the reference to cell types.

      In Figure 1G there seems to be an increase in the fiber volley. Is this significant? Could this be due to swelling of the slice during chemical ischemia? Or an increase in excitability? Maybe this could be discussed. 

      The effect was analysed in the context of Fig. 2. A significant increase of the fiber volley amplitude was detected in chemical ischemia (Fig. 2H) but also under control conditions (Fig. 2F). We therefore consider this a change that is detectable but not related to chemical ischemia and not a potential explanation for increased glutamate release (lines 157-160). Also, no significant fiber volley increase was detected in chemical ischemia with postsynaptic failure (Fig. 2H) and in the experiments illustrated in Fig. 4E. Our interpretation is that the fiber volley increases nonspecifically in some experiments over the time course of the experiment (~ 60 min) but that this is unrelated to chemical ischemia.

      In the results: "A fully separate set of experiments..." Please explain better what this means. 

      We have revised the entire section to explain more clearly how recordings were grouped (lines 125-135).

      In the results: "...(Syková and Nicholson, 2008) (Figure S3). However, this was not observed for chemical ischemia without postsynaptic failure (Figure S3), in which the increased glutamate transients were observed." This should probably refer to Figure S4. 

      Thank you for spotting this mistake. We corrected it.

      The last sentence in the results "... most likely by increased presynaptic Ca2+ influx, and, at the same time, the postsynaptic response." This is difficult to understand. Does "at the same time" refer to another mechanism or the consequence of more Ca2+? 

      We revised this part of the results section to improve clarity and toned down our conclusions (lines 328-335 and 363-379).

      Reviewer #3 (Recommendations For The Authors): 

      There are a few points that the author needs to clarify: 

      The authors do not discuss the different behaviour of iGlu F0 during chemical ischemia and chemical ischemia with postsynaptic failure shown in Figure 2, panels D and E. In the first case, during the application of the solution to induce ischemia, iGluF0 decreases while in the other case, it strongly increases before falling down. In both cases, the fEPSP slope is decreased. How does the author explain this observation? 

      We attribute the transient increase of extracellular glutamate during prolonged chemical ischemia to the increase of synaptic glutamate release observed previously under such conditions (Hershkowitz et al. 1993; Tanaka et al. 1997) and other mechanisms reviewed by us (Passlick et al. 2021) (e.g., glial glutamate release, transiently reduced glutamate uptake), which we could not detect during shorter chemical ischemia. The initial drop of the fEPSP slope is most likely due to postsynaptic depolarisation, which is followed by a repolarisation if the chemical stress duration is short. We now explain this in more detail in lines 185-200 of the revised manuscript. Although we focussed on the bi-directional effect on longer timescales in this manuscript, this transient phase during chemical ischemia is very interesting for further investigations.

      On page 8, first line, I think that the authors meant Figure S4, not Figure S3 when they mentioned results on ECS diffusivity and ECS fraction. 

      Yes, thank you for spotting this. We corrected the mistake.

      In Supplementary Figure 5 panel B It seems that PPR is significantly reduced upon chemical ischemia (asterisk on columns green) but the authors claimed in the paper at page 10 that "Analysing the paired-pulse ratio (PPR) of postsynaptic response and iGluSnFR transients revealed no consistent changes after chemical ischemia (Figure S5).". Did the authors refer to the data normalized in panel D? In this case, I do not see the need to normalize raw data that have been already shown in a previous panel and that give different statistical results, probably due to the different tests used (paired in panel B and not paired in panel D). 

      We have clarified this point in the supplementary material (Figure S5, legend). There is a relevant difference between the analyses presented in panels B and D. The paired test presented in B analyses the change of the electrophysiological PPR in response to chemical ischemia. The test in D on the electrophysiological PPR asks whether the reduction in B is significantly different from the changes seen under control conditions. Because it is not, we conclude that chemical ischemia has no relevant effect on the electrophysiological PPR and, in combination with the results on the iGluSnFR PPR, also not on short-term plasticity, as tested here.
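
      The distinction between the two analyses can be sketched as follows. This is a minimal illustration only: all PPR values are hypothetical, the function names are invented for this example, and the t statistics used here are an assumption, since the tests actually used by the authors are given only in the Figure S5 legend. The first statistic asks whether the PPR changes within the chemical ischemia group (as in panel B); the second asks whether the normalised change differs from that seen under control conditions (as in panel D).

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic for the within-group change (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def welch_t(x, y):
    """Welch (unpaired, unequal variance) t statistic between two groups."""
    var_x = stdev(x) ** 2 / len(x)
    var_y = stdev(y) ** 2 / len(y)
    return (mean(x) - mean(y)) / sqrt(var_x + var_y)


def normalised(before, after):
    """Express each 'after' value relative to its own baseline."""
    return [a / b for b, a in zip(before, after)]


# Hypothetical PPR values (not taken from the manuscript).
ischemia_before = [1.80, 1.70, 1.90, 1.60, 1.75]
ischemia_after = [1.70, 1.62, 1.80, 1.50, 1.68]
control_before = [1.80, 1.65, 1.90, 1.70, 1.75]
control_after = [1.72, 1.55, 1.83, 1.60, 1.66]

# Panel B style question: does the PPR change within the ischemia group?
t_within = paired_t(ischemia_before, ischemia_after)

# Panel D style question: is that change different from the drift in controls?
t_between = welch_t(
    normalised(ischemia_before, ischemia_after),
    normalised(control_before, control_after),
)

print(f"within-group paired t = {t_within:.2f}")
print(f"between-group Welch t = {t_between:.2f}")
```

      In this made-up data set the within-group statistic is large because both groups drift downwards over time, while the between-group statistic is close to zero, which is the pattern the reply describes.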

      References

      Hershkowitz N, Katchman AN, Veregge S. Site of synaptic depression during hypoxia: a patch-clamp analysis. Journal of Neurophysiology 69: 432–441, 1993.

      Lauritzen M, Dreier JP, Fabricius M, Hartings JA, Graf R, Strong AJ. Clinical Relevance of Cortical Spreading Depression in Neurological Disorders: Migraine, Malignant Stroke, Subarachnoid and Intracranial Hemorrhage, and Traumatic Brain Injury. J Cereb Blood Flow Metab 31: 17–35, 2011.

      Pape N, Rose CR. Activation of TRPV4 channels promotes the loss of cellular ATP in organotypic slices of the mouse neocortex exposed to chemical ischemia. The Journal of Physiology 601: 2975–2990, 2023.

      Passlick S, Rose CR, Petzold GC, Henneberger C. Disruption of Glutamate Transport and Homeostasis by Acute Metabolic Stress. Front Cell Neurosci 15: 637784, 2021.

      Tanaka E, Yamamoto S, Kudo Y, Mihara S, Higashi H. Mechanisms Underlying the Rapid Depolarization Produced by Deprivation of Oxygen and Glucose in Rat Hippocampal CA1 Neurons In Vitro. Journal of Neurophysiology 78: 891–902, 1997.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Review:

      Reviewer #1 (Public Review):

      In 'Systems analysis of miR-199a/b-5p and multiple miR-199a/b-5p targets during chondrogenesis', Patel et al. present a variety of analyses using different methodologies to investigate the importance of two miRNAs in regulating gene expression in a cellular model of cartilage development. They first re-analysed existing data to identify these miRNAs as one of the most dynamic across a chondrogenesis development time course. Next, they manipulated the expression of these miRNAs and showed that this affected the expression of various marker genes as expected. An RNA-seq experiment on these manipulations identified putative mRNA targets of the miRNAs which were also supported by bioinformatics predictions. These top hits were validated experimentally and, finally, a kinetic model was developed to demonstrate the relationship between the miRNAs and mRNAs studied throughout the paper.

      I am convinced that the novel relationships reported here between miR-199a/b-5p and target genes FZD6, ITGA3, and CAV1 are likely to be genuine. It is important for researchers working on this system and related diseases to know all the miRNA/mRNA relationships but, as the authors have already published work studying the most dynamic miRNA (miR-140-5p) in this biological system I was not convinced that this study of the second miRNA in their list provided a conceptual advance on their previous work.

      We believe this study is an enhancement of our previous work for two reasons, which are alluded to in new text within the introduction. Firstly, our previous work used experimental and bioinformatic analysis to identify microRNAs with significant regulatory roles during chondrogenesis. This new manuscript additionally uses a systems biology approach to identify novel miRNA-mRNA interactions and capture these within an in silico model. Secondly, this work was initiated by the analysis of our previously generated data, using a novel tool we developed for this type of data (Bioconductor - TimiRGeN).

      I was also concerned with the lack of reporting of details of the manipulation experiments. The authors state that they have over-expressed miR-199a-5p (Figure 2A) and knocked down miR-199b-5p (Figure 2B) but they should have reported their proof that these experiments had worked as predicted, e.g. showing the qRT-PCR change in miRNA expression. Similarly, I was concerned that one miRNA was over-expressed while the other was knocked down - why did the authors not attempt to manipulate both miRNAs in both directions? Were they unable to achieve a significant change in miRNA expression or did these experiments not confirm the results reported in the manuscript?

      We agree with the reviewer that some additional data were needed to demonstrate the effective regulation of miR-199-5p. Hence, Supplementary Figure 1 is now included, which provides validation of the effects of miR-199a-5p overexpression (Supplementary Figure 1A) and inhibition of miR-199a/b-5p (Supplementary Figure 1B). Within the main manuscript, Figure 2B has been amended to include the consequences of inhibition of miR-199a-5p, with 2C showing the consequences of miR-199b-5p inhibition. Further, we include new data with regards to miR-199a/b-5p inhibition on CAV1 (Figure 4A).

      I had a number of issues with the way in which some of the data was presented. Table 1 only reported whether a specific pathway was significant or not for a given differential expression analysis but this concealed the extent of this enrichment or the level of statistical significance reported. Could it be redrawn to more similarly match the format of Figure 3A? The various shades of grey in Figure 2 and Figure 4 made it impossible to discriminate between treatments and therefore identify whether these data supported the conclusions made in the text. It also appeared that the same results were reported in Figure 3B and 3C and, indeed, Figure 3B was not referred to in the main text. Perhaps this figure could be made more concise by removing one of these two sets of panels.

      We agree with all points made here and have amended these within the manuscript. Figure 1A now shows pathway enrichment plots from the TimiRGeN R Bioconductor package, and the table which previously showed the pathways enriched at each time point is now in the supplementary materials (Supplementary Table 1). Figures 2 and 4 now use color instead of shades of grey. Figure 3C has been moved to the supplementary materials (Supplementary Figure 2) and is referenced in the text.

      Overall, while I think that this is an interesting and valuable paper, I think its findings are relatively limited to those interested in the role of miRNAs in this specific biomedical context.

      Reviewer #2 (Public Review):

      Summary:

      This study represents an ambitious endeavor to comprehensively analyze the role of miR199a/b-5p and its networks in cartilage formation. By conducting experiments that go beyond in vitro MSC differentiation models, more robust conclusions can be achieved.

      Strengths:

      This research investigates the role of miR-199a/b-5p during chondrogenesis using bioinformatics and in vitro experimental systems. The significance of miRNAs in chondrogenesis and OA is crucial, warranting further research, and this study contributes novel insights.

      Weaknesses:

      While miR-140 and miR-455 are used as controls, these miRNAs have been demonstrated to be more relevant to Cartilage Homeostasis than chondrogenesis itself. Their deficiency has been genetically proven to induce Osteoarthritis in mice. Therefore, the results of this study should be considered in comparison with these existing findings.

      We agree with the reviewer's comments. miR-455-null mice develop normally but miR-140-null (or mutated) mice and humans do have skeletal abnormalities (e.g. Nat Med. 2019 Apr;25(4):583-590. doi: 10.1038/s41591-019-0353-2), indicating a role in chondrogenesis. We have made an addition in the description to point towards the need to assess the roles miR-199a/b-5p may play during skeletogenesis and OA. We anticipate miR-199a/b-5p to be relevant in OA and have ongoing additional work on this, but this is beyond the scope of this manuscript.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      Beyond the issues raised in the public review, I had a few minor recommendations that are largely designed to help improve the understanding of the manuscript as it is currently written.

      (1) Please provide the statistical tests used to obtain p-values in the Figure 2 and 4 legends.

      We have now added statistical test information to the figure legends of figures 2 and 4.

      (2) It is stated on p. 9 that both miRNAs may share a functional repertoire because 25 and 341 genes are shared between their inhibition experiments. Please provide statistical support that this overlap is an enrichment over the null background in this experiment. Total DE genes – chi squared. Expected / Observed.

      A chi-squared test is now presented in the manuscript. It shows that the number of significant genes found in common between miR-199a-5p knockdown and miR-199b-5p knockdown was significantly greater than expected for day 0 and day 1 of the experiments.
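
      The kind of overlap test described here can be sketched as follows. This is a minimal illustration under stated assumptions: the background size and the two differentially expressed (DE) gene set sizes are hypothetical and are not taken from the manuscript, the function name is invented for this example, and a plain Pearson chi-squared test on a 2x2 table without continuity correction is assumed. Only the overlap of 25 genes echoes a number quoted by the reviewer.

```python
from math import erfc, sqrt


def overlap_chi_squared(n_background, n_de_a, n_de_b, n_overlap):
    """Pearson chi-squared test (1 d.f., no continuity correction) for
    whether two DE gene lists overlap more than expected by chance.

    Returns (expected overlap, chi-squared statistic, p-value).
    """
    # 2x2 contingency table: DE in A (yes/no) versus DE in B (yes/no).
    a = n_overlap                      # DE in both
    b = n_de_a - n_overlap             # DE in A only
    c = n_de_b - n_overlap             # DE in B only
    d = n_background - a - b - c       # DE in neither

    # Overlap expected if the two lists were independent.
    expected = n_de_a * n_de_b / n_background

    # Closed form of the Pearson statistic for a 2x2 table.
    numerator = n_background * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator

    # Survival function of the chi-squared distribution with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2))
    return expected, chi2, p_value


# Hypothetical set sizes; only the overlap of 25 is quoted in the review.
expected, chi2, p_value = overlap_chi_squared(
    n_background=15000, n_de_a=300, n_de_b=400, n_overlap=25
)
print(f"expected overlap = {expected:.1f}, observed = 25")
print(f"chi-squared = {chi2:.2f}, p = {p_value:.2e}")
```

      With these made-up set sizes the expected overlap is 8 genes, so an observed overlap of 25 would count as an enrichment. The conclusion depends strongly on the choice of background gene set, which is why the reviewer asks for it to be stated.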

      (3) The final sentence on p. 12 (beginning 'Size of the points reflect...') seemed out of place - is it part of a legend?

      Thank you for pointing out this mistake - it was part of figure 3C and now is in the supplementary materials.

      (4) A sentence on p. 14 reads that 'FZD6 and ITGA3 levels increased significantly' but this should read decreased, rather than increased. Quite an important typo!

      Thank you for pointing this error out. It has been corrected.

      (5) Theoretical transcripts are mentioned in the legend of Figure 5A but these were not present in the figure. Please include these or remove them from the legend.

      This error has been removed from Figure 5A.

      (6) On p 20, the references 22 and 27 should I think be moved to earlier in the sentence (after 'miR-199a-5p-FZD6 has been predicted previously'). Currently, it reads as if these references support your luciferase assays which you claim are the first evidence for this target relationship.

      We agree with this change and have corrected the manuscript.

      (7) The reference to Figure 5D on p. 20 should be a reference to Figure 5C.

      Thank you for pointing this error out – this has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      (1) The paper is based on the importance of miR-140 and miR-455 as miRNAs in chondrogenesis, citing only Barter, M. J. et al. Stem Cells 33, (2015). Considering the scope and results of this study, this citation is insufficient.

      We agree with the reviewer's comments. miR-140 and miR-455 have been studied for many years, and their importance in OA research has become apparent. We have included additional references within the introduction to address this.

      (2) Analyzing chondrogenesis solely through differentiation experiments from MSCs is inadequate. It is essential to perform experiments involving the network within normal cartilage tissue and/or the generation of knockout mice to understand the precise role of miR199a/b-5p in chondrogenesis.

      We have added an additional paragraph in the discussion to state this, and do believe it is highly important that miR-199a/b-5p be tested in OA samples – however this would be beyond the intended scope of this article.

      (3) In light of the above points, it is imperative to investigate the role of miR-199a/b-5p beyond the in vitro differentiation model from MSCs, encompassing mouse OA models or human disease samples.

      In line with our previous response, we agree with the premise and believe additional experiments should be performed to gain more insight into the mechanism by which miR-199a/b-5p regulate OA. However, development of a new mouse line to investigate this is not within the scope of this manuscript.

    1. Author response:

      eLife assessment

      This important study describes the crystallographic screening of a number of small molecules against a viral enzyme critical for the 5' capping of SARS-CoV-2 RNA and viral replication. While the high-quality crystal structures and complementary biophysical assays in this study provide solid evidence to support the major claims regarding how these small molecule compounds bind to the viral enzyme, the mismatch between the antiviral activity and binding to the viral enzyme of several small molecule compounds could have been more thoroughly investigated or discussed. This paper would be of interest to the fields of coronavirus biology, structural biology, and drug discovery.

      We fully agree that the antiviral assay results could be put into better context by clarifying that the antiviral effects of tubercidin and its derivatives are due to off-target effects.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript describes the crystallographic screening of a number of small molecules derived from the natural substrates S-adenosyl methionine (SAM) and adenine, against the SARS-CoV-2 2'-O-methyltransferase NSP16 in complex with its partner NSP10. High-quality structures for several of these are presented together with efforts to evaluate their potential biophysical binding and antiviral activities. The structures are of high quality and the data are well presented but do not yet show potency in biophysical binding. They only offer limited insights into the design of inhibitors of NSP16/10.

      Strengths:

      The main strengths of the study are the high quality of the structural data, and associated electron density maps making the structural data highly accurate and informative for future structure-based design. These results are clearly presented and communicated in the manuscript. Another strength is the authors' attempts to probe the binding of the identified fragments using biophysical assays. Although in general the outcome of these experiments shows negative data or very weak binding affinities the authors should be commended for attempting several techniques and showing the data clearly. This study is also useful as an example of the complexities associated with drug discovery on a bi-substrate target such as a methyltransferase; several of the observed binding poses were unexpected, with compounds that are relatively similar to substrates binding in different parts of the active site or in other unexpected orientations. This serves as an example of how experimental structural information is still of crucial importance to structure-based drug design. In general, the claims in the manuscript are well supported by the data.

      Weaknesses:

      The main limitations of the study are that the new structures generated in the study are fairly limited in terms of chemical space being similar to either SAM or RNA-CAP analogues. It feels a little bit of a lost opportunity to expand this to more diverse ligands which may reveal potential inhibitors that are distinct from current methyltransferase inhibitors based on SAM analogues and truly allow a selective targeting of this important target.

      It is true that it makes sense to screen more diverse compounds to expand to a more diverse ligand set, and we do hope our study motivates others to do so. Given the limited number of crystal structures of nsp10-16 with potential drug molecules, the aim of this study was to extend the database with new complex structures, providing a pool of complex structures for future compound designs with increased selectivity. Furthermore, some of the hits are known inhibitors of similar enzymes, and the most prominent and potent methyltransferase inhibitors, like sinefungin and tubercidin, are structurally related to SAM. We do think that knowing which SAM compounds or fragments of SAM are able to bind in the nsp10-16 active site is highly valuable for further specific and optimized inhibitor design.

      Another limitation is the potentially misleading nature of the antiviral assays. It is not possible to say if these compounds display on-target activity in these assays or even if the inhibition of NSP16/10 would have any effect in these assays. Whilst the authors do mention these points I think this should be emphasized more strongly.

      That is a very valid point and we do not believe that the antiviral activity is based on on-target effects. We agree that the way it is currently presented can be considered misleading and we will clarify this point in the revised version.

      Minor critical points:

      The authors state that their crystals and protein preps have co-purified SAM occupying the active site of the crystals. Presumably, this complicates the interpretation of electron density maps as many of the ligands share overlap with the existing SAM density making traditional analysis of difference maps challenging. The authors did not utilize the PanDDA analysis for this step, perhaps this is related to the presence of SAM in the ground state datasets? Also, occupancies are reported in the manuscript in some cases to two significant figures, this seems to be an overestimation of the ability of refinement to determine occupancy based on density alone and the authors should clarify how these figures were reached.

      We have used PanDDA in parallel for hit finding. However, we did not see any advantages for this target over the hit finding results from visual inspection. As mentioned, this is probably because SAM is present in the “ground state”, which complicates the PanDDA map calculations.

      Regarding the occupancies, we fully agree with this comment. We will report them with a reasonable number of digits and clarify how the figures were reached.

      The molecular docking approach to pre-selection of library compounds to soak did not appear to be successful. Could the authors make any observations about the compounds selected by docking or the docking approach used that may explain this?

      Yes, it is a good point to give possible explanations why the docking approach was not successful to facilitate similar approaches in future studies.

      Reviewer #2 (Public Review):

      Summary:

      The study by Kremling et al. describes a study of the nsp16-nsp10 methyl transferase from SARS CoV-2 protein which is aimed at identifying inhibitors by x-ray crystallography-based compound screening. A set of 234 compounds were screened resulting in a set of adenosine-containing compounds or analogues thereof that bind in the SAM site of nsp16-nsp10. The compound selection was mainly based on similarity to SAM and docking of commercially available libraries. The resulting structures are of good quality and clearly show the binding mode of the compounds. It is not surprising to find that these compounds bind in the SAM pocket since they are structurally very similar to portions of SAM. Nevertheless, the result is novel and may be inspirational for the future design of inhibitors. Following up on the crystallographic screen the identified compounds were tested for antiviral activity and binding to nsp16-nsp10. In addition, an analysis of similar binding sites was presented.

      Strengths:

      The crystallography is solid and the structures are of good quality. The compound binding constitutes a novel finding.

      Weaknesses:

      The major weakness is the mismatch between antiviral activity and binding to the target protein. Only one of the compounds could be demonstrated to bind to the nsp16-nsp10 protein. By performing a displacement experiment using ITC, Sangivamycin is concluded to bind with a Kd > 1 mM. However, the same compound displays antiviral activity with an EC50 of 0.01 microM. Even though the authors do not make specific claims that the antiviral effect is due to inhibition of nsp16-nsp10, it is implicit. If the data is included, it should state specifically that the effect is not likely due to nsp16-nsp10 inhibition.

      We do believe that the antiviral data are valuable and should be published within this work. We also agree with the comment that it should be clearly stated that the antiviral effect is not likely due to nsp10-16 inhibition, and we will revise the text accordingly.

      The structure of the paper and the language needs quite a lot of work to bring it to the expected quality.

      We will go through the manuscript again and further improve the structure and language as much as possible.

      Technical point:

      Refinement of crystallographic occupancies to single digit percentage is not normally supported by electron density.

      We agree with that point and will correct it in the revised version.

    1. Author response:

      Reviewer #1 (Public Review):  

      Weaknesses:  

      The weakness of this study lies in the fact that many of the genomic datasets originated from novel methods that were not validated with orthogonal approaches, such as DNA-FISH. Therefore, the detailed correlations described in this work are based on methodologies whose efficacy is not clearly established. Specifically, the authors utilized two modified protocols of TSA-seq for the detection of NADs (MKI67IP TSA-seq) and LADs (LMNB1-TSA-seq). Although these methods have been described in a bioRxiv manuscript by Kumar et al., they have not yet been published. Moreover, and surprisingly, the work of Kumar et al. is not cited in the current manuscript, despite its use of all TSA-seq data for NADs and LADs across the four cell lines. Moreover, Kumar et al. did not provide any DNA-FISH validation for their methods. Therefore, the interesting correlations described in this work are not based on robust technologies.

      An attempt to validate the data was made for SON-TSA-seq of human foreskin fibroblasts (HFF) using multiplexed FISH data from IMR90 fibroblasts (from the lung) by the Zhuang lab (Su et al., 2020). However, the comparability of these datasets is questionable. It might have been more reasonable for the authors to conduct their analyses in IMR90 cells, thereby allowing them to utilize MERFISH data for validating the TSA-seq method and also for mapping NADs and LADs. 

      We disagree with the statement that the TSA-seq approach and data have not been validated by orthogonal approaches and with the conclusion that the TSA-seq approach is not robust, as summarized here and detailed below in “Specific Comments”. TSA-seq is robust because it is based only on the original immunostaining specificity provided by the primary and secondary antibodies plus the diffusion properties of the tyramide-free radical. TSA-seq has been extensively validated by microscopy and by the orthogonal genomic measurements provided by LMNB1 DamID and NAD-seq. This includes: a) the initial validation by FISH of both nuclear speckle (to an accuracy of ~50 nm) and nuclear lamina TSA-seq and the cross-validation of nuclear lamina TSA-seq with lamin B1 DamID in a first publication (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108); b) the further validation of SON TSA-seq by FISH in a second publication (Zhang et al, Genome Research 2021, doi:10.1101/gr.266239.120); c) the cross-validation of nucleolar TSA-seq using NAD-seq and the validation by light microscopy of the predictions of differences in the relative distributions of centromeres, nuclear speckles, and nucleoli made from nuclear speckle, nucleolar, and pericentric heterochromatin TSA-seq in the Kumar et al, bioRxiv preprint (which is in a last revision stage involving additional formatting for the journal requirements) doi:https://doi.org/10.1101/2023.10.29.564613; d) the extensive validation of nuclear speckle, LMNB1, and nucleolar TSA-seq generated in HFF human fibroblasts using published light microscopy distance measurements of hundreds of probes generated by multiplexed immuno-FISH MERFISH data (Su et al, Cell 2020, https://doi.org/10.1016/j.cell.2020.07.032), as we described for nucleolar TSA-seq in the Kumar et al, bioRxiv preprint and to some extent for LMNB1 and SON TSA-seq in the current manuscript version (see Specific Comments with attached Author response image 2).

      Reviewer 1 raised concerns regarding this FISH validation given that the HFF TSA-seq and DamID data were compared to IMR90 MERFISH measurements. The Su et al, Cell 2020 MERFISH paper came out well after the 4D Nucleome Consortium settled on HFF as one of the two main “Tier 1” cell lines. We reasoned that the nuclear genome organization in a second fibroblast cell line would be sufficiently similar to justify using IMR90 FISH data as a proxy for our analysis of our HFF data. Indeed, there is a high correlation between the HFF TSA-seq and distances measured by MERFISH to nuclear lamina, nucleoli, and nuclear speckles (Author response image 1). When we compare HFF SON TSA-seq data with published IMR90 SON TSA-seq data (Alexander et al, Mol Cell 2021, doi.org/10.1016/j.molcel.2021.03.006), the HFF SON TSA-seq versus MERFISH scatterplot is very similar to the IMR90 SON TSA-seq versus MERFISH scatterplot. We acknowledge that the validation provided by the IMR90 MERFISH is limited by the degree to which genome organization relative to nuclear locales is similar in IMR90 and HFF fibroblasts. However, the correlation between measured microscopic distances from nuclear lamina, nucleoli, and nuclear speckles and TSA-seq scores is already quite high. We anticipate that the conclusions drawn from such comparisons are solid and will only become stronger with future comparisons within the same cell line.

      Author response image 1.

      Scatterplots showing the correlation between TSA-seq and MERFISH microscopic distances. Top: IMR90 SON TSA-seq (from Alexander et al, Mol Cell 2021) (left) and HFF SON TSA-seq (right) (x-axis) versus distance to nuclear speckles (y-axis). Bottom: HFF Lamin B1 TSA-seq (x-axis) versus distance to nuclear lamina (y-axis) (left) and HFF MKI67IP (nucleolar) TSA-seq (x-axis) versus distance to nucleolus (y-axis) (right).

      In our revision, we will add justification of the use of IMR90 fibroblasts as a proxy for HFF fibroblasts through comparison of available data sets. 
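
      The comparison above rests on a correlation coefficient computed between per-locus TSA-seq scores and MERFISH distances. A minimal sketch of that calculation follows, assuming a Pearson coefficient (the response does not state which coefficient was used). The function name and the toy values are hypothetical and are not taken from the datasets discussed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy, made-up values: one TSA-seq score and one mean distance per locus.
toy_tsa_scores = [1.8, 1.1, 0.4, -0.3, -1.2]
toy_distances_um = [0.25, 0.40, 0.65, 0.90, 1.30]
r = pearson_r(toy_tsa_scores, toy_distances_um)
```

      In this toy input a higher score goes with a shorter distance, so the coefficient is negative; whether a reported value is signed or absolute depends on the convention chosen for the analysis.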

      Reviewer #2 (Public Review):  

      Weaknesses:  

      The experiments are largely descriptive, and it is difficult to draw many cause-and-effect relationships. Similarly, the paper would be very much strengthened if the authors provided additional summary statements and interpretation of their results (especially for those not as familiar with 3D genome organization). The study would benefit from a clear and specific hypothesis.

      We acknowledge that this study was hypothesis-generating rather than hypothesis-testing in its goal. This research was funded through the NIH 4D-Nucleome Consortium, which had as its initial goal the development, benchmarking, and validation of new genomic technologies. Our Center focused on the mapping of the genome relative to different nuclear locales and the correlation of this intranuclear positioning of the genome with functions, specifically gene expression and DNA replication timing. By its very nature, this project has taken a discovery-driven rather than hypothesis-driven scientific approach. Our question fundamentally was whether we could gain new insights into nuclear genome organization through the integration of genomic and microscopic measurements of chromosome positioning relative to multiple different nuclear compartments/bodies and their correlation with functional assays such as RNA-seq and Repli-seq.

      Indeed, as described in this manuscript, this study resulted in multiple new insights into nuclear genome organization as summarized in our last main figure.  We believe our work and conclusions will be of general interest to scientists working in the fields of 3D genome organization and nuclear cell biology.  We anticipate that each of these new insights will prompt future hypothesis-driven science focused on specific questions and the testing of cause-and-effect relationships. 

      Given the extensive scope of this manuscript, we were limited in the extent that we could describe and summarize the background, data, analysis, and significance for every new insight. In our editing to reach the eLife recommended word count, we removed some of the explanations and summaries that we had originally included. 

      As suggested by Reviewer 2, in our revision we will add back additional summary and interpretation statements to help readers unfamiliar with 3D genome organization.

      Specific Comments in response to Reviewer 1:

      (1)  We disagree with the comment that TSA-seq has not been cross-validated by other orthogonal genomic methods.  In the first TSA-seq paper (Chen et al, JCB 2018, doi: 10.1083/jcb.201807108), we showed a good correlation between the identification of iLADs and LADs by nuclear lamin and nuclear speckle TSA-seq and the orthogonal genomic method of lamin B1 DamID, which is reproduced using our new TSA-seq 2.0 protocol in this manuscript.  Similarly, in the Kumar et al, bioRxiv preprint (doi:https://doi.org/10.1101/2023.10.29.564613), we showed a general agreement between the identification of NADs by nucleolar TSA-seq and the orthogonal genomic method of NAD-seq.  (We expect this preprint to be in press soon; it is now undergoing a last revision involving only reformatting for journal requirements.) We also showed a high correlation between Hi-C compartments and subcompartments and TSA-seq in the Chen et al, JCB 2018 paper. Specifically, there is an excellent correlation between the A1 Hi-C subcompartment and Speckle Associated Domains as detected by nuclear speckle TSA-seq.  In addition, the A2 Hi-C subcompartment correlated well with iLAD regions with intermediate nuclear speckle TSA-seq scores, and the B2 and B3 Hi-C subcompartments with LADs detected by both LMNB TSA-seq and LMNB1 DamID.  More generally, Hi-C A and B compartment identity correlated well with predictions of iLADs versus LADs from nuclear speckle and nuclear lamina TSA-seq.

      (2)  In the Chen et al, JCB 2018 paper we also qualitatively and quantitatively validated TSA-seq using FISH.  Qualitatively, we showed that both nuclear speckle and nuclear lamin TSA-seq correlated well with distances to nuclear speckles versus the nuclear lamina, respectively, measured by immuno-FISH.

      Quantitatively, we showed that SON TSA-seq could be used to estimate the microscopic mean distance to nuclear speckles with mean and median residuals of ~50 nm.  First, we used light microscopy to show that the spreading of tyramide-biotin signal from a point-source of TSA staining fits well with the exponential decay predicted theoretically by reaction-diffusion equations assuming a steady rate of tyramide-biotin free radical generation by the HRP enzyme and a constant probability throughout the nucleus of free-radical quenching (through reaction with protein tyrosine residues and nucleic acids).  Second, we used the exponential decay constant measured by light microscopy together with FISH measurements of mean speckle distance for several genomic regions to fit an exponential function and to predict distance to nuclear speckles genome-wide directly from SON TSA-seq sequencing reads.  Third, we used this approach to test the predictions against a new set of FISH measurements, demonstrating an accuracy of these predictions of ~50 nm.
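
      The three steps above can be sketched in code. This is a minimal illustration of an exponential-decay model and its inversion, not the fitting procedure used in Chen et al, JCB 2018. The functional form (a baseline plus a decaying exponential), the function names, and all parameter values are assumptions made for the example.

```python
import math


def tsa_signal(distance_nm, baseline, amplitude, decay_per_nm):
    """Signal predicted at a given distance from a point source of TSA staining.

    Assumed form: baseline + amplitude * exp(-decay_per_nm * distance_nm).
    """
    return baseline + amplitude * math.exp(-decay_per_nm * distance_nm)


def estimate_distance_nm(signal, baseline, amplitude, decay_per_nm):
    """Invert the assumed model to estimate a mean distance from a signal value."""
    excess = signal - baseline
    if excess <= 0:
        raise ValueError("signal must exceed the baseline to be inverted")
    # Clamp at zero: a signal above baseline + amplitude maps to the source itself.
    return max(0.0, -math.log(excess / amplitude) / decay_per_nm)


def half_distance_nm(decay_per_nm):
    """Distance over which the above-baseline signal falls by one half."""
    return math.log(2) / decay_per_nm


# Hypothetical parameters, chosen only to exercise the functions.
BASELINE, AMPLITUDE, DECAY = 0.2, 4.0, 0.004
signal_at_300 = tsa_signal(300.0, BASELINE, AMPLITUDE, DECAY)
recovered = estimate_distance_nm(signal_at_300, BASELINE, AMPLITUDE, DECAY)
```

      In practice the decay parameter would come from the light microscopy measurement and the remaining parameters from a fit to FISH-calibrated loci, as described above; here they are simply fixed by hand.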

      (3)  The importance of the quantitative validation by immuno-FISH of using TSA-seq to estimate mean distance to nuclear speckles is that it demonstrates the robustness of the TSA-seq approach.  Specifically, it shows how the TSA-seq signal is predicted to depend only on the specificity of the primary and secondary antibody staining and the diffusion properties of the tyramide-biotin free radicals produced by the HRP peroxidase.  This is fundamentally different from the significant dependence on antibodies and choice of marker proteins for molecular proximity assays such as DamID, ChIP-seq, and Cut and Run/Tag which depend on molecular proximity for labeling and/or pulldown of DNA.

      This robustness leads to specific predictions.  First, it predicts that similar TSA-seq signals will be produced using antibodies against different marker proteins for the same nuclear compartment.  This is because the exponential decay constant (distance at which the signal drops by one half) for the spreading of the TSA is in the range of several hundred nm, as measured by light microscopy for several TSA staining conditions.  Indeed, we showed in the Chen et al, JCB 2018 paper that antibodies against two different nuclear speckle proteins produced very similar TSA-seq signals while antibodies against LMNB versus LMNA also produced very similar TSA-seq signals.  Similarly, we showed in the Kumar et al preprint that antibodies against four different nucleolar proteins showed similar TSA-seq signals, with the highest correlation coefficients for the TSA-seq signals produced by the antibodies against two GC nucleolar marker proteins and the TSA-seq signals produced by the antibodies against two FC/DFC nucleolar marker proteins.

      Author response image 2.

      Comparison of TSA-seq data from different cell lines versus IMR90 MERFISH.  The observed correlation between SON (nuclear speckle) TSA-seq versus MERFISH is nearly as high for TSA-seq data from HFF as it is for TSA-seq data from the IMR90 cell line (Alexander et al, Mol Cell 2021) in which the MERFISH was performed. The correlations for SON, LMNB1 (nuclear lamina) and MKI67IP (nucleolus) versus MERFISH are highest for HFF TSA-seq data as compared to TSA-seq data from other cell lines (H1, K562, HCT116).  Comparison of measured distances to nuclear locale (y-axis) versus TSA-seq scores (x-axis) from different cell lines labeled in red. Left to right: SON, LMNB1, and MKI67IP.  Top to bottom: SON TSA-seq versus MERFISH for two TSA-seq replicates; TSA-seq from HFF, H1, K562, and HCT116 versus MERFISH.

      Second, it predicts that the quantitative relationship between TSA-seq signal and mean distance from a nuclear compartment will depend on the convolution of the predicted exponential decay of spreading of the TSA signal produced by a point source with the more complicated staining distribution of nuclear compartments such as the nuclear lamina or nucleoli.  We successfully used this concept to explain the differences emerging between LMNB1 DamID and TSA-seq signals for flat nuclei and to recognize the polarized distribution of different LADs over the nuclear periphery.
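
      The convolution idea can be illustrated with a toy calculation in which the signal at a locus is the sum of exponentially decaying contributions from every stained source position. This is a sketch under stated assumptions, not the analysis used in the manuscript: the function name, the geometry, and the decay value are hypothetical.

```python
import math


def signal_from_sources(point, sources, decay_per_nm):
    """Sum exponentially decaying contributions from stained source positions.

    `sources` is a list of ((x, y, z), intensity) pairs, with coordinates in nm.
    """
    total = 0.0
    for position, intensity in sources:
        distance = math.dist(point, position)  # Euclidean distance in nm
        total += intensity * math.exp(-decay_per_nm * distance)
    return total


DECAY = 0.004  # hypothetical decay per nm

# A locus 200 nm above a single patch of stained lamina.
one_patch = [((0.0, 0.0, 0.0), 1.0)]
single = signal_from_sources((0.0, 0.0, 200.0), one_patch, DECAY)

# The same locus in a toy "flat" nucleus, midway between lamina patches
# 200 nm below and 200 nm above it.
two_patches = [((0.0, 0.0, 0.0), 1.0), ((0.0, 0.0, 400.0), 1.0)]
flat = signal_from_sources((0.0, 0.0, 200.0), two_patches, DECAY)
```

      In this toy geometry the locus in the flat nucleus receives twice the signal of the single-patch case even though its distance to the nearest lamina is unchanged, which is the kind of effect a point-source model alone would not capture.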

      (4)  After our genomic data production and during our data analysis, a valuable resource from the Zhuang lab was published, using MERFISH to visualize hundreds of genomic loci in IMR90 cells. We acknowledge that the much more extensive validation of TSA-seq by the multiplexed immuno-FISH MERFISH data is dependent on the degree to which the nuclear genome organization is similar between IMR90 and HFF fibroblasts.  However, the correlation between distances to nuclear speckles, nucleoli, and the nuclear lamina measured in IMR90 fibroblasts and the nuclear speckle, nucleolar, and nuclear lamina TSA-seq measured in HFF fibroblasts is already striking (see Author response image 1).  With regard to SON TSA-seq, the MERFISH versus HFF TSA-seq correlation is close to what we observe using published IMR90 SON TSA-seq data (correlation coefficients of 0.89 (IMR90 TSA-seq) versus 0.86 (HFF TSA-seq)).  Moreover, this correlation is highest using TSA-seq data from HFF cells as compared to the three other cell lines (see Author response image 1).  We believe these correlations can be considered a lower bound on the actual correlations between the FISH distances and TSA-seq that we would have observed if we had performed both assays on the same cell line.

      (5)  Currently, we still require tens of millions of cells to perform each TSA-seq assay.  This requires significant expansion of cells and a resulting increase in passage numbers of the IMR90 cells before we can perform the TSA-seq. During this expansion we observe a noticeable slowing of the IMR90 cell growth, as expected for secondary cell lines as we approach the Hayflick limit.  We still do not know to what degree nuclear organization relative to nuclear locales may change as a function of cell cycle composition (i.e., percentage of cycling versus quiescent cells) and cell age.  Thus, even if we performed TSA-seq on IMR90 cells, we would be comparing MERFISH from lower passages with a higher percentage of actively proliferating cells with TSA-seq from higher passages with a higher percentage of quiescent cells.

      We are currently working on a new TSA-seq protocol that will work with thousands of cells.  We believe it is a better investment of time and resources to wait until this new protocol is optimized before we repeat TSA-seq in IMR90 cells for a better comparison with multiplexed FISH data.

      Specific Comments in response to Reviewer 2:

      (1)  As we acknowledge in our Response summary, we were limited in the degree to which we could follow up on our findings with experiments designed to test specific hypotheses generated by our data.  However, we do want to point out that our comparison of wild-type K562 cells with the LMNA/LBR double knockout was designed to test the long-standing model that nuclear lamina association of genomic loci contributes to gene silencing.  This experiment was motivated by our surprising result that gene expression differences between cell lines correlated strongly with differences in positioning relative to nuclear speckles rather than the nuclear lamina.  Despite documenting in these double knockout cells a decreased nuclear lamina association of most LADs, and an increased nuclear lamina association of the “p-w-v” fiLADs identified in this manuscript, we saw no significant change in gene expression in any of these regions as compared to wild-type K562 cells.  Meanwhile, distances to nuclear speckles as measured by TSA-seq remained nearly constant.

      We would argue that this represents a specific example in which new insights generated by our genomics comparison of cell lines led to a clear and specific hypothesis and the experimental testing of this hypothesis.

      In response to Reviewer 2, we are modifying the text to make this clearer. We will explicitly describe how we tested the hypothesis that distance to the nuclear lamina is correlated with, but not causally linked to, gene expression: we used a double knockout (DKO) of LMNA and LBR to change distances relative to the nuclear lamina and then measured the effect on gene expression.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study defines a fundamental aspect of protein kinase signalling in the protist parasite Toxoplasma gondii that is required for acute and chronic infections. The authors provide compelling evidence for the role of SPARK/SPARKEL kinases in regulating cAMP/cGMP signalling, although evidence linking the loss of these kinases to changes in the phosphoproteome is incomplete. Overall, this study will be of great interest to those who study Toxoplasma and related apicomplexan parasites.

      We thank the reviewers for their thoughtful and positive evaluation of our work. Below, we have addressed all of the public reviews and recommendations for the authors in point-by-point responses. Additionally, we include with this resubmission RT-qPCR data where we observe no significant change in transcript levels for the relevant AGC kinases, supporting the hypothesis that SPARK/SPARKEL–regulation is post-translational.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Herneisen et al characterise the Toxoplasma PDK1 orthologue SPARK and an associated protein SPARKEL in controlling important fate decisions in Toxoplasma. Over recent years this group and others have characterised the role of cAMP and cGMP signalling in negatively and positively regulating egress, motility, and invasion, respectively. This manuscript furthers this work by showing that SPARK and SPARKEL likely act upstream, or at least control the levels of the cAMP and cGMP-dependent kinases PKA and PKG, respectively, thus controlling the transition of intracellular replicating parasites into extracellular motile forms (and back again).

      The authors use quantitative (phospho)proteomic techniques to elegantly demonstrate the upstream role of SPARK in controlling cAMP and cGMP pathways. They use sophisticated analysis techniques (at least for parasitology) to show the functional association between cGMP and cAMP signalling pathways. They therefore begin to unify our understanding of the complicated signalling pathways used by Toxoplasma to control key regulatory processes that control the activation and suppression of motility. The authors then use molecular and cellular assays on a range of generated transgenic lines to back up their observations made by quantitative proteomics that are clear in their design and approach.

      The authors then extend their work by showing that SPARK/SPARKEL also control PKAc3 function. PKAc3 has previously been shown to negatively regulate differentiation into bradyzoite forms and this work backs up and extends this finding to show that SPARK also controls this. The authors conclude that SPARK could act as a central node of regulation of the asexual stage, keeping parasites in their lytic cell growth and preventing differentiation. Whether this is true is beyond the scope of this paper and will have to be determined at a later date.

      Strengths:

      This is an exceptional body of work. It is elegantly performed, with state-of-the-art proteomic methodologies carefully being applied to Toxoplasma. Observations from the proteomic datasets are masterfully backed up with validation using quantitative molecular and cellular biology assays.

      The paper is carefully and concisely written and is not overreaching in its conclusions. This work and its analysis set a new benchmark for the use of proteomics and molecular genetics in apicomplexan parasites.

      Weaknesses:

      This reviewer did not identify any weaknesses.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Herneisen et al. examines the Toxoplasma SPARK kinase orthologous to mammalian PDK1 kinase. The extracellular signals trigger cascades of the second messengers and play a central role in the apicomplexan parasites' survival. In Toxoplasma, these cascades regulate active replication of the tachyzoites, which manifests as acute toxoplasmosis, or the development into drug-resilient bradyzoites characteristic of the chronic stage of the disease. This study focuses on the poorly understood signaling mechanisms acting upstream of such second messenger kinases as PKA and PKG. The authors showed that similar to PDK1, Toxoplasma SPARK appears to regulate several AGC kinases.

      Strengths:

      The study demonstrated a strong association of the SPARK kinase with an elongin-like SPARKEL factor and an uncharacterized AGC kinase. Using a set of standard assays, the authors determined the SPARK/SPARKEL role in parasite egress and invasion. Finally, the study presented evidence of the SPARK/SPARKEL involvement in the bradyzoite differentiation.

      Weaknesses:

      Although the study can potentially uncover essential sensing mechanisms operating in Toxoplasma, the evidence of the SPARK/SPARKEL mechanisms is weak. Specifically, due to incomplete data analysis, the SPARK/SPARKEL-dependent phosphoregulation of AGC kinases cannot be evaluated. The manuscript requires better organization and lacks guidance on the described experiments. Although the study is built on advanced genetics, at times, it is unnecessarily complicated, raising doubts rather than benefiting the study.

      The evidence for the SPARK/SPARKEL interaction is demonstrated through diverse experimental approaches that are internally consistent. Five separate mass spectrometry experiments, with replicates and appropriate controls, with tags on either SPARK or SPARKEL, showed that SPARK and SPARKEL form a strong interaction (Figure 1A, 1D, 1E; Figure 1—figure supplement 1). Global mass spectrometry experiments assessing the impact of  SPARK or SPARKEL depletion showed similar features (a reduction in PKG and PKA abundance and up-regulation of bradyzoite-associated proteins; Figure 3C–D). The phenotypes associated with SPARK and SPARKEL depletion phenocopy one another in all cell biological assays we tested (Figure 2A, 2D and PMID: 35484233; Figure 2E–J; Figure 4E–F; Figure 6A–B). Measuring the abundance of SPARK and SPARKEL in unenriched samples was challenging, but immunoblotting and proteomics suggest that depletion of one factor leads to down-regulation of the other (Figure 2B, 2C; Figure 3—figure supplement 1), which explains the genetic and cell biological phenocopying described above. We note that “further biochemical studies are required to discern the regulatory interactions between SPARK and SPARKEL” (first submission lines 590-591) and are beyond the scope of this work.

      The evidence for SPARK/SPARKEL regulation of AGC kinase activity is demonstrated through diverse experimental approaches that are also internally consistent. PKA C1 and PKG abundance levels decrease in parasites depleted of SPARK/SPARKEL, as measured by mass spectrometry (Figure 3A and 3C) and cell-based assays for PKA C1/R (Figure 4D–F). Comparisons of the global SPARK-, PKA R-, PKG-, and PKA C3-depleted phosphoproteomes suggest that PKA and PKG activity is reduced upon SPARK depletion whereas the activity of an unrelated factor (PP1) is unaffected (Figure 4G–H, Figure 4—figure supplement 1, Figure 5D–E, Figure 7I–J). Parasites depleted of SPARK are hypersensitized to a PKG inhibitor (Figure 5B–C). SPARK, PKA, and PKG are proximal in cellulo (Figure 3I) and SPARK co-purifies with PKA C3 (Figure 7A). The kinetic-phase phenotypes associated with SPARK and SPARKEL depletion (PMID: 32379047, Figure 2A, 2D–2J) are consistent with reduced PKG activity (PMID: 28465425) and only develop after PKG has been depleted as shown by proteomics experiments (Figure 2E-J and Figure 3C). Other studies have shown that the effects of reduced PKG activity are dominant to reduced PKA C1 activity (PMID: 29030485). The replicative-phase phenotypes associated with SPARK and SPARKEL depletion are consistent with reduced PKA C3 activity (PMID: 27247232 and herein). Mechanistically, PKG and PKA C1 activity must be lower in SPARK-depleted parasites because the abundances of these kinases are lower (Figure 3A, 3C). The mechanism of regulation may be more complex in the case of PKA C3, as SPARK depletion did not cause a reduction in PKA C3 abundance as measured by cellular assays (Figure 7B–F), but PKA C3 activity decreased (Figure 7I–K). 

      We concede that multiple mechanisms may lead to the reduction in PKA C1 and PKG abundances, such as decreased activation loop phosphorylation and autophosphorylation at other stabilizing sites or enhanced ubiquitin ligase activity leading to active degradation of the kinases; we have moved speculation regarding such mechanisms to the Discussion.

      Although the reviewer commented that the manuscript “requires better organization” in the public review, no specific recommendations were provided to the authors. Therefore, we did not change the organization of the manuscript. We added an additional paragraph to the Discussion to reiterate key findings: “A prior study identified SPARK as a regulator of parasite invasion and egress following 24 hours of kinase depletion (Smith et al., 2022). Unexpectedly, we observed that three hours of SPARK or SPARKEL depletion were insufficient to impact T. gondii motility or calcium-dependent signaling, indicating that the phenotypes associated with SPARK and SPARKEL depletion develop over time. Quantitative proteomics revealed that PKA and PKG abundances began to decrease after more than three hours of SPARK depletion. Proximity labeling experiments also suggested that SPARK, PKA, and PKG are spatially associated within the parasite cell. We propose a model in which SPARK down-regulation coincides with reduced PKG and PKA activity due to diminished protein levels.” This work built upon genetic and proteomic approaches recently described by our group, which we cited in the text and extensive methods section. We added additional experimental detail where noted in the reviewer’s recommendations to the authors.

      The study utilizes advanced genetics because biochemical tools for eukaryotic parasites are limited. For example, no antibodies for T. gondii SPARK, PKA subunits, or PKG exist; to say nothing of phosphosite-specific antibodies, which are common in the mammalian cell signaling field. Therefore, to measure the relationship between SPARK, SPARKEL, and PKA subunits, we had to generate strains in which multiple proteins were tagged with epitopes for downstream analysis. The genetic experiments included appropriate controls and were internally consistent with results obtained using orthogonal approaches, such as mass spectrometry.

      Reviewer #3 (Public Review):

      Summary:

      This paper focuses on the roles of a toxoplasma protein (SPARKEL) with homology to an elongin C and the kinase SPARK that it interacts with. They demonstrate that the two proteins regulate the abundance of PKA and PKG, and that depletion of SPARKEL reduces invasion and egress (previously shown with SPARK), and that their loss also triggers spontaneous bradyzoite differentiation. The data are overall very convincing and will be of high interest to those who study Toxoplasma and related apicomplexan parasites.

      Strengths:

      The study is very well executed with appropriate controls. The manuscript is also very well and clearly written. Overall, the work clearly demonstrates that SPARK/SPARKEL regulate invasion and egress and that their loss triggers differentiation.

      Weaknesses:

      (1) The authors fail to discriminate between SPARK/SPARKEL acting as negative regulators of differentiation as a result of an active role in regulating stage-specific transcription/translation or as a consequence of a stress response activated when either is depleted.

      We demonstrate a novel function for SPARK and SPARKEL as negative regulators of differentiation. The pathways leading to differentiation are being actively studied. Up-regulation of a positive transcriptional regulator of chronic differentiation, BFD1, is sufficient to trigger differentiation in vitro in the absence of other stressful growth conditions (PMID: 31955846). SPARK or SPARKEL depletion results in up-regulation of proteins that are up-regulated upon BFD1 overexpression. Whether BFD1 overexpression or SPARK and SPARKEL depletion triggers cellular stress pathways is beyond the scope of the current work, which focused instead on the immediate effect of these pathways on AGC kinases. Study of the effect of the various kinases on the parasite phosphoproteome shows that the putative targets of PKA C3 are specifically downregulated upon SPARK knockdown, indicating PKA C3 activity is indeed decreased in the latter condition.

      (2) The function of SPARKEL has not been addressed. In mammalian cells, Elongin C is part of an E3 ubiquitin ligase complex that regulates transcription and other processes. From what I can tell from the proteomic data, homologs of the Elongin B/C complex were not identified. This is an important issue as the authors find that PKG and PKA protein levels are reduced in the knockdown strains.

      Our experiments suggest that SPARK and SPARKEL form a complex, and down-regulation of one complex member leads to down-regulation of the other. Thus in all tested assays, knockdown of SPARK and SPARKEL phenocopy one another. Further biochemical and structural work will be required to determine the mechanism by which SPARKEL regulates SPARK.

      Nearly all studies of the function of elongin C have been conducted in mammalian cells. Proteins with elongin C domains may serve alternative and unexplored functions in unicellular eukaryotes. We searched for the presence of Elongin A/B and known Elongin C complex members in the T. gondii genome and were unable to identify orthologs, explaining why these proteins were not identified in mass spectrometry experiments. Please see our response in Recommendations for the Authors, Reviewer 3 point 2.

      Beyond the concerns raised by the review team, we have identified and corrected the following errors or omissions in the first submission of the manuscript:

      - Line 176 of the first submission referred to a “peptide sequence match (PSM)”, which we have changed to “peptide-spectrum match”.

      - We recolored and relabeled the lines in Figure 5A so that it is easier to match a specific peptide with a specific line, and we corrected a mislabeling.

      - Figure 7B SPARK panel was incorrectly centered. The raw files can be viewed in Figure 7—source data 2.

      - Figure 7—figure supplement 1D was missing an x-axis label.

      - Line 1172 referred to “Supplementary File X”, which we corrected to “Supplementary File 3”.

      - We have updated references to preprints that have since been published, including PMID: 38093015, 37933960, 37966241, and 37610220.

      Editors' comments:

      The proteomics data reported in this study underpin the major findings and are very comprehensive. As noted in the reviews, it is strongly recommended that the authors normalize the levels of detected phosphopeptides against the levels of the parent protein in the different mutant lines in order to identify changes in protein phosphorylation that are linked to protein kinase activity rather than protein degradation. A focus on changes that occur at early time points following protein knock-down may also help to identify the main targets of each kinase.

      Please see our response to Reviewer 2 Recommendations for the Authors, points 1 and 2.

      Reviewer #1 (Recommendations For The Authors):

      During my reading, I only found one small mistake. In Figure 7F, the x-axis is missing the word 'PKA'.

      We have updated the x-axis to read “SPARK-AID/PKA C3-mNG (h. + IAA)”.

      All information, code, and reagents are clearly explained.

      Reviewer #2 (Recommendations For The Authors):

      How the phosphoproteome was analyzed needs to be clarified. The normalization step, computing the ratio of the phosphopeptide to the protein (peptide) intensity, appears omitted. It is the most critical step of the analysis. The minor shifts between protein and phosphosite intensity seem negligible, as seen in Figure 4 AB. The significant changes can only be deduced by calculating this ratio. In the current state, the presented results are inconclusive. The manuscript contains overreaching and often unsupported statements because the data has not been appropriately filtered. Related to this topic, it is advisable to use well-accepted terminology and complete words when describing proteome and phosphoproteome. The interexchange of a "peptide" and a "phosphopeptide" in the text confuses and misleads.

      To clarify the phosphoproteome analysis:

      We cite a previous description of the phosphoproteomics sample preparation workflow (lines 1124-1125 of the first submission for example). Our quantitative phosphoproteomics experiments comprise two datasets generated from the same multiplexed samples. The samples were split at the point of phosphopeptide enrichment. Ninety-five percent of the samples were subjected to phosphopeptide enrichment (titanium dioxide followed by nickel affinity chromatography; “enriched samples”). Five percent of the samples were reserved as a reference for the non-enriched proteome (“non-enriched samples”). To clarify this point, we have added the sentences “Approximately 95% of the proteomics sample was used for phosphopeptide enrichment” and “The remaining 5% of the sample was not subjected to the phosphopeptide enrichment protocol” to the Methods sections, after describing the multiplexing steps.

      The samples were fractionated separately and run separately on an LC-MS system, which is described in the Methods section, for example lines 1130-1149 of the first submission. Raw files of the phosphopeptide-enriched and unenriched samples were analyzed separately, which is described in the Methods section, for example lines 1151-1158 of the first submission. To clarify this point, we have added the sentence “Raw files of the phosphopeptide-enriched and unenriched samples were analyzed separately” to the Methods sections. Many of the search parameters and descriptions of normalization and protein abundances were described in lines 1085-1093 of the first submission in reference to the 24h SPARK depletion proteome. We added this information to the description of the SPARK depletion time course phosphoproteome data analysis: “The allowed mass tolerance for precursor and fragment ions was 10 ppm and 0.02 Da, respectively. False discovery was assessed using Percolator with a concatenated target/decoy strategy using a strict FDR of 0.01, relaxed FDR of 0.05, and maximum Delta CN of 0.05. Only unique peptide quantification values were used. Co-isolation and signal-to-noise thresholds were set to 50% and 10, respectively. Normalization was performed according to total peptide amount. In the case of the unenriched samples, protein abundances were calculated from summation of non-phosphopeptide abundances.”

      We hope that this clarifies how the unenriched sample protein-level abundances were calculated. When we discuss “protein abundance”, we are referencing the unenriched sample summed non-phosphopeptide abundance. Our phosphoproteome analysis was based only on phosphopeptides, as our phosphopeptide enrichment resulted in 99% efficiency, and peptides lacking phosphorylation sites were filtered out before subsequent analyses. We used “peptide” and “phosphopeptide” interchangeably because the only peptide-level analysis performed was based on phosphopeptide abundances. We have changed any mention of “peptide” to “phosphopeptide” in the main text. 

      “The normalization step, computing the ratio of the phosphopeptide to the protein (peptide) intensity, appears omitted. It is the most critical step of the analysis.”:

      Unlike common differential gene expression analysis pipelines, proteomics analysis pipelines are not settled. Many analyses do not perform peptide-to-parent-protein corrections; some normalize phosphopeptide abundances to parent protein abundances calculated from summing non-phosphopeptides or a combination of phosphopeptide and non-phosphopeptides on an ad hoc basis; some calculate global normalization factors based on regressions of protein and phosphopeptide abundances or other pairwise comparisons. A caveat of protein normalization of phosphopeptides is that it over-corrects cases in which protein abundance and phosphorylation are interdependent, as is the case for auto-phosphorylation and some activation loop phosphorylations (PMID: 37394063). We used the approach that retained the greatest complexity of the data, which is to not normalize abundances across different mass spectrometry experiments and discard information that was not in the overlap. We have updated Supplementary File 3.3 to include protein-level quantification values (from Supplementary File 3.2) if measured.
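
      The over-correction caveat can be illustrated with a minimal sketch. The abundances below are hypothetical and are not values from the study; the sketch only assumes that parent-protein correction is applied by subtracting the protein log2 change from the phosphopeptide log2 change. When the two decrease together, the corrected value reports no change.

```python
import math

def log2_fold_change(treated: float, control: float) -> float:
    """Return log2(treated / control) for two positive abundances."""
    return math.log2(treated / control)

# Hypothetical abundances in arbitrary units (illustration only).
phospho_control, phospho_depleted = 1000.0, 500.0    # phosphopeptide halves
protein_control, protein_depleted = 2000.0, 1000.0   # parent protein also halves

phospho_lfc = log2_fold_change(phospho_depleted, phospho_control)
protein_lfc = log2_fold_change(protein_depleted, protein_control)

# Parent-protein correction: subtract the protein-level change in log space.
corrected_lfc = phospho_lfc - protein_lfc

print(f"phosphopeptide log2FC:    {phospho_lfc:+.2f}")
print(f"parent protein log2FC:    {protein_lfc:+.2f}")
print(f"protein-corrected log2FC: {corrected_lfc:+.2f}")
```

      In this toy case the phosphosite signal is lost after correction even though the phosphopeptide itself halved, which is the situation expected when phosphorylation and protein stability are interdependent.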

      We clarified that the phosphopeptide abundances and protein-level abundances were derived from different datasets that were each internally normalized (globally centered by total peptide amount). Protein-level abundances were summed from non-phosphopeptide abundances. The calculated log2 changes are based on the globally centered data within each dataset. We analyzed the kinetic profiles of changing phosphopeptide abundances relative to a control using approaches similar to those described for several recent temporally resolved T. gondii phosphoproteomes (e.g. PMID: 37933960, 35976251, 36265000, 29141230) and as described in the Methods. The approach does not first correct for unenriched-sample parent protein abundance—in some applications, unenriched samples are not collected at all; instead, phosphopeptide ratios are median-normalized to non-phosphopeptide ratios (quantified due to inefficient phosphopeptide enrichment) and are individually tested against the null distribution of non-phosphopeptide ratios (e.g. PMID: 36265000, 29141230). We did not use this approach because our phosphopeptide enrichment was 99% efficient (18518 phosphopeptides of 18758 peptides with quantification values). In several cases using our approach, parent protein abundance is not quantified in the unenriched proteome dataset, but phosphopeptides are reliably quantified in the enriched proteome dataset. We note that phosphopeptide abundance changes can be difficult to interpret in such cases, e.g. in the first submission lines 178-186 and 193-194. We have added similar text to the results noting that in the case of PKA and PKG, both unenriched parent protein and enriched phosphopeptide abundances decreased (see below). We have also moved speculation about whether SPARK phosphorylates the activation loop of PKA and PKG, or whether the down-regulation of PKA and PKG arises from indirect effects, to the Discussion.
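
      The enrichment efficiency quoted above follows directly from the two peptide counts. The short check below uses only those counts; the variable names are illustrative.

```python
# Counts taken from the response text above.
phosphopeptides = 18518   # quantified phosphopeptides in the enriched dataset
all_peptides = 18758      # all quantified peptides in the enriched dataset

efficiency = phosphopeptides / all_peptides
non_phosphopeptides = all_peptides - phosphopeptides

print(f"enrichment efficiency: {efficiency:.1%}")
print(f"non-phosphopeptides remaining: {non_phosphopeptides}")
```

      Only 240 quantified peptides lack a phosphorylation site, which is a small set from which to build a null distribution of non-phosphopeptide ratios.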

      We have moved comparisons of protein and phosphopeptide abundances from the Results to the Discussion. We added the following sentences to the result section Clustering of phosphopeptide kinetics identifies seven response signatures: “Because non-phosphopeptide and phosphopeptide abundances were quantified in different mass spectrometry experiments, it is challenging to compare the rates of phosphopeptide and parent protein abundance changes, especially when phosphorylation status and protein stability are interconnected. In general, both PKA C1, PKA R, and PKG protein and phosphosite abundances decreased following SPARK depletion (Figure 3—figure supplement 1), as discussed further below. We also observed down-regulation of phosphosite and protein abundances of a MIF4G domain protein.” Figure 3—figure supplement 1E is a new panel that shows PKA C1, PKA R, and PKG phosphopeptide and parent protein abundances along with global changes in phosphopeptide and parent protein abundances in cases where both were quantified. We changed lines 278-282 in the first submission to “The SPARK depletion time course phosphoproteome showed a reduction in the abundance of PKA C1 T190 and T341, which are located in the activation loop and C-terminal tail, respectively (Figure 4A). Several phosphosites residing in the N terminus of PKA R (e.g. S17, S27, and S94) also decreased following SPARK depletion (Figure 4B).” We changed lines 313-315 in the first submission to “The SPARK depletion time course phosphoproteome showed a reduction in the abundance of several phosphosites residing in the N terminus of PKG as well as T838, which corresponds to the activation loop (Figure 5A). By contrast, S105 did not greatly decrease, and S40 abundance slightly increased.”

      The description of experiments should be more detailed. For example, the 3, 8, and 24 h treatments were used reversely; thus, they should be emphasized as time points before natural egress. Consequently, it seems that 3h treatment should be prioritized, given the SPARK/SPARKEL role in egress/invasion. Unexpectedly, the study draws more attention to a 24-hour treatment. If the AID-SPARK/SPARKEL is eliminated within 1h, parasites undoubtedly accumulate numerous secondary defects during a prolonged 23h deprivation. Since the SPARK pathway activates kinase/phosphatase cascades, the 24h data is likely overwhelmed with the consequences of the long-term complex degradation, making it a poor source of the putative SPARK substrates. Likewise, the downregulation of PKA observed in the 8 hours after SPARK depletion may be an indirect effect of the SPARK degradation. The direct effects and immediate substrates should be detectable within 2-3h of auxin treatment of the nearly egressing cultures.

      The first submission described how parasites were harvested at 32 hours post-infection with 0, 3, 8, or 24 hours of IAA treatment (lines 157-160, 1097-1110, and Figure 3B). To reiterate this experimental detail, we have added “harvested 32 hours post-infection” to the sentence “...quantitative proteomics with tandem mass tag multiplexing that included samples with 0, 3, 8, and 24 hours of SPARK or SPARKEL depletion” and similarly in the figure legend. The time points are unrelated to natural egress because the experiment was terminated at 32 hours post-infection, which is earlier than the window typically used to study natural egress under these conditions (40-48 hours post-infection). We chose to terminate the experiment before natural egress to better localize phosphopeptide changes related to SPARK depletion. The phosphoproteome undergoes dramatic reorganization during egress due to the activity of myriad kinases and phosphatases (see PMID: 35976251, 37933960, and 36265000), which would have likely complicated the signal.

      A pivotal result motivating time-course experiments and analysis was that SPARK/SPARKEL's role in egress and invasion emerges only after an extended depletion period (Figure 2E–J, first submission lines 126-145). The 24h depletion was used in the experimental system that first identified SPARK as a regulator of egress, which motivated our initial experiments, as stated in the first submission lines 126-144 and 149-151. We draw attention to the observation that SPARK and SPARKEL phenotypes develop over time in the first submission, lines 137-145. The role for SPARK/SPARKEL in egress/invasion does not manifest at 3h depletion; it manifests at 24h depletion. To ensure that this point is not overlooked by the reader, we have created a new heading in the Results section (SPARK and SPARKEL depletion phenotypes develop over time) for the paragraph that was previously lines 137-145. The remainder of the manuscript integrates data from proteomic, genetic, and cell-based assays across temporal dimensions to build a working model of how the phenotypes associated with SPARK depletion develop over time.

      Underpinning this comment is an assumption that phosphopeptides that decrease the most rapidly following a kinase’s depletion are direct substrates, whereas phosphopeptides that decrease with slower kinetics are not. This is not always the case. Consider a kinase that phosphorylates sites on substrate A and substrate B. The site on substrate A is also the target of a phosphatase, whereas the site on substrate B is recalcitrant to phosphatase activity. If the kinase were inhibited, then the site on substrate A would be actively dephosphorylated. As measured by a phosphoproteomics experiment, the abundance of the substrate A phosphopeptide would drop rapidly due to the inactivity of the kinase and activity of the phosphatase. In the text, we called such sites “constitutively regulated” or dynamic—they are actively dephosphorylated and phosphorylated within a short timeframe. The phosphosite on substrate B is comparatively static; once it is phosphorylated by the kinase, it is unaffected by subsequent inhibition of the kinase. Only newly synthesized substrate B molecules would be affected by kinase inhibition. As measured by a phosphoproteomics experiment, the abundance of the substrate B phosphopeptide would drop more gradually after kinase inhibition, as the unphosphorylated peptide is found only on newly synthesized proteins that were not previously exposed to kinase activity. An example of the scenario described for substrate A would be that of yeast Cdk1 T14/Y15, which is phosphorylated by Wee1 and dephosphorylated by Cdc25 (e.g. PMID: 7880537). An example of the scenario described for substrate B would be that of the human PKA C activation loop T197, which is phosphorylated by PDK1 and is phosphatase-resistant under physiological conditions (e.g. PMID: 22493239, 15533936).
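
      The substrate A versus substrate B scenario can be sketched as a simple first-order model. This is an illustration under stated assumptions, not an analysis performed in the study: the kinase is assumed to be fully inactive from time zero, total protein is assumed to stay constant, and all rate constants are hypothetical.

```python
import math

def remaining_phospho(t_hours: float, k_dephos: float, k_turnover: float) -> float:
    """Fraction of the initial phosphopeptide signal left t hours after kinase loss.

    Assumes first-order loss through active dephosphorylation (k_dephos) and
    through replacement of old protein by newly synthesized, unphosphorylated
    protein (k_turnover). Both rates are per hour.
    """
    return math.exp(-(k_dephos + k_turnover) * t_hours)

# Hypothetical rate constants, chosen only for illustration.
K_TURNOVER = math.log(2) / 8.0   # old protein replaced with an 8 h half-time
K_DEPHOS_A = math.log(2) / 0.5   # site A: phosphatase half-time of 30 min
K_DEPHOS_B = 0.0                 # site B: phosphatase-resistant

# Time points mirror the depletion time course (0, 3, 8, and 24 h).
for t in (0, 3, 8, 24):
    site_a = remaining_phospho(t, K_DEPHOS_A, K_TURNOVER)
    site_b = remaining_phospho(t, K_DEPHOS_B, K_TURNOVER)
    print(f"{t:>2} h  site A: {site_a:.3f}  site B: {site_b:.3f}")
```

      Under these assumptions both sites are direct substrates, yet site A is nearly gone by 3 h while site B declines only as fast as the protein pool is replaced.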

      Both substrate A and B may be “direct” and functionally relevant targets of the kinase. Categorizing substrates as “immediate” is comparatively less informative in this context (although it may be relevant when studying fast, synchronized processes with high temporal resolution, such as induced Plasmodium spp. gametocyte activation or stimulation of T. gondii secretion). Furthermore, our earlier experiments had shown that the role for SPARK/SPARKEL in motility manifests after 3h depletion and is complete by 24h depletion. By this logic, we were most interested in the candidates showing differences at these time points. We conducted proximity labeling experiments to identify the overlap of proteins that exhibited SPARK-dependent decreases in the global proteomics and were also proximal to SPARK in space (first submission Figure 3I and lines 260-275), thus revealing a prioritized list of candidates, which included PKG and PKA. When technically feasible, we included a temporal dimension to follow-up experiments, rather than relying on a 24h terminal comparison (e.g. Figure 4E–H, Figure 5D–E, Figure 7D–F, Figure 7I–K; all first submission).

      Fig2 (B and C). What antibodies had been used to detect tagged proteins? There is a concern regarding the use of multiple tags attached to the same protein to the point that it doubles the size of the studied protein. The switch of the mobility of the SPARK and SPARKEL on the WB due to a change in MW adds to the confusion. Furthermore, the study did not use all the fused epitopes (e.g., HA). At the same time, the same V5 tag was used to detect two factors in the same parasite. Although the controls are provided, it does not eliminate the possibility that the second band on the WB results from one protein degradation rather than the presence of two individual proteins. Different tags should be used to confirm the co-expression of two proteins. Panel E is missing the X-axis label.

      Figure 2B was incorrectly labeled; the labels corresponding to SPARK and SPARKEL were switched. We corrected this error in the revised figures. The antibodies used were mouse monoclonal anti-V5 as described in the key resources table of the first submission. We added “V5” to Figure 2A and 2B. Regarding the effect of the tagging payload attached to the proteins, we have included in all assays a control relative to a parental strain (TIR1) without a tagging payload, and additionally included internal controls within tagged strains to calculate dependency of a phenotype on IAA treatment. The western blots in Figure 2B and 2C are from two different strains and experiments. The strains and experiments are described in the first submission main text (lines 113-124), the figure legend (lines 1847-1850), the key resources table, and the methods (lines 650-664, 872-891). A description of the SPARK-AID/SPARKEL-mNG strain was included in the key resources table but omitted in the methods. We therefore added the following section to the Methods:

      “SPARKEL-V5-mNG-Ty/SPARK-V5-mAID-HA/RHΔku80Δhxgprt/TIR1

      The HiT vector cutting unit gBlock for SPARKEL (P1) was cloned into the pALH193 HiT empty vector. The vector was linearized with BsaI and co-transfected with the pSS014 Cas9 expression plasmid into SPARK-V5-mAID-HA/RHΔku80Δhxgprt/TIR1 parasites. Clones were selected with 1 µM pyrimethamine and isolated via limiting dilution to generate the SPARKEL-V5-mNG-Ty/SPARK-V5-mAID-HA/RHΔku80Δhxgprt/TIR1 strain. Clones were verified by PCR amplification and sequencing of the junction between the 3′ end of SPARKEL (5’-GGGAGGCCACAACGGCGC-3’) and 5′ end of the protein tag (5’-gggggtcggtcatgttacgt-3’).”

      To clarify the expected MW of each species, we have added the following text to the Methods:

      “The expected molecular weight of SPARKEL-V5-HaloTag-mAID-Ty is 66 kDa, from the 42.7 kDa tagging payload and 23.3 kDa protein sequence. The expected molecular weight of SPARK-V5-mCherry-HA is 89.7 kDa, from the 31.9 kDa tagging payload and 57.8 kDa protein sequence. The expected molecular weight of SPARK-V5-mAID-HA is 71.3 kDa, from the 13.5 kDa tagging payload and 57.8 kDa protein sequence. The expected molecular weight of SPARKEL-V5-mNG-Ty is 55.2 kDa, from the 31.9 kDa tagging payload and 23.3 kDa protein sequence.”
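
      The stated molecular weights can be checked as the sum of each tagging payload and the untagged protein. The sketch below uses only the values quoted above; the dictionary layout and function name are illustrative.

```python
import math

# Untagged protein masses in kDa, from the Methods text quoted above.
PROTEIN_KDA = {"SPARK": 57.8, "SPARKEL": 23.3}

# construct name -> (protein, tagging payload in kDa, stated total in kDa)
CONSTRUCTS = {
    "SPARKEL-V5-HaloTag-mAID-Ty": ("SPARKEL", 42.7, 66.0),
    "SPARK-V5-mCherry-HA": ("SPARK", 31.9, 89.7),
    "SPARK-V5-mAID-HA": ("SPARK", 13.5, 71.3),
    "SPARKEL-V5-mNG-Ty": ("SPARKEL", 31.9, 55.2),
}

def expected_mw(protein: str, payload_kda: float) -> float:
    """Sum of untagged protein mass and tagging payload, rounded to 0.1 kDa."""
    return round(PROTEIN_KDA[protein] + payload_kda, 1)

for name, (protein, payload, stated) in CONSTRUCTS.items():
    total = expected_mw(protein, payload)
    agrees = math.isclose(total, stated, abs_tol=0.05)
    print(f"{name}: {total:.1f} kDa (stated {stated:.1f} kDa, agrees: {agrees})")
```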

      SPARK and SPARKEL are lowly expressed, which may have been compounded by basal degradation due to the AID tag (see for example Figure 3—figure supplement 1D of the first submission). We attempted several immunoblot conditions and antibodies, and only the V5 antibody proved effective in recognizing these proteins above the limit of detection. For this reason, we included an additional single-tagged control in each immunoblot experiment. Uncropped images of the blots are included in the first submission as Figure 2—figure supplement 1D and E and as Figure 2 source data. We added the following statement to the results section of the text:

      “However, SPARK and SPARKEL abundances are low and approach the limit of detection. We could only detect each protein by the V5 epitope. Although our experiments included single-tagged controls, we cannot formally eliminate the possibility that SPARK-AID yields degradation products that run at the expected molecular weight of SPARKEL. More sensitive methods, such as targeted mass spectrometry, may be required to measure the absolute abundance and stoichiometries of SPARK and SPARKEL.”

      We added “h +IAA” to the x-axis of panel 2E.

      Fig. 3. There is plentiful proteomic data on the factor-depleted parasites. Can it be used to confirm the co-degradation of the SPARK/SPARKEL complex components? This figure mainly includes quality control data that can be moved to Supplement. Did you detect SPARKEL in the TurboID experiment described in panel I? The plot shows only an AGC kinase.

      SPARK and SPARKEL are lowly expressed, and we often do not detect SPARK or SPARKEL peptides with quantification values in complex samples (such as global depletion proteomes and phosphoproteomes; IPs and streptavidin pull-downs are comparatively less complex, with IPs being the least complex samples). We discussed this caveat in the first submission lines 178-186. To additionally clarify this point, we have added “We were unable to measure SPARK or SPARKEL abundances in this proteome” earlier in the text.

      We consider the figure panels relevant to the discussion in the text.

      SPARKEL was not quantified in the SPARK-TurboID experiment (Supplementary File 2). We have added “SPARKEL was not quantified in this experiment” to the text. “Not quantified” is a different outcome from “quantified but not enriched”. The interaction between SPARK and SPARKEL is supported by five other independent interaction experiments in which SPARKEL was quantified (Figure 1A, 1D, 1E; and Figure 1—figure supplement 1). The added insight from the SPARK proximity labeling experiments comes from integration with the global proteomics, which suggests that AGC kinases are in proximity to SPARK and exhibit SPARK-dependent stability and hence activity. The logic of the proximity labeling experiment is described in lines 258-275 of the first submission.

      Fig. 6G is missing deltaBDF1 control for unbiased evaluation of the SPARK KD effect.

      The logic of this experiment was to evaluate whether excess differentiation caused by SPARK and PKA C3 depletion (Figure 6A and 6B) was dependent on the BFD1 circuit. The ∆bfd1 phenotype is well-established under these experimental conditions: parasites lacking BFD1 do not differentiate under spontaneous or alkaline conditions (e.g. PMID: 31955846, 37081202, 37770433). Parasites lacking BFD1 do not differentiate when SPARK and PKA C3 are depleted, suggesting that differentiation caused by SPARK or PKA C3 depletion occurs through the BFD1 circuit. If differentiation caused by SPARK or PKA C3 depletion did not depend on the BFD1 circuit, we might have observed differentiation in the SPARK- and PKA C3-AID/∆bfd1 mutants.

      To clarify this point, we have changed the first sentences of the last paragraph in the results section Depletion of SPARK, SPARKEL, or PKA C3 promotes chronic differentiation: “To assess whether excess differentiation caused by SPARK and PKA C3 depletion is dependent on a previously characterized transcriptional regulator of differentiation, BFD1 (Waldman et al., 2020), we knocked out the BFD1 CDS with a sortable dTomato cassette in the SPARK- and PKA C3-AID strains (Figure 6–figure supplement 1). The resulting SPARK- and PKA C3-AID/∆bfd1 mutants failed to undergo differentiation as measured by cyst wall staining (Figure 6G–H), suggesting that differentiation caused by depletion of these kinases depends on the BFD1 circuit.”

      Lines 239-242. The logic behind the categories of "constitutively regulated sites" and "newly synthesized proteins dependent on SPARK activation" is odd. The former (3h treatment) represents the SPARK-specific events (even though it should be shortened to 1-2h), while an 8h treatment is already contaminated with secondary effects. Since Toxoplasma divides asynchronously, the "newly synthesized" proteins will be present at the time. Also, the protein phosphorylation does not always lead to substrate activation; it can be repressive, too.

      We describe the logic in response to a comment above (substrate A vs. substrate B). It is correct that T. gondii divides asynchronously, with a cell cycle of approximately 8 hours, and 60% of parasites in G1 at a given time (PMID: 11420103). The proteomics experiments measure peptide and protein abundances at a population level. Newly synthesized proteins will be present at all time points, but the proportion of proteins synthesized after SPARK depletion relative to proteins synthesized before SPARK depletion will increase over time.

      We moved lines 238-243 from the first submission to the Discussion.

      It is accurate that phosphorylation does not always lead to substrate activation; it can also be repressive or not change substrate behavior. However, in the case of protein kinases, activation loop phosphorylation is highly correlated with activation (e.g. PMID: 15350212, 31521607).

      Line 250-252: Because the SPARK degradation did not affect intracellular replication, SPARK is unlikely to affect cell cycle-specific phosphorylation.

      To parallel the prior sentences describing different SPARK-dependent down-regulated clusters, we truncated this sentence to “The final cluster of depleted phosphopeptides, Cluster 4, only exhibits down-regulation at 8h of IAA treatment.”

      SPARKEL depletion did not significantly affect intracellular replication under the time frames investigated here (approximately 25 hours post-invasion; Figure 2D). A prior study reported that SPARK depletion did not affect intracellular replication measured on a similar timescale (PMID: 35484233).

      The opening sentence of the Discussion: Typically, we refer to the newly discovered proteins as the orthologs of the previously discovered counterparts and not the vice versa. Thus, calling Toxoplasma SPARK the ortholog of mammalian PDK1 would be more appropriate.

      We changed the opening sentence of the Discussion to “SPARK is an ortholog of PDK1, which is considered a key regulator of AGC kinases”.

      Reviewer #3 (Recommendations For The Authors):

      (1) Authors should show alignment of SPARKEL with Elongin C. Are key residues conserved?

      We have added an alignment of the SKP1/BTB/POZ domains of Homo sapiens elongin C, S. cerevisiae elongin C, and T. gondii SPARKEL as Figure 1—figure supplement 1B. This panel highlights elongin B interface, cullin binding sites, and target protein binding sites based on the human elongin C annotation. As discussed below, these interfaces may not be functionally conserved in T. gondii. Ultimately, future mechanistic and structural studies beyond the scope of the current work will be required to determine how SPARK and SPARKEL physically interact. The Discussion states, “further biochemical studies are required to discern the regulatory interactions between SPARK and SPARKEL” (lines 590-591).

      (2) The failure to identify other Elongin B/C complex members should be addressed by direct IP analysis.

      Indeed, elongin C has traditionally been characterized as a component of multisubunit complexes comprising Elongin A/B/C or Elongin BC/cullin/SOCS that regulate transcription or function as ubiquitin ligases, respectively (for a review, PMID: 22649776). We see two major issues when attempting to generalize these results to apicomplexan parasites. First, nearly all studies of the function of elongin C have been conducted in a single eukaryotic supergroup (the opisthokonts, including yeast and metazoans). The majority of eukaryotic diversity exists in other supergroups, including the SAR supergroup to which apicomplexans such as T. gondii belong (PMID: 31606140). Proteins with elongin C domains may serve alternative and unexplored functions in non-opisthokont unicellular eukaryotes. Second (in support of the first), we were unable to find orthologs of many of the opisthokont complex members in T. gondii, as systematically described below.

      By BLAST, the most similar protein to SPARKEL in S. cerevisiae is ELC1 (YPL046C), with a BLAST E = 0.003. The next most similar protein was SCF ubiquitin ligase subunit SKP1 (YDR328C) with an E value of 0.62. ELC1 is 99 amino acids. The Elongin C (IPR039948) and SKP1/BTB/POZ superfamily domains (IPR011333) span most of this sequence. SPARKEL is 216 amino acids; the Elongin C and SKP1/BTB/POZ superfamily domains occupy the C-terminal half of the protein. The N-terminal domain of SPARKEL may be important for its function; however, future work is required to address this hypothesis.

      Elongin B: Elongin B is not found universally amongst even opisthokonts; fungi and choanoflagellates lack obvious orthologs. The most similar T. gondii protein to human Elongin B (Q15370) by BLAST is TGME49_223125 (E = 0.017), an apicoplast ubiquitin-like protein PUBL (PMID: 28655825, 33053376). TGME49_223125 has a C-terminal ubiquitin-like domain (IPR000626) but no ELOB domain (IPR039049); indeed, no T. gondii protein has an ELOB domain that can be identified by sequence searching. Given the lack of similarity between EloB and TGME49_223125, as well as this protein’s possible red algal endosymbiont origin, we consider it an unlikely ortholog of EloB and topologically unlikely to interact with the SPARK/SPARKEL complex. We did not detect TGME49_223125 in SPARK or SPARKEL IPs (Supplementary File 1).

      Elongin A: T. gondii appears to lack a human elongin A ortholog (Q14241) on the basis of sequence similarity. The most similar T. gondii protein to yeast Elongin A (O59671) by BLAST is TGME49_299230 (E = 0.022). Yeast EloA is 263 amino acids. TGME49_299230 is 1101 amino acids and does not have an EloA domain (IPR010684), suggesting it is not a true EloA ortholog.

      Suppressor of cytokine signaling (SOCS): T. gondii appears to lack human SOCS1 or SOCS2 orthologs (O15524 and O14508) on the basis of sequence similarity. We were unable to identify T. gondii proteins with SOCS domains (PF07525, SM00253, SM00969, and SSF158235).

      Von Hippel-Lindau tumor suppressor (VHL): T. gondii appears to lack a human VHL ortholog (P40337) on the basis of sequence similarity. We were unable to identify T. gondii proteins with VHL domains (IPR024048, IPR024053, PF01847, and SSF49468).

      Cul-2/5: Cullins appeared early in the eukaryotic radiation (PMID: 21554755), and thus T. gondii possesses several. Since the ELC complex has been best characterized with human cullin-2 (Q13617) and cullin-5 (Q93034), we searched for orthologs of these proteins and identified TGME49_289310, TGME49_289310, and TGME49_316660. TGME49_289310 functionally resembles cullin-1 of the SCF complex (PMID: 31348812). None of these proteins were enriched in the SPARK or SPARKEL IPs (Supplementary File 1).

      Rbx1: We searched for human Rbx1 orthologs (P62877) and identified TGME49_213690, which functionally resembles Rbx1 of the SCF complex (PMID: 31348812); as well as several other RING proteins (TGME49_267520, TGME49_277740, TGME49_261990, and TGME49_232160) that were not found in the SPARK or SPARKEL IPs (Supplementary File 1).

      Rbx2: We searched for human Rbx2 orthologs (Q9UBF6) and identified several RING proteins (TGME49_285190, TGME49_254700, TGME49_292340, TGME49_226740, TGME49_244610, and TGME49_304460) that were not found in the SPARK or SPARKEL IPs (Supplementary File 1). No T. gondii protein has an Rbx2 domain (cd16466) that can be identified by sequence searching.

      In conclusion, we conducted “direct IP analysis” (Figure 1A, 1D; Figure 1-supplement 1A) of the SPARK and SPARKEL complex in the first submission of the manuscript. The observation that SPARK and SPARKEL form strong interactions was validated in cellulo via proximity labeling (Figure 1E; Figure 1-supplement 1B) in the first submission of the manuscript. These results are described together in the results section SPARK complexes with an elongin-like protein, SPARKEL (lines 75-110, first submission of manuscript). The failure to identify an interaction between SPARKEL and Elongin B/C complex members in T. gondii may be due to the observation that Elongin B and several ELC complex members do not exist in most eukaryotes, including T. gondii. We added the sentences “The function of proteins with Elongin C-like domains has not been widely investigated in unicellular eukaryotes” to the Results and “However, the SPARK and SPARKEL IPs and proximity experiments failed to identify obvious components of ubiquitin ligase complexes” to the Discussion.

      (3) PKA and PKG half-lives should be measured as well as their transcript abundances.

      The finding that PKA C1 and PKG protein abundances decreased upon SPARK/SPARKEL depletion was internally consistent across experiments. This down-regulation may be due to transcriptional, translational, or post-translational mechanisms. We measured PKG and PKA C1 transcript abundances in SPARK-AID and TIR1 parasites after 24 hours of IAA treatment using RT-qPCR. We did not detect significant differences in transcript levels of the queried kinases. These findings suggest that SPARK depletion leads to PKG and PKA down-regulation through post-transcriptional mechanisms. Translational control is normally enacted globally, for example through regulation of eukaryotic translation factors (PMID: 15459663). The rapid and specific down-regulation of PKG and PKA C1 would suggest that the kinase abundance levels are regulated by non-global translational mechanisms (e.g. mRNA-specific) or rather post-translational mechanisms.

      Substantial additional work is required to determine protein half-lives in eukaryotic parasites. In our discussion of possible mechanisms and models, we were agnostic as to the cause of reduced PKG and PKA abundances upon SPARK depletion. We note in the discussion, “The cause for reduction of PKA C1 and PKG levels requires further study” (lines 541-542).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #2:

      (1) P-values should be reported adjusted for multiple tests or, at the very least, note that they are unadjusted to alert the reader that they may be biased by winner's curse.

      Throughout the manuscript, we applied the false discovery rate threshold to declare results that were statistically relevant for discussion. For reporting in the abstract, however, we believe the raw p-values are the most straightforward, because we reported only the most important and robust results, and because 1) multiple testing correction does not change the ranking of the p-values; 2) p-value adjustment depends on both the method and the number of hypotheses tested; and 3) all reporting of the most significant discovery results is prone to winner’s curse, but in the context of our study the GFI1 finding was confirmatory in nature, so raw p-values allow a direct comparison with existing studies.
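
      The ranking point can be illustrated with a minimal sketch of a step-up FDR adjustment. The response does not name the FDR procedure used, so the Benjamini-Hochberg method here is an assumption, and the p-values are hypothetical. The adjusted values keep the order of the raw values, although neighbouring tests can become tied.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, already sorted for readability.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)

for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  FDR-adjusted p = {q:.5f}")
```

      Because each raw p-value is scaled by the number of tests divided by its rank, the same raw p-value yields a different adjusted value when the number of hypotheses changes.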

      We have taken the suggestion to quote the FDR-adjusted p-values throughout the manuscript for meta-analyzed results, and we discuss how the impact of FDR correction differed between the EWAS and the MRS associations as a result of the number of hypotheses tested in each context:

      “For each EWAS or meta-analysis, the false discovery rate (FDR) adjustment was used to control multiple testing and we considered CpGs that passed an FDR-adjusted p-value < 0.05 to be relevant for maternal smoking.”

      “An FDR adjustment was used to control the multiple testing of meta-analyzed association between MRS and 25 (or 23, depending on the number of phenotypes available in the cohort) outcomes, and we considered association that passed an FDR-adjusted p-value < 0.05 to be relevant.”
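
      The interplay between raw and FDR-adjusted p-values described above can be sketched in a few lines. The response does not name the FDR procedure, so this sketch assumes the Benjamini-Hochberg method; the function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the FDR adjustment is Benjamini-Hochberg; the source
    only says "FDR adjustment".
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from five tests.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)

# The ordering of the tests is unchanged, but fewer pass a 0.05 threshold.
n_raw_hits = sum(p < 0.05 for p in raw)
n_adj_hits = sum(p < 0.05 for p in adj)
```

      In this toy example the adjustment never reorders the tests, but the adjusted values depend on the number of tests, which is why the same raw p-value can pass in one context and fail in another.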

      (2) The odds ratios and p-values reported in the abstract for associations of the MRS with smoking status and smoking exposure per week appear to be missing from the results section of the manuscript or (supplementary) tables.

      The results for smoking status during pregnancy were added to the results:

      “As a result, the epigenetic maternal smoking score was strongly associated with smoking status during pregnancy (OR=1.09 [1.07,1.10], p=5.5×10-33) in the combined European cohorts.”

      The exposure association was reported in the Results section and Supplementary Table 8. We do note the typo in the cohort-specific p-values, which has now been corrected.

      (3) It is misleading to report a lack of MRS associations with maternal smoking in South Asians without also stating that there were only two smokers.

      We agree with the reviewer that an association test would not be justified given the lack of smoking in the present South Asian cohort. We also removed the p-value of association for the START cohort in Figure 3, based on this and comment #4 from reviewer #3. The relevant results have been revised as follows:

      “The HM450 MRS was significantly associated with maternal smoking history in CHILD and FAMILY (n = 397), but we failed to meaningfully validate the association in START (n = 503; Figure 3) – not surprisingly – due to the low number of ever-smokers (n = 2).”

      (4) It is potentially confusing to report MRS associations with maternal smoking by ethnicity but then report associations with birth size and length combined without any explanation. The most novel result of this study is that there is virtually no maternal smoking among the South Asians and yet the MRS is associated with birth weight and size and with height at age 2. This result is buried in the combined analysis. I would suggest reporting the MRS associations with height and weight separately as has been done for maternal smoking behavior.

      We thank the reviewer for this suggestion. These results have now been added as the new Table 3, which shows the cohort-specific and meta-analyzed effect sizes. In the revision, we highlighted the ethnic-specific MRS associations, such as those with smoking exposure at various ages (1 and 3 years) and with skinfold thickness in the European cohorts but not the South Asian cohort, as well as associations that were more homogeneous, such as the birth weight and body size associations in the combined cohorts. In particular, the MRS in the South Asian cohort exhibited a consistent association with body size at various time points (at birth and at 1, 2, and 5 years) with similar effect sizes. The following was added to the results:

      “A higher maternal smoking MRS was significantly associated with smaller birth size (-0.37±0.12, p = 0.0023; Table 3) and height at 1, 2, and 5 year visits in the South Asian cohort (Table 3). We observed similar associations with body size in the white European cohorts (heterogeneity p-values> 0.2), collectively, the MRS was associated with a smaller birth size (-0.22±0.07, p=0.0016; FDR adjusted p = 0.019) in the combined European and South Asian cohorts (Table 3). Meanwhile, a higher maternal smoking MRS was also associated with a lower birth weight (-0.043±0.013, p = 0.001; FDR adjusted p = 0.011) in the combined sample, though the effect was weaker in START (-0.03±0.02; p = 0.094) as compared to the white European cohorts.

      The meta-analysis revealed no heterogeneity in the direction nor the effect size of associations for body size and weight between populations at birth or at later visits (heterogeneity p-values = 0.16–1; Supplementary Table 8).”

      Reviewer #3:

      (1) You mention that the 450K Score performs best even though only 10/143 are included for some populations. Did you explore recalibration of the MRS using only those 10 CpGs?

      We thank the reviewer for this comment. Due to an error in transferring results, the number of overlapping CpGs between the 450K score and the targeted array was in fact 26. This error only affected results relevant to the FAMILY study using the HM450K score and did not materially change our results or conclusions. We have accordingly updated Table 3, Suppl. Tables 5, 8, and 9, Figure 3-B, and Suppl. Figures 5, 6-B, 7-B, and 7-D, as well as the meta-analyzed MRS associations throughout the manuscript.

      The subset of 26 CpGs using the originally derived weights was expected to perform worse than the original HM450K score using the full 143 CpGs. When we restricted the methylation score construction to these 26 CpGs, the performance in CHILD was worse than that of the original score, but comparable to FAMILY (updated Suppl. Table 5). These 26 CpGs did overlap with the targeted score derived in CHILD (13 out of 15 present) and in FAMILY (19 out of 63 present), suggesting moderate agreement between the array platforms as well as across studies.

      In other words, while the subset of 26 CpGs had reasonable performance in both CHILD and FAMILY, both studies could benefit from inclusion of the additional CpGs in the original score. We have included a sentence discussing the choice of validation study and the trade-off between sample size and number of CpGs in our response to Reviewer 3, comment #2.

      (2) Could the internal validation performance be driven by sample size of the training, providing support for the need for larger training sizes? Should this be discussed in the study?

      The validation study, CHILD, has the smaller sample size of the two European cohorts. While both potential validation datasets had smaller sample sizes, we chose CHILD (n=347) rather than FAMILY (n=397) because it had better coverage of the discovery EWAS, i.e. the training data (number of associated CpGs = 3,092, n = 5,647). Beyond the signals of association, the validation performance also depends on a mix of overall sample size and the proportion of current smokers. Given the proportion of current smokers, the effective sample sizes for a direct comparison, i.e. the equivalently powered sample sizes of a balanced (50% cases, 50% controls) design, are 41.7 and 104.7 for CHILD and FAMILY, respectively. While we are unable to directly test whether a larger effective sample size produces a better-performing score, we believe this to be the case, and thus a larger validation study would boost the performance of the methylation score. We have added the following to the discussion:

      “Given the proportion of current smokers, the effective sample size for a direct comparison between CHILD and FAMILY, i.e. equivalently-powered sample size of a balanced (50% cases, 50% controls) design, were 41.7 and 104.7, respectively. While CHILD had a lower effective sample size, we ultimately chose it for validating the methylation score to better cover the CpGs that were significant in the discovery EWAS. A larger validation study will likely further boost the performance of the methylation score and should be considered in future research.”
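
      The notion of an effective sample size for an unbalanced case-control design can be sketched as follows. The response does not give the formula used, so this sketch assumes the commonly used convention N_eff = 4 / (1/n_cases + 1/n_controls); the function name and the counts below are hypothetical and are not the CHILD or FAMILY counts.

```python
def effective_sample_size(n_cases, n_controls):
    """Size of a balanced (50% cases, 50% controls) design with similar power.

    Assumption: N_eff = 4 / (1/n_cases + 1/n_controls), a common
    convention; the source does not state which formula was used.
    """
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# A balanced design is its own effective sample size.
balanced = effective_sample_size(50, 50)

# Hypothetical cohort with few smokers: the effective size is driven
# almost entirely by the number of cases, not the total sample size.
unbalanced = effective_sample_size(12, 335)
```

      Under this convention, a cohort with a slightly smaller total sample size but more smokers can have a substantially larger effective sample size.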

      (3) Figure 1: It is very helpful to have an overview diagram, but this should then follow the flow of the manuscript to aid the reader. Currently, the diagram does not follow the flow of the manuscript and thus is rather confusing - for instance, the figure starts with the MRS but initially an EWAS is conducted in the manuscript itself. I suggest to adapt the overview figure accordingly. Moreover, a description for (A), (B), (C) is not provided in the figure legends. Figure 1 could thus be improved further.

      We thank the reviewer for the suggestion to improve the key figure that summarizes the manuscript. The EWAS workflow for the primary, secondary, and tertiary outcomes, as well as the meta-analysis of the European cohorts, has been added to the updated subfigure A). The description of each subfigure has also been added to the figure legend as follows:

      “Figure 1-A) shows the epigenome-wide association studies conducted in the European cohorts (CHILD and FAMILY); Figure 1-B) illustrated the workflow for methylation risk score (MRS) construction using an external EWAS (Joubert et al., 2016) as the discovery sample and CHILD study as the external validation study, while Figure 1-C) demonstrates the evaluation of the MRS in two independent cohorts of white European (i.e. FAMILY) and South Asian (i.e. START). The validated MRS was then tested for association with smoking specific, maternal, and children phenotypes in CHILD, FAMILY, and START, as shown in Figure 1-D).”

      (4) Figure 3: The readability and information content in this figure, and other figures containing boxplots (e.g., Supplementary Figure 5), could be improved. I would suggest to justify X axis labels to the axis rather than overlapping, and importantly, show individual data points wherever possible (e.g., overlaying the box plots). In c), the ANOVA is not justified given the sample size in START. In general, it is worth excluding the START cohorts from this analysis on the justification of a too small sample size for maternal smokers.

      We thank the reviewer for their thoughtful points for improvement. The axis labels have been wrapped to avoid overlapping, and the data points have been added to the boxplots. The ANOVA p-value for START was removed from the figure and throughout the manuscript because of the low number of smokers. However, we retained START in Figure 3 and other boxplots to show the distribution of the score for non-smokers as a benchmark against the European cohorts.

      (5) In addition to boxplots, it may be helpful to show AUC diagrams for ROC curves (e.g. Figure 3). AUCs are reported in the Tables but not shown. Additionally, all AUC results should include 95% Confidence intervals.

      This is a great suggestion, and we have added the corresponding ROC curves, annotated with the AUC (95% CI), to Figure 3. The 95% CIs for all AUC results were added to the Tables and main text. The following was added to the Methods:

      “The reported 95% confidence interval for each estimated AUC was derived using 2,000 bootstrap samples.”
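
      A bootstrap confidence interval for an AUC can be sketched as below. The response specifies only that 2,000 bootstrap samples were used; the percentile interval, the unstratified resampling, the function names, and the toy scores and labels are all assumptions made for illustration.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a case outranks a control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval (an assumed method)."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # both classes are needed to define an AUC
            estimates.append(auc([scores[i] for i in idx], ys))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return auc(scores, labels), lower, upper


# Hypothetical methylation scores and smoking labels (1 = smoker).
toy_scores = [0.1, 0.3, 0.35, 0.4, 0.5, 0.55, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]
point, lower, upper = bootstrap_auc_ci(toy_scores, toy_labels)
```

      With few cases, as in this toy example, the interval is wide, which is the situation the reviewer's request for confidence intervals is meant to expose.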

      (6) Supplementary Figure 6: It could be helpful to discuss the amount of overlap between the different MRS.

      Most of the scores, including ours, were derived using the Joubert et al. (2016) EWAS as the discovery sample, and thus there will be overlap between the scores. The exception was the GondaliaScore, which contained only 3 CpGs that do not overlap with any other scores.

      While different scores might not have selected completely identical sets of CpGs, the mapped genes are highly consistent across the scores. We have added to the discussion and results the extent of overlap between the top scores:

      “In particular, scores that were derived using the Joubert EWAS as the discovery sample, including ours, had higher pairwise correlation coefficients across the birth cohorts, with many of the CpGs mapping to the same genes, such as AHRR, MYO1G, GFI1, CYP1A1, and RUNX3.”

      (7) Supplementary Figure 7: This figure is never referenced in the text and from the legend itself it is not too clear what it is trying to show. Please refer to it in the main text with some additional context.

      Supplementary Figure 7 was referenced in the Results under the subsection “Methylation Risk Score (MRS) Captures Maternal Smoking and Smoking Exposure”, following the Methods subsection “Statistical analysis”, where we set out to examine a systematic difference. We revised the main text to clarify the analysis:

      “For the derived MRS, we empirically assessed whether a systematic difference existed in the resulting score with respect to all other derived scores. This was examined via pairwise mean differences between the HM450 and other score using a two-sample t-test and an overall test of mean difference using an ANOVA F-test, among all samples and the subset of never smokers.”

      (8) Tables: Tables are currently challenging to read and perhaps more formatting could be done to improve readability.

      We thank the reviewer for the suggestion. Main tables have been reformatted to a landscape layout and each numeric cell moved to the centre to improve readability.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      (1) In Figure 1, the authors show that TF3C binds to the amino terminus of MYCN (Myc box I region), as shown previously. The data in Figure 1 B-D support, but do not rigorously confirm a 'direct' interaction because it has not been ruled out that accessory proteins mediating the association may be present in the mixture.

      In Figure 1B-D we purified MYCN and the TFIIIC/TauA complex separately and then mixed the purified preparations, demonstrating that the purified proteins interact. We additionally performed mass spectrometry, which shows that the TauA/MYCN complex is formed without further accessory proteins, as the molecular weight would otherwise be higher. Based on the Coomassie-stained SDS-PAGE gels, there is no plausible contaminating band that could be mediating the interaction between MYCN and TauA, either in the purified complex (Figure 1C) or in the purified proteins used to reconstitute the complex (Figure S1A & S1B).

      (2) The authors indicate in Figure 2 that TF3C has essentially no effect on MYCN-dependent gene expression and/or transcription elongation. Yet a previous study (PMID: 29262328) associated with several of the same authors concluded that TF3C positively affects transcription elongation. The authors make no attempt to reconcile these disparate results and need to clarify this point.

      We agree that the data in this manuscript do not support a role in transcription elongation. This point was also raised by Reviewer 3. Comparing our new results with the previously published data, we can summarize that the datasets in the two studies show three key results: first, the traveling ratio of RNAPII changes upon induction of MYCN; second, RNAPII decreases at the transcription start site; and third, it increases towards the transcription end site.

      We agree that in the previous study we linked the traveling ratio directly to elongation. However, performing ChIP-seq with different RNAPII antibodies showed us that, for example, RNAPII (N20), which is unfortunately discontinued, gives different results from RNAPII (A10). Taken together with our new results using the RNAPII (8WG16) antibody, this shows that the traveling ratio reflects not only transcription elongation but also the removal of RNAPII from chromatin at the start site.
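
      The point that a change in the traveling ratio need not reflect elongation alone can be illustrated with a small sketch. It assumes one common definition of the traveling ratio (promoter-proximal RNAPII density divided by gene-body density); the function name and all density values are hypothetical and are not taken from the ChIP-seq data discussed here.

```python
def traveling_ratio(promoter_density, gene_body_density):
    """Promoter-proximal RNAPII density relative to gene-body density.

    Assumption: this is one common definition; the response does not
    spell out the exact formula used in either study.
    """
    return promoter_density / gene_body_density


# Hypothetical read densities (arbitrary units).
baseline = traveling_ratio(10.0, 2.0)

# Scenario 1: more elongating RNAPII in the gene body, promoter unchanged.
more_elongation = traveling_ratio(10.0, 4.0)

# Scenario 2: RNAPII lost from the promoter, gene body unchanged.
promoter_loss = traveling_ratio(5.0, 2.0)
```

      Both scenarios lower the ratio by the same amount, so the ratio alone cannot separate increased elongation from loss of promoter-bound RNAPII; that distinction requires looking at the start-site and gene-body signals separately.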

      (3) Figures 2B and C show that unphosphorylated pol2 is TSS-centered, and Ser2-P pol2 occupation is centered beyond the TES. From this data, however, the reader can't tell how much of the phospho-Ser2- pol2 is centered on the TSS. The authors should include overall plots over TSS and TES, and also perhaps the gene-body to allow a better comparison for TSS and TES plotted for both antibodies over the collected gene sets.

      We focused on the TSS for unphosphorylated RNAPII and the TES for pSer2-RNAPII, as these are the regions with specific enrichment of the respective antibodies. As requested for comparison, we now include metagenes showing TSS, gene-body, and TES for both antibodies as new Figure S2A and B. Additionally, we included density plots for unphosphorylated RNAPII at the TES as well as for pSer2-RNAPII at the TSS as a Figure for the Reviewers (Figure 1).

      (4) The authors see more TF3C at promoters in cells with MYCN (Figure 2F). What are the levels of TF3C in the absence and presence of MYCN?

      As shown in the immunoblot in Figure S1E, TFIIIC5 levels do not change upon induction of MYCN. We therefore think that MYCN helps to recruit TFIIIC5 to RNAPII promoter sites. This is also in accordance with what we previously reported 1.

      (5) The finding that TF3C is increased at TSS (Figure 2F) doesn't necessarily indicate that 1) MYCN is recruiting TF3C there, and 2) that this is due to the phosphorylation status of pol2. It could mean many other things. The logic of conflating these 3 points based on the data shown is questionable.

      We showed previously that knock-down of MYCN affects TFIIIC5 binding, showing that MYCN is required for binding of TFIIIC5 at promoter sites 1.

      Additionally, we included data from DRB-treated cells (Figure 2F); DRB prevents RNAPII loading by preventing downstream de novo elongation. Those data show that TFIIIC5 binding at the TSS is massively increased upon induction of MYCN and further increased upon treatment with DRB. Conversely, we observed that the major effect of TFIIIC knock-down was on non-phosphorylated RNAPII at the TSS upon MYCN induction (Figure 2B). Therefore, we would argue that our assumption fits the data presented in the manuscript well.

      (6) Figure 3A doesn't add much to the paper, as it is overplotted and no relationship is clear, except that Pol2 and MYCN occupy many of the same sites. Perhaps a less complex or different type of plot would allow the interactions to be better visible.

      We agree with the comment and since in another comment we were asked to show the same window for all shown Hi-ChIP data plots, we changed Figure 3A.

      (7) That depletion of TF3C leads to increased promoter hubs may or may not have anything to do with its association with MYCN (Figure 4E). This could be a direct consequence of its known structural function in cohesin complexes, and the MYCN changes as a secondary consequence of this (also see point 4, above).

      As shown in Büchel et al. (2017) 1 MYCN is needed to recruit RAD21 and depletion of RAD21 has no impact on the recruitment of MYCN. Since RAD21 is part of the cohesin complex we would exclude that the MYCN changes are a secondary consequence.

      (8) Depletion of TF3C5 results in a loss of EXOSC5 (exosome) at TSS in the presence and absence of MYCN (Figure 5B). As TF3C5 is a cohesin, could this simply be a consequence of genomic structure changes?

      We agree that the observed changes in EXOSC5 can be due to depletion of TFIIIC5. TFIIIC has been shown to recruit cohesin 1 and condensin complexes 2, as well as to induce chromatin architectural changes 3. However, MYCN is needed to recruit TFIIIC, and depletion of TFIIIC had no impact on MYCN recruitment 1. Furthermore, MYCN has been shown to recruit the exosome 4. Therefore, we would argue that MYCN can play a role either directly or through chromatin architectural changes.

      (9) The authors suggest that RNA dynamics are affected by changes in exosome function (RNA degradation, etc). What effect, if any does TF3C depletion have on the overall gene expression profile?

      We show in the manuscript that TFIIIC depletion in unperturbed cells has no effect on the global gene expression profile in the time frame analyzed (Figure 2E and S2B).

      Reviewer #2 (Public Review):

      (1) Dynamic inferences are made without kinetic experiments.

      While we agree that we did not collect kinetic data to study the dynamics of RNA polymerase, we would argue that the integration of our different datasets makes it possible to draw dynamic inferences. The transcription cycle and its sequential steps have been well described. In this sense, we use the non-phosphorylated RNAPII data, which capture the stage between RNAPII recruitment and initiation, and the RNAPII-pSer2 data, which capture pause-release to elongation, to draw conclusions on the dynamics. Likewise, we also made use of our previously published datasets.

      Reviewer #2 (Recommendations For The Authors):  

      (1) A number of changes are reported in hub size, expression, etc. upon treatment with tamoxifen to activate MYCN-ER. But MYC is already present in the SHEP cells, so why doesn't MYC support these same phenomena? It would seem that either the ability to cooperate with TFIIIC to clear non-productive polymerase complexes from promoters is particular to MYCN, or else it reflects a quantitative increase in total MYC proteins due to the entry of MYCN-ER into the nucleus with tamoxifen. The authors should address or discuss this issue.

      It could be that protein levels are the limiting factor underlying the differences between the observed effects of MYC and MYCN in this system. This interpretation would be in accordance with the results of Lorenzin et al. 5, who reported that different levels of MYC had different targets depending on the affinity for E-boxes and the protein level. A similar relationship between MYC levels and function was also reported for SPT5 6. Those high protein levels mimic what is found in certain tumors, in contrast to physiological levels. In this sense, the observed differences may also reflect differences between physiological and oncogenic levels of MYC proteins.

      On the other hand, both a core MYC signature and isoform-specific signatures of target genes have been described. MYCN is described to be involved in gene expression during the S-phase of the cell cycle 7. This suggests that there are differences between MYC and MYCN other than gene sets. The interaction with TFIIIC appears to be one of these differences. We have found multiple TFIIIC subunits as part of the MYCN interactome, but the interaction of TFIIIC with MYC is weaker and we are uncertain how relevant it is 7,8. We show here that depletion of different subunits of the TFIIIC complex causes a MYCN-dependent growth defect (Figure 1E). Similarly, the nuclear exosome is a MYCN-specific dependency 4, and we show here that MYCN-dependent recruitment of the exosome requires TFIIIC5. We take this as an indication that there is an intrinsic difference between MYC and MYCN and that MYCN engages TFIIIC for this pathway.

      (2) Reciprocal to TFIIIC recruitment to MYCN targets would be MYCN association with tRNA genes, 5S rRNA, and other RNAPIII genes. Does this happen, and if so, is this association TFIIIC dependent? What happens to the expression of these genes?

      We did observe MYCN in interactions involving RNAPIII sites, such as SINE elements and tRNAs (Figure 4B, 4D, S3F, and S4B). There was no relevant number of 5S rRNA genes involved in interactions, either because of the difficulty of properly mapping these repetitive regions or because of the underlying biology. In any case, none of those regions appeared to be specifically dependent on TFIIIC, as the overall number of interactions increased upon TFIIIC depletion regardless of the genomic annotation (Figure S4B). Regarding the expression of RNAPIII genes, the technical limitations of poly(A)-enriched RNA-seq prevent us from analyzing it globally in an unbiased way. However, we addressed this point for tRNA expression in an earlier work 1 and found that tRNA levels do not change upon TFIIIC depletion. We think this is because tRNAs are stable transcripts and RNAPIII recycling can occur in a TFIIIC-independent manner 9. Likewise, we report no significant expression changes in RNAPII genes upon TFIIIC depletion in this work.

      (3) The authors show that TFIIIC depletion does not alter the RNA-expression profile; how do they account for this? Can they comment on "background" transcription that it would seem should be suppressed by TFIIIC-dependent removal of various hypofunctional polymerases?

      Since TFIIIC is important for the removal of non-functional RNAPII we would not expect changes to the gene expression profile upon depletion of TFIIIC in the time frame analyzed. Monitoring the elongating form of RNAPII by measuring pSer2 indeed shows us that transcription elongation is not affected.

      (4) Global changes in expression are difficult to assess with DESEQ2. This hypernormalizing algorithm is not really suited to distinguish differential, but universal upregulation from some targets being truly upregulated while others are downregulated. The authors should comment.

      We acknowledge that DESeq2 relies on the assumption that gene-wise dispersion estimates are generally unchanged among samples. We addressed this comment in two ways, both of which are included in the Figure for the Reviewers (Figure 2). First, we sequenced the samples more deeply to avoid any bias created by random effects of lower coverage; the range of total reads increased from 6.8-9.3 to 16.5-20.7 million reads. Second, we compared the average fold-change bin dot plots for RNA-seq of SH-EP-MYCN-ER cells, showing mRNA expression normalized to control per bin, using the DESeq2 normalization (Figure 2A), TMM normalization in edgeR (Figure 2B), and quantile normalization (Figure 2C). No major differences were found relative to the original data or between the different methods. We updated Figure 2E in the manuscript to include the deeper sequencing dataset, adjusted it to show -/+ MYCN, and log2-transformed it to make it more intuitive. Overall, this reinforces our original understanding that gene expression remains largely unaffected by TFIIIC5 knockdown.
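
      The reviewer's concern about global changes can be made concrete with a simplified sketch of median-of-ratios size-factor normalization, the idea behind DESeq2's default scaling. This is not the DESeq2 implementation: the function name is hypothetical, genes with zero counts are assumed absent, and the counts are invented for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors for a list of samples.

    counts: list of samples, each a list of strictly positive gene counts.
    Simplified sketch only; not the DESeq2 code.
    """
    n_samples = len(counts)
    n_genes = len(counts[0])
    # Log geometric mean of each gene across all samples.
    log_geo_mean = [
        sum(math.log(sample[g]) for sample in counts) / n_samples
        for g in range(n_genes)
    ]
    # Each sample's factor is the median ratio to the geometric means.
    return [
        math.exp(median(math.log(sample[g]) - log_geo_mean[g] for g in range(n_genes)))
        for sample in counts
    ]


# Hypothetical counts: every gene is exactly twofold higher in "treated".
control = [100, 200, 50, 400]
treated = [2 * c for c in control]

factors = size_factors([control, treated])
norm_control = [c / factors[0] for c in control]
norm_treated = [c / factors[1] for c in treated]
```

      After scaling, the two samples are indistinguishable even though every gene doubled, which is why a uniform shift cannot be detected by this kind of normalization alone and why comparing several normalization methods, as described above, addresses only part of the concern.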

      (5) On page 7, the authors claim that MYCN-ER increased Ser-2 can reflect MYCN-stimulated transcription elongation. In fact, without kinetic studies, this is not fully supported. Accumulation of Ser-2 RNAPII along a gene can reflect increased initiation of full-speed RNAPs or a pile-up of RNAPs slowing down. This should be resolved or qualified.

      While we agree that we did not collect kinetic data to study the dynamics of RNA polymerase, we would argue that the integration of our different datasets makes it possible to draw dynamic inferences. We showed, on the one hand, that pSer2 accumulates at the TES and, on the other hand, that the induction of MYCN-ER up-regulates gene expression, which proves productive transcription elongation.

      (6) pLHiChIP needs to be better described, the Mumbach reference is not sufficient.

      We have rewritten the pLHiChIP description in the Methods section and hope that it now describes the method better.

      (7) Can the authors recheck all the labels in Figure 2D-I believe there is an error involving + or - MYCN.

      We carefully rechecked all the labels in Figure 2 and they were correct. We understand the confusion that comparing Figure 2D and Figure 2E may have created. To avoid confusion, we updated Figure 2E to show the same direction as Figure 2D. We also log2-transformed the y-axis of Figure 2E to make it more intuitive to read.

      (8) Why are there different scales for the regions of chromosome 17 shown in Figures 3 and 4? It would be easier to compare if the examples were all shown at the same scale (about 2 MB is shown in another Figure).

      We now show the same region of chromosome 17 in Figure 3 and 4.

      Reviewer #3 (Public Review):

      (1) The connection between the three major findings presented in this study regarding the role of TFIIIC in the regulation of MYCN function remains unclear. Specifically, how the TFIIIC-dependent restriction of MYCN localization to promoter hubs enhances the association of factors involved in nascent RNA degradation to prevent the accumulation of inactive RNA polymerase II at promoters is not apparent. As they are currently presented, these findings appear as independent observations. Cross-comparison of the different datasets obtained may provide some insight into addressing this question.

      We previously observed that TFIIIC does not affect MYCN recruitment, while MYCN affects TFIIIC binding 1. Moreover, our group reported that MYCN recruits the exosome 4 and BRCA1 to promoter-proximal regions 10 to clear out non-functional RNAPII. We now report that MYCN-TFIIIC complexes exclude non-functional RNAPII. However, MYCN-active promoters in hubs have more RNAPII and more transcription than MYCN-active promoters outside hubs. Furthermore, TFIIIC binding occurs upstream of BRCA1 and exosome recruitment, as depletion of TFIIIC leads to decreased recruitment of both factors. Therefore, we argue that TFIIIC is required for the proper function of those MYCN-active promoter hubs.

      (2) Another concern involves the disparities in RNA polymerase II ChIP-seq results between this study and earlier ones conducted by the same group. In Figure 2, the authors demonstrate that activation of MYCN results in a reduction of non-phosphorylated RNA polymerase II across all expressed genes. This discovery contradicts prior findings obtained using the same methodology, where it was concluded that the expression of MYCN had no significant effect on the chromatin association of hypo-phosphorylated RNA polymerase II (Buchel et al, 2017). In this regard, the choice of the 8WG16 antibody raises concern, as fluctuations in the signal may be attributed to changes in the phosphorylation levels of the C-terminal domain. It remains unclear why the authors decided against using antibodies targeting the N-terminal domain of RNA polymerase II, which are unaffected by phosphorylation and consistently demonstrated a significant signal reduction upon MYCN activation in their previous studies (Buchel et al, 2017) (Herold et al, 2019). Similarly, the authors previously proposed that depletion of TFIIIC5 abrogates the MYCN-dependent increase of Ser2-phosphorylated RNA polymerase II (Buchel et al, 2017), whereas they now show that it has no obvious impact. These aspects need clarification.

      We politely disagree that our discoveries contradict each other. Comparing our new results with the previously published data, we can summarize that the datasets in the two studies show three key results: first, the traveling ratio of RNAPII changes upon induction of MYCN; second, RNAPII decreases at the transcription start site; and third, it increases towards the transcription end site.

      We agree that in the previous study we linked the traveling ratio directly to elongation. However, performing ChIP-seq with different RNAPII antibodies showed us that, for example, RNAPII (N20), which is unfortunately discontinued, gives different results from RNAPII (A10). Taken together with our new results using the RNAPII (8WG16) antibody, this shows that the traveling ratio reflects not only transcription elongation but also the removal of RNAPII from chromatin at the start site.

      In the previous study we only performed manual ChIP experiments for RNAPII (8WG16) and pSer2. Now we did a global analysis which is more meaningful and is also reflected in the RNA sequencing data.

      (3) Finally, the varied techniques employed to explore the role of TFIIIC in MYCN-dependent recruitment of nascent RNA degradation factors make it challenging to draw definitive conclusions about which factor is affected and which one is not. While conducting ChIP-seq experiments for all factors may be beyond the scope of this manuscript, incorporating proximity ligation assays (PLA) or ChIP-qPCR assays with each factor would have enabled a more direct and comprehensive comparison.

      We understand the criticism that we are comparing different assays. We have performed PLAs with different antibodies. Since the controls of the PLAs were not sufficient for us, we refrain from using them. ChIP-qPCR experiments are much more challenging to do side by side compared to PLAs, which is why we decided against looking at all factors with this method.

      Recommendations For The Authors:

      Reviewer #3 (Recommendations For The Authors):

      (1) Figure 2: Why did the authors choose the 8WG16 antibody? Does TFIIIC5 depletion suppress the MYCN-dependent reduction of total RNA polymerase II binding to promoters that they consistently showed in previous studies? Given that phosphorylation of the CTD impacts 8WG16 recognition, including Ser5-phosphorylated RNA polymerase II ChIP-seq experiments might clarify this issue.

      We used the RNAPII (8WG16) antibody specifically to map non-phosphorylated RNAPII, which reports the binding of non-functional RNAPII.

      (2) Figures 3 and 4: As it stands, the manuscript does not convincingly establish a functional connection between the results in Figures 2, 3, and 4 or elucidate potential mechanisms. Are changes in RNA polymerase II levels upon MYCN activation more pronounced at promoters located at MYCN hubs? Do changes in MYCN-enriched chromatin contacts upon TFIIIC5 depletion somehow correlate with alterations in RNA polymerase II levels? Performing similar cross-comparisons as in Figure 3C may help address this issue. Furthermore, it is not clear how the authors concluded that MYCN/TFIIIC5-bound genes are not part of these so-called promoter hubs.

      In Figure 3C we show that RNAPII levels are more pronounced upon MYCN activation at promoters located at MYCN hubs. Additionally, we show density plots of non-phosphorylated RNAPII ChIP-seq at the TSS and RNAPII-pSer2 ChIP-seq at the TES for promoters with MYCN interactions in the Figure for the Reviewers (Figure 3). Apart from the level of binding, we found no difference compared to the global analysis of all expressed genes shown in Figure 2B and Figure 2C. This is consistent with the high expression of genes with MYCN interactions observed in Figure 3C.

      The changes observed in Figures 2B and 2C are global and include the promoters with MYCN interactions. At the same time, a higher number of replicates would be required to statistically distinguish differences in MYCN interactions between TFIIIC5 presence and depletion. We acknowledge this limitation and therefore refrain from any attempt to do so. We base our conclusions on the other parts of the manuscript and on our previous studies showing that MYCN recruits TFIIIC, BRCA1, and the exosome to promoter-proximal regions (1, 4, 10).

      (3) Figure 5: According to the PLA results, activation of MYCN could enhance RNA polymerase II-NELFE interaction in a TFIIIC5-dependent manner. Considering the raised issues regarding the use of the 8WG16 antibody, this result might be of relevance.

      Nevertheless, PLA does not seem to be the optimal technique to address these questions, and I would rather suggest performing ChIP-qPCR experiments for all the factors to be compared. Finally, do the authors conclude that the TFIIIC5 effect on MYCN-dependent changes in RNA polymerase II depends upon the recruitment of EXOSC5 and BRCA1? If so, it would be interesting to determine whether depletion of these factors phenocopies the effects observed with TFIIIC5.

      We understand the criticism that we are comparing different assays. We have performed PLAs with different antibodies. Because the controls for these PLAs were not satisfactory, we refrained from using them.

      (4) In Figure S2 the labels should be EtOH, 4-OHT, and Input.

      We changed this accordingly.

      (5) On page 7, the sentence "We have shown previously that TFIIIC5 depletion does not cause significant changes in expression of multiple tRNA genes that are transcribed by RNAPIII (Buchel et al., 2017)" appears to lack a connection.

      We agree with the reviewer and we deleted this sentence from the manuscript.

      Author response image 1.

      (A) Density plot of ChIP-Rx signal for non-phosphorylated RNAPII. Data show mean (line) ± standard error of the mean (SEM indicated by the shade) of different gene sets based on an RNA-seq of SH-EP-MYCN-ER cells ± 4-OHT. The y-axis shows the number of spike-in normalized reads and it is centered to the TES ± 2 kb. N = number of genes in the gene set defined in the methods. (B) Density plot of ChIP-Rx signal for RNAPII pSer2 as described for panel A. The signal is centered to the TSS ± 2 kb.

      Author response image 2.

      Bin dot plot for RNA-seq of SH-EP-MYCN-ER showing mRNA expression normalized by control per bin, comparing the fold average using DESeq2 (A), TMM normalization in edgeR (B), and quantile normalization (C).

      Author response image 3.

      Average density plot of ChIP-Rx signal for non-phosphorylated RNAPII (A) or RNAPII pSer2 (B) at promoters with MYCN interactions.

      References

      (1) Büchel, G., Carstensen, A., Mak, K.-Y., Roeschert, I., Leen, E., Sumara, O., Hofstetter, J., Herold, S., Kalb, J., and Baluapuri, A. (2017). Association with Aurora-A controls NMYC-dependent promoter escape and pause release of RNA polymerase II during the cell cycle. Cell reports 21, 3483-3497.

      (2) Yuen, K.C., Slaughter, B.D., and Gerton, J.L. (2017). Condensin II is anchored by TFIIIC and H3K4me3 in the mammalian genome and supports the expression of active dense gene clusters. Sci Adv 3, e1700191. 10.1126/sciadv.1700191.

      (3) Ferrari, R., de Llobet Cucalon, L.I., Di Vona, C., Le Dilly, F., Vidal, E., Lioutas, A., Oliete, J.Q., Jochem, L., Cutts, E., Dieci, G., et al. (2020). TFIIIC Binding to Alu Elements Controls Gene Expression via Chromatin Looping and Histone Acetylation. Mol Cell 77, 475-487 e411. 10.1016/j.molcel.2019.10.020.

      (4) Papadopoulos, D., Solvie, D., Baluapuri, A., Endres, T., Ha, S.A., Herold, S., Kalb, J., Giansanti, C., Schulein-Volk, C., Ade, C.P., et al. (2021). MYCN recruits the nuclear exosome complex to RNA polymerase II to prevent transcription-replication conflicts. Mol Cell. 10.1016/j.molcel.2021.11.002.

      (5) Lorenzin, F., Benary, U., Baluapuri, A., Walz, S., Jung, L.A., von Eyss, B., Kisker, C., Wolf, J., Eilers, M., and Wolf, E. (2016). Different promoter affinities account for specificity in MYC-dependent gene regulation. Elife 5. 10.7554/eLife.15161.

      (6) Baluapuri, A., Hofstetter, J., Dudvarski Stankovic, N., Endres, T., Bhandare, P., Vos, S.M., Adhikari, B., Schwarz, J.D., Narain, A., Vogt, M., et al. (2019). MYC Recruits SPT5 to RNA Polymerase II to Promote Processive Transcription Elongation. Mol Cell 74, 674-687 e611. 10.1016/j.molcel.2019.02.031.

      (7) Baluapuri, A., Wolf, E., and Eilers, M. (2020). Target gene-independent functions of MYC oncoproteins. Nat Rev Mol Cell Biol. 10.1038/s41580-020-0215-2.

      (8) Koch, H.B., Zhang, R., Verdoodt, B., Bailey, A., Zhang, C.D., Yates, J.R., 3rd, Menssen, A., and Hermeking, H. (2007). Large-scale identification of c-MYC-associated proteins using a combined TAP/MudPIT approach. Cell Cycle 6, 205-217. 10.4161/cc.6.2.3742.

      (9) Ferrari, R., Rivetti, C., Acker, J., and Dieci, G. (2004). Distinct roles of transcription factors TFIIIB and TFIIIC in RNA polymerase III transcription reinitiation. Proc Natl Acad Sci U S A 101, 13442-13447. 10.1073/pnas.0403851101.

      (10) Herold, S., Kalb, J., Büchel, G., Ade, C.P., Baluapuri, A., Xu, J., Koster, J., Solvie, D., Carstensen, A., and Klotz, C. (2019). Recruitment of BRCA1 limits MYCN-driven accumulation of stalled RNA polymerase. Nature 567, 545-549.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Data on SSCs are published from a previous report (Fig. 1C). These should be deleted or marked as such.

      We acknowledge the need for clarification regarding our study population for the germ cell stainings. As stated in our Materials and Methods section, our current study population includes the cohort from our previous publication (Vereecke et al., 2020), supplemented by nine additional participants, totaling n=106 trans women. Fig. 1C incorporates both previous and new data on germ cells, and this was further clarified in the Materials and Methods section.

      (2) Many micrographs are suboptimal and need to be replaced by better photos presenting cellular details more clearly. 

      The figures were remade to address the suboptimal resolution.

      (3) Table 2 would benefit from a column indicating the target cell or organelle.

      This column was added to Table 2.

      (4) The pubertal status is poorly defined by pre- and peripubertal terms. The authors should add more informative clinical scores. 

      We included information on the Tanner stages of the trans women in our cohort (all G5), as well as details on the selection criteria for our controls and their pubertal status.

      (5) The characterization of Leydig cells is incomplete. Several better markers would validate the findings. 

      As briefly touched upon in the discussion, the marker delta-like homolog 1 would indeed be valuable to assess the presence of truly immature Leydig cells. Unfortunately, our attempts to optimize the immunofluorescence protocol for this marker were unsuccessful, resulting in a double staining instead of a triple staining for the Leydig cells. This statement was also added to the Discussion.  

      (6) The selection bias for datasets is obvious. It seems that the authors try to create nice stories but do not always refer to less compelling datasets. Here a more critical view may be necessary to gain a more realistic view and may open alternative explanations. 

      We would appreciate clarification on which datasets may have been insufficiently reviewed and how our selection of highlights may have introduced bias to the interpretation and conclusion of the study. It is important to note that we did not select any patients/data; all patient data were incorporated into our results section.

      (7) The term rejuvenation for the stem cell niche/germ cell complement is misleading in the title and text. Could the authors consider another term, e.g. restoration or (de)differentiation? Alternatively, define the term rejuvenation in a more substantial manner. 

      We did not change the term “partial rejuvenation” as we believe it best describes our findings. We did however introduce the term in a more substantial manner in our Abstract and Discussion.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors provided a lot of scattered data, but it would be useful to formulate clear criteria (hormonal therapy, age, end points, etc.) that the material must meet so that it can be used for research into prepubertal processes. 

      We have added these criteria to our Discussion. However, our current results do not yet reveal how these tissues behave in vitro. Ongoing research is addressing this question and will be presented in a future paper.

      (2) Is there any research on the preservation of functions of testicular cells from trans women? These data would be very useful, for example, for models for drug testing.

      Yes: a reference to this paper was added to our Discussion.

      (3) It is recommended to present the data in a table reflecting the correlations found by the authors and the correlations from the literature between cellular changes and hormone levels and age. 

      After careful consideration, we have decided to proceed without incorporating these suggested changes. Our paper focuses on original findings rather than synthesizing existing literature. As such, we have chosen to emphasize our novel results and to compare them to the existing literature in the discussion section.

      (4) The authors can also provide data on clinical standards for hormone levels depending on gender and age. 

      This was added as Supplementary Tables 1-6.

      (5) It is recommended to add links to sources from which information about cellular prepubertal, pubertal and adult markers was taken. 

      This information was added throughout the manuscript.  

      (6) Is it known which cells within the wall of the seminiferous tubules in adults express AMH? Please clarify. 

      It has been shown that AMH receptor type 2 starts to be expressed in peritubular mesenchymal cells within the tubular walls during puberty and it remains so throughout adulthood (Sansone et al., 2020). AMH bound to this receptor may help explain the observed AMH signal in the tubular wall of peripubertal and adult controls. This information was added to our Discussion.

      (7) How was the degree of hyalinization assessed? It's not obvious from the pictures.

      This was further clarified in the Materials & Methods section.

      (8) Why were inhibin B and AMH not measured in all patients? 

      Inhibin B and AMH levels were not available for all patients due to the retrospective nature of these analyses. The measurements were not consistently recorded for all individuals within the historical dataset upon which our research relies.

      (9) Why does picture 3A present few SOX9 on adult Sertoli cells, although this is their typical marker?

      SOX9 was present in the adult Sertoli cells. However, this signal appears to be more "diluted" in adults due to their ongoing spermatogenesis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Ewing sarcoma is an aggressive pediatric cancer driven by the EWS-FLI oncogene. Ewing sarcoma cells are addicted to this chimeric transcription factor, which represents a strong therapeutic vulnerability. Unfortunately, targeting EWS-FLI has proven to be very difficult, and a better understanding of how this chimeric transcription factor works is critical to achieving this goal. Towards this perspective, the group had previously identified a DBD-α4 helix (DBD) in FLI that appears to be necessary to mediate EWS-FLI transcriptomic activity. Here, the authors used multi-omic approaches, including CUT&Tag, RNAseq, and MicroC to investigate the impact of this DBD domain. Importantly, these experiments were performed in the A673 Ewing sarcoma model where endogenous EWS-FLI was silenced, and EWS-FLI-DBD proficient or deficient isoforms were re-expressed (isogenic context). They found that the DBD domain is key to mediating EWS-FLI cis activity (at msat) and to generating the formation of specific TADs. Furthermore, cells expressing DBD-deficient EWS-FLI display very poor colony-forming capacity, highlighting that targeting this domain may lead to therapeutic perspectives.

      We thank Reviewer 1 for their strong summary of Ewing sarcoma background and accurate description of our experimental approaches and findings.

      Strengths:

      The group has strong expertise in Ewing sarcoma genetics and epigenetics and also in using and analyzing this model (Theisen et al., 2019; Boone et al., 2021; Showpnil et al., 2022).

      We thank the reviewer.  

      They aim at better understanding how EWS-FLI mediated its oncogenic activity, which is critical to eventually identifying novel therapies against this aggressive cancer.

      We are happy to see that our overall aim was also appreciated by Reviewer 1.

      They use the most recent state-of-the-art omics methods to investigate transcriptome, epigenetics, and genome conformation methods. In particular, Micro-C enables achieving up to 1kb resolved 3D chromatin structures, making it possible to investigate a large number of TADs and sub-TADs structures where EWS-FLI1 mediates its oncogenic activity.

      We thank Reviewer 1 for their acknowledgement of our approaches and the resolution achieved with our Micro-C experiments.  

      They performed all their experiments in an Ewing sarcoma genetic background (A673 cells) which circumvents bias from previously reported approaches when working in non-orthologous cell models using similar approaches.

      We agree with the reviewer about the importance of using model systems that accurately capture features of the disease being studied. As we have added an additional cell line in the revision, we should note that this second model also represents a Ewing sarcoma genetic background, while representing tumors that express another oncogenic fusion found in this disease. 

      Weaknesses:

      The main weakness comes from the poor reproducibility of Micro-C data. Indeed, it appears that the distances/clustering observed between replicates are typically similar or even larger than between biological conditions. For instance, in Figure 1B, I do not see any clustering when considering DBD1, DBD2, DBD+1, DBD+2.

      Lines 80-83: "KD replicates clustered together with DBD replicate 1 on both axes and with DBD replicate 2 on the y-axis. DBD+ replicates, on the other hand, clustered away from both KD and DBD replicates. These observations suggest that the global chromatin structure of DBD replicates is more similar to KD than DBD+ replicates."

      When replacing DBD replicate 1 with DBD replicate 2, their statement would not be true anymore.

      Additional replicates to clarify this aspect seem absolutely necessary since those data are paving the way for the entire manuscript.

      These are valid concerns and we thank the reviewers for highlighting the limitation of poor clustering of Micro-C replicates on the MDS plot. We account for this variability between replicates when identifying differentially interacting regions. By using an adjusted p-value < 0.05, we aim to ensure that, on repeating the experiments, we would discover the same differentially interacting regions with a false discovery rate of 5%.
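
      For readers less familiar with adjusted p-values, the thresholding step described above can be sketched as follows. This is a minimal illustration that assumes a Benjamini-Hochberg adjustment. The response does not state which adjustment method was used, and the function names here are hypothetical rather than taken from the analysis pipeline.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_regions(p_values, alpha=0.05):
    """Indices of tests whose adjusted p-value falls below alpha."""
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q < alpha]
```

      Under this assumption, calling a region significant at an adjusted p-value below 0.05 bounds the expected proportion of false positives among the reported regions at 5%, so replicate-to-replicate variability is absorbed into the per-region test rather than ignored.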

      We would also like to note that the replicates cluster much more closely on the PCA plot of RNA-seq data (Supplementary Figure 1C) as well as on the PCA plot of H3K27ac CUT&Tag data (Figure 4A). Notably, the RNA-seq result has now been reproduced when performed with different sets of hands across multiple studies (Boone, et. al., 2021 and this report), as well as in a second cell line (as reported in this manuscript revision). These observations suggest that the cells of these replicates are functionally similar to each other at a population level. Chromatin organization detected by Micro-C is highly heterogeneous among the cells of a population (Misteli, et. al., 2020). Moreover, despite the increased resolution of Micro-C over Hi-C, the conventional sequencing depth at which Micro-C is performed makes resolving finer-scale 3D interactions, particularly between enhancers and promoters, challenging (Goel, et. al., 2023). Thus, biologically relevant interactions driving EWSR1::ETS transcriptional regulation through de novo enhancers may have relatively weak signal in Micro-C. Both the strength of the signal and the heterogeneous chromatin state present in bulk samples could affect the average signal, leading to poorly clustering replicates (Hafner and Boettiger, 2022). 

      Importantly, rather than add an additional replicate of a single cell line, we repeated our study in an additional cell line, TTC466, and largely reproduced our high-level findings for transcription, enhancer formation, and 3D chromatin. Specific limitations of the TTC466 study are addressed in the Discussion section (lines 392-420). The reproduction of weak/moderate clustering in the MDS plot in both A673 and TTC466 cell lines suggests that the α4 helix of EWSR1::ETS fusions is important for reshaping 3D chromatin. However, higher resolution analyses focused on specific EWSR1::ETS-bound loci are likely an important area of future study required to fully understand the role of the α4 helix in chromatin regulation in Ewing sarcoma.

      Similarly:

      - In Figure 1C, how would the result look when comparing DBD2/KD2/DBD+2? Same when comparing DBD 1 with KD1 and DBD+1. Would the difference go in the same direction?

      This is a great point. We added distance decay plots of individual replicates in Supplementary Figure 2 and added discussion of these results in lines 88-89 of the text.

      - Figure 1D-E. How would these plots look like when comparing each replicate to each other's? How much difference would be observed when comparing, for instance, DBD1/DBD2 ? or DBD1/DBD+1?

      Unfortunately, separate replicates are required to conduct Differentially Interacting Region analysis as it determines statistically significant interactions. Therefore, we are unable to plot these analyses with individual replicates. 

      - Figure 2: again, how would these analyses look like when performing the analysis with only DBD1/DBD+1/KD1 or DBD2/DBD+2/KD?

      This is a good suggestion, and such an analysis is possible. However, we would lose resolution to the point that we might not accurately detect TADs, especially smaller TADs. Therefore, we decided to combine the biological replicates.   

      Another major question is the stability of EWS-FLI DBD vs EWS-FLI DBD+ proteins. In the WB, FLAG intensities seem also higher (2/3 replicates) in DBD+ condition compared to the DBD condition (Figure S1B).

      This is a valid concern with the shRNA knock-down/rescue system, and we regularly validate new constructs to ensure that they have similar expression levels as rescue with the wildtype fusion before proceeding to more exhaustive experimental workups. We would note that while we have not tested for differences in protein stability, for these constructs we largely see similar expression levels across multiple experiments, multiple cell lines, and multiple sets of hands. There may be some variation in expression level from experiment to experiment, but western blotting is a semiquantitative assay and it is also not possible to rule out that slight differences in band intensity may be a result of error in gel loading. For this reason, alongside western blotting for construct expression, we also validate construct function using RNA-seq and colony formation assays (as reported in this manuscript) and these show good agreement across biological replicates.  

      Indeed, it seems that they have more FLAG (i.e., EWS-FLI) peaks in the DBD+ condition compared to the DBD condition (Figure 2B). 

      We appreciate the comment, since the legend of Figure 2B led to a misunderstanding. Figure 2B depicts the number of TADs detected in DBD and DBD+ conditions (height of the bar graphs) and, on the y-axis, the proportion of those TADs that overlap with FLAG peaks, CTCF peaks, both, or neither. The number of FLAG peaks is actually lower in DBD+ as compared to DBD, as shown in Figure 5A-B. We clarified our Figure 2 legend to accurately describe the proportions (color-coded sections) of TADs bound by DBD/DBD+ FLAG and CTCF.

      Would it be possible that DBD+ is just more expressed or more stable than DBD? The higher stability of the re-expressed DBD+ could also partially explain their results independently of the 3D conformational change. In other words, can they exclude that DBD+ and DBD binding are not related to their respective protein stability or their global re-expression levels?

      It is possible that the DBD+ protein is overexpressed or more stable than DBD. With our current set of data, we cannot conclusively exclude that binding by DBD and DBD+ is related to their expression level or stability. We would note, as above, that western blots, RNA-seq, and agar assays have largely reproduced across experiments, hands, and cell lines and that western blot is an imperfect assay for assessing protein stability.

      Surprisingly, WB FLI bands in DBD+ conditions are systematically (3/3 replicates) fainter than in DBD conditions (Figure S1B). How do the authors explain these opposite results between FLI and FLAG in the WB?

      This is an excellent observation that highlights one of the intricacies of studying EWSR1::FLI1 in our KD/rescue system. Often the limiting factor for an experiment is whether or not the KD condition maintains KD through a second viral transduction for rescue and selection. We have observed over many years of working with this system that rescue conditions which are fully functional (i.e. wildtype EWSR1::FLI1, DBD+, etc.) tend to maintain better KD of endogenous EWSR1::FLI1. Constructs that don’t rescue EWSR1::FLI1 function sometimes maintain KD to a lesser degree, though frequently to a functional degree (i.e. cells are not transformed and EWSR1::FLI1 transcriptional regulation is not rescued). We suspect that this observation, raised by Reviewer 1, results from potential selection of cells in which more endogenous EWSR1::FLI1 escapes KD in DBD conditions, due to selective pressures during expansion in tissue culture.

      We should note that the antibody used for detecting FLI recognizes residues that are deleted in DBD and DBD+ constructs, such that the FLI1 blot in Supplementary Figure 1B does not detect either construct. It only detects endogenous EWSR1::FLI1 and the 3X-FLAG-EWSR1::FLI1 construct in the middle lane that runs at a slightly higher molecular weight. The FLAG antibody is the only antibody that detects all three rescue constructs.    

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Bayanjargal et al. entitled "The DBD-alpha4 helix of EWS::FLI is required for GGAA microsatellite binding that underlies genome regulation in Ewing sarcoma" reports on the critical role of a small alpha helix in the DNA binding domain (DBD) of the FLI1 portion of EWS::FLI1 that is critical for binding to repetitive stretches of GGAA-motifs, i.e. GGAA microsatellites, which serve as potent neoenhancers in Ewing sarcoma.

      We thank Reviewer 2 for their succinct and accurate summary of our manuscript. 

      Strengths:

      The paper is generally well-written and easy to follow, and the data presented are of high quality, well-described and underpin the conclusions of the authors. The report sheds new light on how EWS::FLI1 mechanistically binds to and activates GGAA microsatellite enhancers, which is of importance to the field.

      We appreciate the reviewer’s assessment of our work. 

      Weaknesses:

      While there are no major weaknesses in this paper, there are a few minor issues that the authors may wish to address before publication:

      (1) While the official protein symbol for the gene EWSR1 is indeed EWS, the protein symbol for the gene FLI1 is identical, i.e. FLI1. The authors nominate the fusion oncoprotein EWS::FLI (even in the title) but it appears more adequate to use EWS::FLI1.

      We appreciate the reviewer for bringing this to our attention. Indeed, the most recent guideline for fusion proteins nomenclature is to use the full gene symbols separated by double colons. Therefore, the accurate nomenclature is EWSR1::FLI1. We replaced instances of EWS::FLI with EWSR1::FLI1 and have used the EWSR1::ERG nomenclature in our revised manuscript.  

      (2) The used cell lines should be spelled according to their official nomenclature (e.g. A-673 instead of A673).

      Corrected, thanks!

      (3) It appears as if the vast majority of results were generated in a single Ewing sarcoma cell line (A-673) which is an atypical Ewing sarcoma cell line harboring an activating BRAF mutation and may be genomically quite unstable as compared to other Ewing sarcoma cell lines (Kasan et al. 2023 preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2023.11.20.567802v1). Hence, it may be supportive for the paper to recapitulate/cross-validate a few key results in other Ewing sarcoma cell lines, e.g. by using EWS::ERG-positive cell lines. Perhaps the authors could make use of available published data.

      We thank Reviewer 2 for this helpful comment. We replicated the experiments in TTC-466 cells containing the EWSR1::ERG fusion and found that, as in A-673 cells, the DBD-α4 helix is important for transcriptional, enhancer, and 3D chromatin regulation (Supplementary Figures 9-18).  

      (4) Figure 6 and Supplementary Figure 5 are very interesting but focus on two selected target genes of the fusion (FCGRT and CCND1). It would be interesting to see whether these findings also extend to common EWS::ETS transcriptional signatures that have been reported. The authors could explore their data and map established consensus EWS::ETS signatures to investigate which other hubs might be affected at relevant target genes.

      We expanded our analysis to other genes demonstrated to be regulated by EWSR1::FLI1-nucleated transcriptional hubs (Chong, et. al., 2018) and included the NKX2-2 and GSTM4 gene regions in Supplementary Figures 7-8 in A-673 cells. We also investigated the same gene regions of FCGRT, CCND1, NKX2-2, and GSTM4 in TTC466 cells and report them in Supplementary Figures 14-17. For the purpose of brevity, we decided to include only the above examples. We may need to develop different tools to conduct further analysis to understand the gene regulatory networks driven by DBD and DBD+ in relation to hub formation. Although it is a great suggestion to map such networks, this may be outside the scope of this manuscript. We thank the reviewer for bringing such a good point to our attention.  

      (5) Table 1 is a bit hard to read. In my opinion, it is not necessary to display P-values with up to 8 decimal positions. The gene symbols should be displayed in italic font.

      Suggestions are adopted, thanks!

      Reviewing Editor (Recommendations For The Authors):

      We would draw the authors' attention to the following issues that would best benefit from additional revision.

      As indicated by Referee 1, an important issue concerns the apparent poor reproducibility of Micro-C data. In Figure 1B, the clustering of the DBD1, DBD2, DBD+1, and DBD+2 is poor.

      It appears that the distances/clustering observed between replicates are typically similar or even larger than between biological conditions. Lines 80-83: "KD replicates clustered together with DBD replicate 1 on both axes and with DBD replicate 2 on the y-axis. DBD+ replicates, on the other hand, clustered away from both KD and DBD replicates. If one replaced DBD replicate 1 with DBD replicate 2, this statement would no longer be true. The referees believe that it is important to fully account for these potential discrepancies. Most of the study is based on analyses of these data sets, so if there are issues with them it has repercussions on the entire study. We note however that in Figure 4A the clustering of the H3K27ac data is much more convincing. The referees also feel that it is important to show immunoblots of the expression of DBD and DBD+ levels in the experiments performed here. While this was previously shown in the Boone et al publication in 2021, it could be illustrated again here.

      We thank the editors for concisely summarizing the main weaknesses of the paper and underscoring the importance of the Micro-C data in the rest of the paper. While the Editors note tighter clustering of the H3K27ac (Figure 4A), we would like to note that the replicates cluster much closely on PCA plot of RNA-seq data (Supplementary Figure 1C). Notably, the RNA-seq result has now reproduced when performed with different sets of hands across multiple studies (Boone, et. al., 2021 and this report), as well as in a second cell line (as reported in this manuscript revision). Though not as tight, the H3K27ac CUT&Tag also reproduces in TTC466 cells. Thus, we interpret these findings to indicate that our replicates are functionally similar to each other. As discussed above in the response to Reviewer 1 in more detail, there are several factors that could affect how these functional similarities are represented in Micro-C data. Micro-C is ultimately a readout of the chromatin organization in a heterogeneous population of cells (Misteli et al., 2020). Additionally, sequencing depth limitations in conventional Micro-C experiments limit the ability to faithfully assess the enhancer-promoter interactions that may be relevant for our model system (Goel, et. al., 2023). Thus, both the strength of the biologically relevant signal and the heterogeneous chromatin state present in bulk samples could affect the average signal and lead to poorly clustering replicates (Hafner and Boettiger, 2022). 

      To address these important concerns about the rigor and reproducibility of the analyses, we repeated our study in an additional cell line, TTC466, and largely reproduced our high-level findings for transcription, enhancer formation, and 3D chromatin. These additional studies were not without their own limitations, which are addressed in the Discussion section (392-420). The reproduction of weak/moderate clustering in the MDS plot in both A673 and TTC466 cell lines suggests that the α4 helix of EWSR1::ETS fusions is important for reshaping 3D chromatin. However, additional genomic analyses geared toward higher resolution at specific EWSR1::ETS-bound loci are likely an important area of future study, required to fully understand the role of the α4 helix in chromatin regulation in Ewing sarcoma. Live cell imaging, as performed by Chong et al., 2018, and additional biochemical techniques may also be informative but are beyond the scope of this report.

      With regard to concerns about construct expression, we have included immunoblots of the rescue constructs in both cell lines (Supplementary Figure 1B and 9A) and discussed Reviewer 1’s specific concerns in detail above.

      The referees also raise the issue of using an additional cell line to make a more general message. Although it would perhaps be asking too much to repeat the MicroC experiments, consolidation of the observations could be performed by focusing on specific loci such as FCGRT and CCND1 that were analyzed in this study. Could the authors use 4C-type experiments to reproduce the conclusions in an additional cell line? It would also be pertinent to consolidate the findings at these loci by 4C-type approaches even in the cell line used here. For the moment, all conclusions are based on the same set of data and a single technical approach.

      We repeated the experiments in TTC466 cells and analyzed the data using the same cut-offs used in A673 cells. This allows us to compare the two cell lines. We hope this new set of experiments and analyses addresses the reviewers’ concerns.

      Reviewer #1 (Recommendations For The Authors):

      All the data are performed in A673 cells. Knowing the transcriptomic and epigenetic heterogeneity of Ewing sarcoma cells, some of the experiments supporting their findings should be replicated in at least another Ewing sarcoma model.

      Per our discussion above, we have replicated our experiments in an additional cell line model of Ewing sarcoma. Importantly, the TTC466 cell line used expresses the EWSR1::ERG fusion found in 10-15% of Ewing sarcoma cases.  

      Supplementary Figure 2B. Proportion of TAD boundaries bound by FLAG (i.e., EWS-FLI1) and CTCF. The number/proportion of FLAG (i.e., EWS-FLI) peaks observed at CTCF peak/TAD boundaries seems unexpectedly high. How do they explain this result since EWS-FLI peaks are rather intra-TAD to mediate their enhancer function?

      In our previous study, we showed that EWSR1::FLI1 binding can be detected at the boundaries of TADs (Showpnil et al., 2022). We therefore think it likely that EWSR1::FLI1 binding can mediate enhancer function both inside TADs and at their borders and may, in some cases, function as an insulator between TADs.

      For the >50kb loop analysis, what was the low-range threshold? Up to 15-20 kb, contact frequency interactions may be caused by PFA crosslink (did they use a 5kb threshold?). Were those excluded from that analysis?

      We acknowledge that we did not use a lower threshold to exclude those short-range loop interactions. In our previous study, we observed that EWSR1::FLI1 binding reduces long-range interactions in favor of short-range interactions (Showpnil et al., 2022), and we wanted to be able to capture short-range loops in our analysis.

      In Figure 2D, they observed that within TADs containing FLAG peaks at GGAA microsatellites, the intensity of the DBD+ FLAG peaks was higher compared to DBD FLAG peaks. How would this analysis look when considering the ETS FLAG peaks (i.e., EWS-FLI rather repressive peaks)? Could they compare TAD with GGAA msat vs TAD with ETS peaks?

      We agree that this is an interesting observation. In our prior analyses we found no discernible relationship between EWSR1::FLI1 binding and changes in 3D chromatin associated with repression (Showpnil et al., Nucleic Acids Research, 2022). In contrast, EWSR1::FLI1-bound superenhancers had greater H3K27ac deposition when overlapping both a bound GGAA repeat and a non-microsatellite site. While there have been several additional reports about the relevance of EWSR1::FLI1 binding at non-microsatellite peaks, motifs at these loci have not yet been defined as rigorously as GGAA repeats were by Johnson et al. in PLoS One, 2017. Each ETS factor binds different motifs containing the core 5’-GGAA-3’ with varying affinities depending on the flanking residues. There may be a >100-fold difference in sequence-specific binding affinity between “high” and “low” affinity motifs. Better defining the types of ETS motifs bound by EWSR1::FLI1 and the functional changes associated with them thus represents an interesting area of future study.

      Figure 1F: What is the biological meaning of these results (29.7, 39.5, and 54Mbp)? These distances are typically the size of a chromosome arm and clearly beyond classical chromatin loop/TAD structures in which EWS-FLI mediates its cis-activity.

      We agree with the referee. This panel has been removed from our revised manuscript.

      How do DBD, KD, and DBD+ conditions compare with WT parental cells in the omics data? (Figures 1B, 4A). Do DBD+ conditions overlap with WT conditions? It would be nice to have these analyses also for Micro-C and Cut&Tag data. To be acknowledged here, the transcriptome data showing this aspect in Figure S1C are very convincing.

      This is a fair point. We were not able to sequence the wtEF Micro-C libraries to a depth similar to that of KD, DBD and DBD+ because a disproportionate share of the wtEF libraries was used in troubleshooting. Therefore, we decided to exclude the wtEF condition from these analyses.

      EWS-FLI cis-regulation at CCND1 also occurs through a much closer EWS-FLI peak (~-20kb msat upstream of CCND1 TSS) which was not taken into consideration. EWS-FLI peak intensity in both DBD and DBD+ at this msat seems similar. How would this fit into their model?

      The referee is correct. The closest peak upstream of the CCND1 TSS is ~19kb away. We highlighted this peak with the dashed boxes near the CCND1 TSS (Supplementary Figure 6). The peak intensity of DBD+ FLAG is slightly higher than that of DBD. Nonetheless, we acknowledge that the difference is small. We suspect that the DBD-α4 helix affects binding dynamics at GGAA repeats, but these genomics approaches are not well suited to detecting small, but significant, changes in binding affinity or dynamics. In this case a more biochemical approach may be needed. Even though both proteins can still bind the same microsatellites, they might differ in their stability of binding or in the recruitment of additional proteins. These possibilities are discussed in the Discussion section (444-463).

      For the Micro-C, they sequenced only 7 to 8 million reads per condition. This coverage seems particularly low, especially for their analyses using 1-5kb bins. How does this compare with other published Micro-C data? Can this explain the variability observed between replicates?

      We apologize for the inconsistent wording about sequencing coverage, which may have caused confusion. The 7 to 8 million reads were used for shallow sequencing and QC analysis. Once a sample passed QC, we then sequenced 300 million reads per sample. "300M" has been changed to "300 million" at line 598 to prevent misunderstanding.

      They mention:

      "In our recent studies of EWS::FLI, we found a small alpha helix in the DNA binding domain DBD-α4, to be required for transcription and regulation by the fusion protein (Boone et al., 2021). Interestingly, this study did not find any change in chromatin accessibility (ATAC-Seq) and genome localization of EWS::FLI constructs (CUT&RUN) when DBD-α4 helix was deleted leaving the mechanistic basis for the requirement of DBD-α4 in transcription regulation unclear. "

      And

      "To assay the enhancer landscape, we collected H3K27ac CUT&Tag data from KD, DBD, and DBD+ cells. Principal component analysis of H3K27ac localization shows that the DBD replicates were clustered closer to the KD replicates while being in between the KD and the DBD+ replicates (Figure 4A), suggesting that DBD-α4 helix is required to reshape the enhancer landscape."

      But now H3K27ac CUT&Tag show strong differences which were not observed in ATAC seq. How to explain this discrepancy?

      Though both H3K27ac and ATAC signal are associated with enhancers and promoters in euchromatin, they do not measure exactly the same thing. H3K4me2 is a mark more closely associated with ATAC signal than H3K27ac is (Henikoff et al., 2020). Nonetheless, there are clear differences between the prior publication (Boone et al., 2021) and this work: ATAC signal was similar across conditions, whereas H3K27ac differs. We suspect this may reflect a tighter association of EWSR1::FLI1-mediated genome regulation with H3K27ac than with ATAC signal. Notably, there were very few differentially accessible regions between EWSR1::FLI1-depleted cells and conditions with EWSR1::FLI1 expression (either endogenous or wildtype rescue) using the A673 KD/Rescue system in Boone et al., 2021. In contrast, other A673 KD-rescue studies have reported differences in H3K27ac in EWSR1::FLI1-expressing conditions relative to EWSR1::FLI1-depleted conditions (Theisen et al., 2021).

      The authors mention:

      "Our study thus uncovered a surprising role for FLI DBD in the process of hub formation which is usually attributed to the EWS low complexity domain."

      Not sure this can be claimed, hubs are composed of many other factors that are not investigated here. Furthermore, promoter enhancer hubs/loops often include combined ETS and mSat chains to generate transcriptional hubs which have not been considered here. None of these points were discussed here.

      We replaced “uncovered” with “suggest” in our revised manuscript at line 476.  

      What are the barcode patterns in Supp 5, are those frequently observed in their Micro-C data, likely mapping artifacts, do they have any impact on their analyses?

      The barcode patterns in what is now Supplementary Figure 6 are blind spots in the hg19 genome assembly. Since they are few in number, we don’t expect these blind spots to impact our analysis.

    1. Author response:

      Reviewer #1 (Public Review): 

      Summary: 

      The authors use fluorescence lifetime imaging (FLIM) and tmFRET to resolve resting vs. active conformational heterogeneity and free energy differences driven by cGMP and cAMP in a tetrameric arrangement of CNBDs from a prokaryotic CNG channel. 

      Strengths: 

      The excellent data provide detailed measures of the probability of adopting resting vs. activated conformations with and without bound ligands. 

      Weaknesses: 

      Limitations are that only the cytosolic fragments of the channel were studied, and the current manuscript does not do a good job of placing the results in the context of what is already known about CNBDs from other methods that yield similar information. 

      In the revision, we will place our results in the context of previous work on CNBD channels where possible.

      Reviewer #2 (Public Review): 

      The authors investigated the conformational dynamics and energetics of the SthK C-linker/CNBD fragment using both steady-state and time-resolved transition metal ion Förster resonance energy transfer (tmFRET) experiments. To do so, they engineered donor-acceptor pairs at specific sites of the CNBD (C-helix and β-roll) by incorporating a fluorescent noncanonical amino acid donor and metal ion acceptors. In particular, the authors employed two cysteine-reactive metal chelators (TETAC and phenM). This allowed them to coordinate three transition metals (Cu2+, Fe2+, and Ru2+) to measure both short (10-20 Å, Cu2+) and long distances (25-50 Å, Fe2+, and Ru2+). By measuring tmFRET with fluorescence lifetimes, the authors determined intramolecular distance distributions in the absence and presence of the full agonist cAMP or the partial agonist cGMP. The probability distributions between conformational states without and with ligands were used to calculate the changes in free energy (ΔG) and differences in free energy change (ΔΔG) in the context of a simple four-state model. 
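
      As a rough illustration of the last step described above, the sketch below converts the fraction of molecules in the active conformation into ΔG and ΔΔG using the standard two-state relation ΔG = -RT ln(p_active / p_resting). The function names, the sign convention (resting to active), the temperature, and the example fractions are assumptions made for illustration; they are not taken from the manuscript, which may use a different convention or fitting procedure.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g(p_active, temperature_k=298.15):
    """Free energy (kcal/mol) of the resting -> active transition.

    p_active is the fraction of molecules in the active conformation;
    the resting fraction is assumed to be 1 - p_active (two states only).
    """
    if not 0.0 < p_active < 1.0:
        raise ValueError("p_active must lie strictly between 0 and 1")
    equilibrium_constant = p_active / (1.0 - p_active)
    return -R_KCAL * temperature_k * math.log(equilibrium_constant)


def delta_delta_g(p_active_apo, p_active_ligand, temperature_k=298.15):
    """Change in the transition free energy caused by ligand binding."""
    return delta_g(p_active_ligand, temperature_k) - delta_g(p_active_apo, temperature_k)


# Hypothetical fractions, chosen only to show the direction of the effect.
apo_fraction = 0.2      # assumed active fraction without ligand
ligand_fraction = 0.9   # assumed active fraction with a saturating agonist
ddg = delta_delta_g(apo_fraction, ligand_fraction)
print(f"ddG = {ddg:.2f} kcal/mol")
```

      With this convention, a negative ΔΔG means that the ligand stabilizes the active conformation relative to the apo state.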

      Overall, the work is conducted in a rigorous manner, and it is well-written. I greatly enjoyed reading it. 

      Nonetheless, I do not see the novelty that the authors claim. 

      We will try to highlight the novelty in the revision. (See below for examples).

      In terms of methodology, this work provides further support to steady-state and time-resolved tmFRET approaches previously developed by the authors of the present work to probe conformational rearrangements by using a fluorescent noncanonical amino acid donor (Anap) and transition metal ion acceptor (Zagotta et al., eLife 2021; Gordon et al., Biophysical Journal 2024; Zagotta et al., Biophysical Journal 2024). 

      This work is the first use of the time-resolved tmFRET method to obtain intrinsic ΔG (of an apo conformation) and ΔΔG values for different ligands, and the first application of this approach to a protein other than MBP.

      Regarding cyclic nucleotide-binding domain (CNBD)-containing ion channels, I disagree with the authors when they state that "the precise allosteric mechanism governing channel activation upon ligand binding, particularly the energetic changes within domains, remains poorly understood". On the contrary, I would say that the literature on this subject is rather vast and based on a significantly large variety of methodologies. This is a non-exhaustive list of papers: Zagotta et al., Nature 2003; Craven et al., JGP, 2004; Craven et al., JBC, 2008; Taraska et al., Nature Methods, 2009; Puljung et al., JBC, 2013; Saponaro et al., PNAS 2014; Goldschen-Ohm et al., eLife, 2016; Bankston et al., JBC, 2017; Hummert et al., PLoS Comput Biol., 2018; Porro et al., eLife, 2019; Ng et al., JGP, 2019; Porro et al., JGP, 2020; Evans et al., PNAS, 2020; Pfleger et al., Biophys J. 2021; Saponaro et al., Mol Cell, 2021; Dai et al., Nat Commun. 2021; Kondapuram et al., Commun Biol. 2022. These studies were conducted either on the isolated C-linker/CNBD fragments or on the entire full-length proteins. As is evident from the above list, the authors of the present work have significantly contributed to the understanding of the allosteric mechanism governing the ligand-induced activation of CNBD-containing channels, including a detailed description of the energetic changes induced by ligand binding. Particularly relevant are their works based on DEER spectroscopy. In DeBerg et al., JBC 2016, the authors described, in atomic detail, the conformational changes induced by different cyclic nucleotides on the HCN CNBD fragment and derived energetics associated with ligand binding to the CNBD (ΔΔG). In Collauto et al., Phys Chem Chem Phys. 2017, they further detailed the ligand-CNBD conformational changes by combining DEER spectroscopy with microfluidic rapid freeze quench to resolve these processes and obtain both equilibrium constants and reaction rates, thus demonstrating that DEER can quantitatively resolve both the thermodynamics and the kinetics of ligand binding and the associated conformational changes. 

      Despite this vast literature, some of which is our own work, there is no consensus about the energetics and coupling of domains that underlies the allosteric mechanism in any CNBD channel. Our approach addresses energetics of the CNBD upon ligand binding, which we aim to later expand to a more complete assessment of the allosteric mechanism in the intact channel.

      Suggestions: 

      - In light of the above, I suggest the authors better clarify the contribution/novelty that the present work provides to the state-of-the-art methodology employed (steady-state and time-resolved tmFRET) and of CNBD-containing ion channels. In particular, it would be nice to have a comparison with the conformational dynamics and energetics reported in the previous works of the authors based on DEER spectroscopy (DeBerg et al., JBC 2016, Collauto et al., Phys Chem Chem Phys. 2017 and Evans et al., PNAS, 2020) and with Goldschen-Ohm et al., eLife, 2016, where single-molecule events (FRET-based) of cAMP binding to HCN CNBD were measured and kinetic rate constants were modeled in the context of a simple four-state model, reminiscent of the model employed in the present work. 

      In the revision, we will place our results in the context of previous work on CNBD channels where possible.

      - Even considering the bacterial SthK channel, cryo-EM has significantly advanced the atomistic understanding of its ligand-dependent regulation (Rheinberger et al., eLife, 2018). More recently, the authors of the present work have elegantly employed DEER on full-length SthK protein to reveal ligand-dependent conformational rearrangements in the C-linker region (Evans et al., PNAS, 2020). In light of the above, what is the contribution/novelty that the present work provides to the SthK biophysics? 

      Neither of the papers mentioned above (structure or DEER) reported energetics for SthK. This work describes an approach that will allow us to get a more complete picture of the energetics of SthK.

      - The authors decided to use the C-linker/CNBD fragment of SthK. On the basis of the above-cited work (Evans et al., PNAS, 2020) the authors should clarify why they have decided to work on the isolated C-linker/CNBD fragment and not on the full-length protein. I assume that the use of the C-linker/CNBD fragment was necessary to isolate tetramers with only one labelled subunit (fSEC and MP were used to confirm this) to avoid inter-subunit cross-talk. However, I am not clear if this is correct. 

      We chose to start on the C-terminal fragment to provide a technically more tractable system for validating our approach using time-resolved tmFRET before moving to the full-length membrane protein.

      - What is the advantage of using the C-linker/CNBD fragment of a bacterial protein and not one of HCN channels, as already successfully employed by the authors (see above citations)? 

      SthK is a useful model system that allows us to later express full-length channels in bacteria.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript aims to provide insights into conformational transitions in the cyclic nucleotide-binding domain of a cyclic nucleotide-gated (CNG) channel. The authors use transition metal FRET (tmFRET) which has been pioneered by this lab and previously led to detailed insights into ion channel conformational changes. Here, the authors not only use steady-state measurements but also time-resolved, fluorescence lifetime measurements to gain detailed insights into conformational transitions within a protein construct that contains the cytosolic C-linker and cyclic nucleotide-binding domain (CNBD) of a bacterial CNG channel. The use of time-resolved tmFRET is a clear advancement of this technique and a strength of this manuscript. 

      In summary, the present work introduced time-resolved tmFRET as a novel tool to study conformational distributions in proteins. This is a clear technological advance. At this stage, conclusions made about energetics in CNG channels are overstated. However, it will be interesting to see in the future how results compare to similar measurements on full-length channels, for example, reconstituted into nanodiscs. 

      Strengths: 

      The results capture known differences in promoting the open state between different ligands (cAMP and cGMP) and are consistent across three donor-acceptor FRET pairs. The calculated distance distributions further are in reasonable agreement with predicted values based on available structures. The finding that the C-helix is conformationally more mobile in the closed state as compared to the open state quantitatively increases our understanding of conformational changes in these channels. 

      Weaknesses: 

      While the use of a truncated construct of SthK is justified, it also comes with certain limitations. The construct is missing the transmembrane part including the pore for ions. However, the pore is the central part of every ion channel and is crucial to describe conformational transitions and energetics that lead to ion channel gating. Two observations in the present study disagree with the results for the full-length channel protein. Here, under apo conditions, the CNBD can adopt an 'open' conformation, and second, cooperativity of channel opening is lost. These differences need to be weighed carefully when judging the impact of the presented results for understanding allostery in CNG channels. Qualitatively, the results can describe movements of the C-helix in CNBDs, but detailed energetics as calculated in this study, need to be limited to the truncated protein construct used. The entire ion channel is an allosteric system and detailed, energetic conclusions cannot be made for the full-length channel when working with only the cytosolic domains. Similarly, the statement "These results demonstrate that time-resolved tmFRET can be utilized to obtain energetic information on the individual domains during the allosteric activation of SthK." is misleading. The data only describe movements of the C-helix. Upon ligand binding, the C-helix moves upwards to coordinate the ligand. Thus, the results are ligand-induced conformational changes (as the title states). Allosteric regulation usually involves remote locations in the protein, which is not the case here. 

      We agree that the full-length channel is more complicated than the C-terminal fragment studied here, but we disagree that the individual domains provide no relevant energetic information. For example, the ΔΔG values measured for the C-helix movement in the isolated fragment should be the same as those in the intact channel. In the future we aim to make direct comparisons of the energetics between the fragment and the intact channel.

    1. Author response:

      We thank the editors and the reviewers for their considered comments and helpful suggestions.

      In our revision, we plan to focus on tightening the relationship between the bias-variance tradeoff theory and the empirical analyses that follow.

      We will also work to better communicate what we argue—and what is beyond our scope—with respect to GxE in complex traits. For example, our language is currently insufficiently clear as it suggested to the editor and reviewers that we are developing a method to characterize polygenic GxE here. Developing a new method that does so (let alone evaluating performance in extensive scenarios) is beyond the scope of this manuscript.

      Similarly, we use amplification only as an example of a mode of GxE that is not adequately characterized by current approaches. We do not wish to argue it is an omnibus explanation for all GxE in complex traits. In many cases, a mixture of polygenic GxE relationships seems most fitting (as observed, for example, in Zhu et al., 2023, for GxSex in human physiology).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study combines fMRI and electrophysiology in sedated and awake rats to show that LFPs strongly explain spatial correlations in resting-state fMRI but only weakly explain temporal variability. They propose that other, electrophysiology-invisible mechanisms contribute to the fMRI signal. The evidence supporting the separation of spatial and temporal correlations is convincing, however, the support of electrophysiological-invisible mechanisms is incomplete, considering alternative potential factors that could account for the differences in spatial and temporal correlation that were observed. This work will be of interest to researchers who study the fundamental mechanisms behind resting-state fMRI.

      We appreciate the encouraging comments. We added a section to the Discussion that thoroughly examines the potential alternative factors that could account for the differences in spatial and temporal correlation that we observed.

      Public Reviews:

      Reviewer #1 (Public Review):

      Tu et al investigated how LFPs recorded simultaneously with rsfMRI explain the spatiotemporal patterns of functional connectivity in sedated and awake rats. They find that connectivity maps generated from gamma band LFPs (from either area) explain very well the spatial correlations observed in rsfMRI signals, but that the temporal variance in rsfMRI data is more poorly explained by the same LFP signals. The authors excluded the effects of sedation in this effect by investigating rats in the awake state (a remarkable feat in the MRI scanner), where the findings generally replicate. The authors also performed a series of tests to assess multiple factors (including noise, outliers, and nonlinearity of the data) in their analysis.

      This apparent paradox is then explained by a hypothetical model in which LFPs and neurovascular coupling are generated in some sense "in parallel" by different neuron types, some of which drive LFPs and are measured by ePhys, while others (nNOS, etc.) have an important role in neurovascular coupling but are less visible in Ephys data. Hence the discrepancy is explained by the spatial similarity of neural activity but the more "selective" LFPs picked up by Ephys account for the different temporal aspects observed.

      This is a deep, outstanding study that harnesses multidisciplinary approaches (fMRI and ephys) for observing brain activity. The results are strongly supported by the comprehensive analyses done by the authors, which ruled out many potential sources for the observed findings. The study's impact is expected to be very large.

      Comment: There are very few weaknesses in the work, but I'd point out that the 1-second temporal resolution may have masked significant temporal correlations between LFPs and spontaneous activity, for instance, as shown by Cabral et al Nature Communications 2023, and even in earlier QPP work from the Keilholz Lab. The synchronization of the LFPs may correlate more with one of these modes than the total signal. Perhaps a kind of "dynamic connectivity" analysis on the authors' data could test whether LFPs correlate better with the activity at specific intervals. However, this could purely be discussed and left for future work, in my opinion.

      We appreciate this great point. Indeed, it is likely that LFP and rsfMRI signals are more strongly related during some modes/instances than others, and hence correlation across the entire time series may have masked this effect. In addition, we agree that 1-second temporal resolution may obscure some temporal correlations between LFPs and rsfMRI signal. The choice of 1-second temporal resolution was made to be consistent with the TR in our fMRI experiment, considering the slow hemodynamic response. Ultrafast fMRI imaging combined with dynamic connectivity analysis in a future study might enable more detailed examination of BOLD-LFP temporal correlations at higher temporal resolutions. We have added the following paragraph to the revised manuscript:

      “Our proposed theoretic model represents just one potential explanation for the apparent discrepancy in temporal and spatial relationships between resting-state electrophysiology and BOLD signals. It is important to acknowledge that there may be other scenarios where a stronger temporal relationship between LFP and BOLD signals could manifest. For instance, recent research suggests that the relationship between LFP and rsfMRI signals may vary across different modes or instances (Cabral et al., 2023), which can be masked by correlations across the entire time series. Moreover, the 1-second temporal resolution employed in our study may obscure certain temporal correlations between LFPs and rsfMRI signals. Future investigations employing ultrafast fMRI imaging coupled with dynamic connectivity analysis could offer a more nuanced exploration of BOLD-LFP temporal correlations at higher temporal resolutions (Bolt et al., 2022; Cabral et al., 2023; Ma and Zhang, 2018; Thompson et al., 2014).”
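
      One simple form of the dynamic connectivity analysis mentioned above is a sliding-window correlation. The sketch below is only an illustration of that idea: the function names, the window and step sizes, and the synthetic signals are hypothetical and do not come from the manuscript or from the studies cited.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def sliding_window_correlation(x, y, window, step=1):
    """Correlate x and y inside consecutive windows of `window` samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if window < 2 or window > len(x):
        raise ValueError("window must be between 2 and len(x)")
    return [
        pearson(x[start:start + window], y[start:start + window])
        for start in range(0, len(x) - window + 1, step)
    ]


# Synthetic example: the two signals are coupled only in the first half.
n_samples = 120
lfp_power = [math.sin(0.3 * t) for t in range(n_samples)]
bold = [
    lfp_power[t] if t < 60 else math.sin(0.3 * t + math.pi / 2)
    for t in range(n_samples)
]

windowed = sliding_window_correlation(lfp_power, bold, window=20, step=10)
whole_series = pearson(lfp_power, bold)
```

      In this toy case the early windows show a correlation near 1 that a single coefficient computed over the whole series dilutes, which is the masking effect the paragraph describes.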

      Reviewer #2 (Public Review):

      The authors address a question that is interesting and important to the sub-field of rsfMRI that examines electrophysiological correlates of rsfMRI. That is, while electrophysiology-produced correlation maps often appear similar to correlation maps produced from BOLD alone (as has been shown in many papers) is this actually coming from the same source of variance, or independent but spatially-correlated sources of variance? To address this, the authors recorded LFP signals in 2 areas (M1 and ACC) and compared the maps produced by correlating BOLD with them to maps produced by BOLD-BOLD correlations. They then attempt to remove various sources of variance and see the results.

      The basic concept of the research is sound, though primarily of interest to the subset of rsfMRI researchers who use simultaneous electrophysiology. However, there are major problems in the writing, and also a major methodological problem.

      Major problems with writing:

      Comment 1: There is substantial literature on rats on site-specific LFP recording compared to rsfMRI, and much of it already examined removing part of the LFP and examining rsfMRI, or vice versa. The authors do not cover it and consider their work on signal removal more novel than it is.

      We have added more of the relevant literature to the revised manuscript. It is important to note that while there is a substantial body of literature on site-specific LFP recording coupled with rsfMRI, our paper makes a significant contribution by unveiling the disparity in temporal and spatial relationships between resting-state electrophysiological and fMRI signals. This goes beyond mere reporting of spatial/temporal correlations. Furthermore, our exploration of the impact of removing LFP on rsfMRI spatial patterns is one of several analyses employed to demonstrate that the temporal fluctuations of LFP minimally affect BOLD-derived RSN spatial patterns. We wish to clarify that our intention is not to claim that this aspect of our work is more novel than similar analyses conducted in previous studies (we apologize if our original manuscript conveyed that impression). Rather, the novelty lies in the objective of this analysis, which is to elucidate the disparity in temporal and spatial relationships between resting-state electrophysiological and fMRI signals, a crucial issue that has not been thoroughly addressed previously.

      Comment 2: The conclusion of the existence of an "electrophysiology-invisible signal" is far too broad considering the limited scope of this study. There are many factors that can be extracted from LFP that are not used in this study (envelope, phase, infraslow frequencies under 0.1Hz, estimated MUA, etc.) and there are many ways of comparing it to the rsfMRI data that are not done in this study (rank correlation, transformation prior to comparison, clustering prior to comparison, etc.). The one non-linear method used, mutual information, is low sensitivity and does not cover every possible nonlinear interaction. Mutual information is also dependent upon the number of bins selected in the data. Previous studies (see 1) have seen similar results where fMRI and LFP were not fully commensurate but did not need to draw such broad conclusions.
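
      The reviewer's point about bin dependence can be illustrated with a small sketch. It uses a plain histogram (plug-in) estimator on two independent synthetic signals, whose true mutual information is zero; the estimator, the equal-width binning, the sample size, and the bin counts are assumptions chosen for illustration and are not the procedure used in the study.

```python
import math
import random
from collections import Counter


def bin_index(value, low, high, n_bins):
    """Map a value in [low, high] to an equal-width bin index in [0, n_bins - 1]."""
    if high == low:
        return 0
    index = int((value - low) / (high - low) * n_bins)
    return min(index, n_bins - 1)  # the maximum value falls in the last bin


def mutual_information(x, y, n_bins):
    """Plug-in (histogram) estimate of mutual information, in nats."""
    n = len(x)
    x_low, x_high = min(x), max(x)
    y_low, y_high = min(y), max(y)
    pairs = [
        (bin_index(a, x_low, x_high, n_bins), bin_index(b, y_low, y_high, n_bins))
        for a, b in zip(x, y)
    ]
    joint = Counter(pairs)
    marginal_x = Counter(i for i, _ in pairs)
    marginal_y = Counter(j for _, j in pairs)
    estimate = 0.0
    for (i, j), count in joint.items():
        # p_ij * log(p_ij / (p_i * p_j)), written with raw counts
        estimate += (count / n) * math.log(count * n / (marginal_x[i] * marginal_y[j]))
    return estimate


rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [rng.gauss(0, 1) for _ in range(500)]  # independent of x, so the true value is 0

mi_by_bins = {n_bins: mutual_information(x, y, n_bins) for n_bins in (4, 16, 64)}
for n_bins, value in mi_by_bins.items():
    print(n_bins, round(value, 3))
```

      Because finer bins leave fewer samples per cell, the estimate drifts upward with the bin count even though the signals are unrelated, so values obtained with different binnings are not directly comparable.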

      First, we would like to clarify that the existence of an "electrophysiology-invisible signal" is not necessarily a conclusion of the present study, per se, as described by the reviewer. As we stated in our manuscript, it is a proposed theoretical model. We fully acknowledge that this model represents just one potential explanation for the apparent discrepancy in temporal and spatial relationships between resting-state electrophysiology and BOLD signals. It is important to acknowledge that there may be other scenarios where a stronger temporal relationship between LFP and BOLD signals could manifest. This issue has been further clarified in the revised manuscript (see the section of Potential pitfalls).

      We agree with the reviewer that not all factors that can be extracted from LFP are examined. In our current study we focused solely on band-limited LFP power as the primary feature in our analysis, given its prevalence in prior studies of LFP-rsfMRI correlates. More importantly, we demonstrate that band-specific LFP powers can yield spatial patterns nearly identical to those derived from rsfMRI signals, prompting a closer examination of the temporal relationship between these same features. Furthermore, since correlational analysis was used in studying the LFP-BOLD spatial relationship, we used the same analysis method when comparing their temporal relationship. 

      Extracting all possible features from the electrophysiology signal and examining their relationship with the rsfMRI signal, or exploring all other ways of comparing LFP and rsfMRI signals, goes beyond the scope of the current study. However, to address the reviewer’s concern, we tried two of the analysis methods suggested by the reviewer, and the results remained consistent. Figure S14 shows the results from (A) the rank correlation and (B) z transformation prior to comparison. We added these new results to the revised manuscript.
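
      As a rough illustration of the two checks named in this response, the sketch below computes a rank (Spearman) correlation and a Pearson correlation after z-scoring. The toy series and all function names are hypothetical; this is not the authors' analysis code, only a minimal stand-in under the assumption that "rank correlation" means Spearman's rho.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def zscore(x):
    """Subtract the mean and divide by the (population) standard deviation."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / sd for a in x]


# Hypothetical toy series: a monotonic but nonlinear relationship.
lfp_power = [0.2, 0.5, 0.9, 1.4, 2.0, 2.7]
bold = [v ** 3 for v in lfp_power]

linear_r = pearson(lfp_power, bold)
rank_r = spearman(lfp_power, bold)
z_r = pearson(zscore(lfp_power), zscore(bold))
```

      Rank correlation is sensitive to any monotonic relationship, which is why it is a natural check on a Pearson-based result. Z-scoring a single pair of series leaves Pearson's r unchanged; it matters when segments such as individual scans are normalized separately before pooling, which is assumed here to be the intended use.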

      Comment 3: The writing refers to the spatial extent of correlation with the LFP signal as "spatial variance." However, LFP was recorded from a very limited point and the variance in the correlation map does not necessarily reflect underlying electrophysiological spatial distributions (e.g. Yu et al. Nat Commun. 2023 Mar 24;14(1):1651.)

      The reviewer accurately pointed out that in our paper, “spatial variance” refers to the spatial variance of BOLD correlates with the LFP signal. Our objective is to assess the extent to which this spatial variance, which is derived from the neural activity captured by LFP in the M1 or ACC, corresponds to the BOLD-derived spatial patterns from the same regions. We acknowledge that this spatial variance may differ from the spatial map obtained by multi-site electrophysiology recordings. Nevertheless, numerous studies have consistently reported a high spatial correspondence between BOLD- and electrophysiology-derived RSNs using various methodologies across different physiological states in both humans and animals. For instance, research employing electroencephalography (EEG) or electrocorticography (ECoG) in humans demonstrates that RSNs derived from the power of multiple-site electrophysiological signals exhibit similar spatial patterns to classic BOLD-derived RSNs such as the default-mode network (Hacker et al., 2017; Kucyi et al., 2018). These studies agree well with our findings. Notably, the reference paper cited by the reviewer studies brain-wide changes during transitions between awake and various sleep stages, which is quite different from the brain states examined in our study.

      Major method problem:

      Comment 4: Correlating LFP to fMRI is correlating two biological signals, with unknown but presumably not uniform distributions. However, correlating CC results from correlation maps is comparing uniform distributions. This is not a fair comparison, especially considering that the noise added is also uniform as it was created with the rand() function in MATLAB.

      This is a good point. We examined the distributions of both LFP powers and fMRI signals. They both seem to follow a normal distribution. The distributions of the two signals from a random scan are shown below. In addition, z transformation prior to comparison generated the same results (Fig. S14).

      Author response image 1.

      Exemplar distributions of A) the fMRI signal of M1, and B) HRF-convolved LFP power in M1.

      Reviewer #1 (Recommendations For The Authors):

      Comment 1: In the Discussion, a few more calcium imaging papers could be fruitfully discussed (e.g. Ma et al Resting-state hemodynamics are spatiotemporally coupled to synchronized and symmetric neural activity in excitatory neurons, PNAS 2016, or more recently Vafaii et al, Multimodal measures of spontaneous brain activity reveal both common and divergent patterns of cortical functional organization, Nat Comms 2024).

      We appreciate this suggestion. We have added the following discussions to the revised manuscript: 

      “These findings indicate the temporal information provided by gamma power can only explain a minor portion (approximately 35%) of the temporal variance in the BOLD time series, even after accounting for the noise effect, which is in line with the reported correlation value between the cerebral blood volume and fluctuations in GCaMP signal in head-fixed mice during periods of immobility (R = 0.63) (Ma et al., 2016).” 

      “It is plausible that employing different features or comparison methods could yield a stronger BOLD-electrophysiology temporal relationship (Ma et al., 2016).”

      “Furthermore, in a more recent study by Vafaii and colleagues, overlapping cortical networks were identified using both fMRI and calcium imaging modalities, suggesting that networks observable in fMRI studies exhibit corresponding neural activity spatial patterns (Vafaii et al., 2024).” 

      “Furthermore, Vafaii et al. revealed notable differences in functional connectivity strength measured by fMRI and calcium imaging, despite an overlapping spatial pattern of cortical networks identified by both modalities (Vafaii et al., 2024).”

      Comment 2: Similarly when discussing the "invisible" populations, perhaps Uhlirova et al eLife 2016 should be mentioned as some types of inhibitory processes may also be less clearly observed in LFPs but rather strongly contribute to NVC.

      We appreciate the suggestion. We added the following sentences to the revised manuscript. 

      “Additionally, Uhlirova et al. conducted a study where they utilized optogenetic stimulation and two-photon imaging to investigate how the activation of different neuron types affects blood vessels in mice. They discovered that only the activation of inhibitory neurons led to vessel constriction, albeit with a negligible impact on LFP (Uhlirova et al., 2016).”

      Reviewer #2 (Recommendations For The Authors):

      Major problems with writing:

      Comment 1: The authors need to review past work to better place their study in the context of the literature (some review articles: Lurie et al. Netw Neurosci. 2020 Feb 1;4(1):30-69. & Thompson et al. Neuroimage. 2018 Oct 15;180(Pt B):448-462.)

      Here are some LFP and BOLD "resting state" papers focused on dynamic changes.

      Many of these papers examine both spatial and temporal extents of correlations. Several of these papers use similar methods to the reviewed paper.

      Also, many of these papers dispute the claim that correlations seen are "electrophysiology invisible signal." Note that I am NOT saying that "electrophysiology invisible" correlations do not exist (it seems very likely some DO exist). However, the authors did not show that in the reviewed paper, and some of the correlations which they call an "electrophysiology invisible signal" probably would be visible if analyzed in a different manner.

      Quite a few literature studies that the reviewer suggested were already included in the original manuscript. We have also added more literature studies to the revised manuscript. Again, we would like to emphasize that the novelty of our study centers on the discovery of the disparity in temporal and spatial relationships between resting-state electrophysiological and fMRI signals. See below our responses to individual literature studies listed.

      In humans:

      https://pubmed.ncbi.nlm.nih.gov/38082179/ Predicts by using models the paper under review does not use here.

      The following discussion was added to the revised manuscript: 

      “Some other comparison methods such as rank correlation and transformation prior to comparison were also tested and results remain persistent (Fig. S14). These findings align with the notion that, compared to nonlinear models, linear models offer superior predictive value for the rsfMRI signal using LFP data, as comprehensively illustrated in (Nozari et al., 2024) (also see Fig. S7). Importantly, in this study, the predictive powers (represented by R2) of various comparison methods tested all remain below 0.5 (Nozari et al., 2024), suggesting that while certain models may enhance the temporal relationship between LFP and BOLD signals, the improvement is likely modest.”

      In nonhuman primates: https://pubmed.ncbi.nlm.nih.gov/34923136/ Most of the variance that could be creating resting state networks is in the <1 Hz band which the paper under review did not study

      We also examined infraslow LFP activity (< 1 Hz) in our data. Consistent with the finding in the reference paper (Li et al., 2022), infraslow LFP power and the BOLD signal can derive consistent RSN spatial patterns (for M1, spatial correlation = 0.70), while the temporal correlation remains very low (temporal correlation = 0.08). These results and the reference paper were added to the revised manuscript.
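
      The distinction between the two numbers reported here can be made concrete with a toy sketch. It assumes that "temporal correlation" is the Pearson correlation between the LFP-power time series and the BOLD time series of the seed region, and that "spatial correlation" is the Pearson correlation between two voxel-wise correlation maps (one seeded with LFP power, one with the BOLD seed). The synthetic data, sizes, and names below are hypothetical and do not reproduce the authors' pipeline (for example, no HRF convolution is applied).

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_map(seed_series, voxel_series):
    """Correlate one seed time series with every voxel time series."""
    return [pearson(seed_series, v) for v in voxel_series]


random.seed(0)
n_time, n_vox = 500, 30

# Hypothetical shared network fluctuation and a spatial weighting profile.
network = [random.gauss(0.0, 1.0) for _ in range(n_time)]
weights = [i / (n_vox - 1) for i in range(n_vox)]

# Each voxel carries the network signal scaled by its weight, plus noise.
voxels = [[w * s + random.gauss(0.0, 1.0) for s in network] for w in weights]

# The BOLD seed and the LFP power both reflect the network, with the LFP
# contribution deliberately weak and each with independent noise.
bold_seed = [s + random.gauss(0.0, 1.0) for s in network]
lfp_power = [0.3 * s + random.gauss(0.0, 1.0) for s in network]

# Temporal relationship: one number from two time series.
temporal_r = pearson(lfp_power, bold_seed)

# Spatial relationship: one number from two maps across voxels.
lfp_map = correlation_map(lfp_power, voxels)
bold_map = correlation_map(bold_seed, voxels)
spatial_r = pearson(lfp_map, bold_map)
```

      The point of the sketch is only that the two quantities are computed over different axes (time versus voxels), so they need not agree in magnitude.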

      https://pubmed.ncbi.nlm.nih.gov/28461461/ Compares actual spread of LFP vs. spread of BOLD instead of just correlation between LFP and BOLD.

      The following sentence has been added to the revised manuscript.

      “This high spatial correspondence between rsfMRI and LFP signals can even be found at the columnar level (Shi et al., 2017).”   

      https://pubmed.ncbi.nlm.nih.gov/24048850/ Comparison of small (from LFP) to large (from BOLD) spatial correlations in the context of temporal correlations.

      In this study, researchers compared neurophysiological maps and fMRI maps of the inferior temporal cortex in macaques in response to visual images. They observed that the spatial correlation increased as the neurophysiological maps got greater levels of spatial smoothing. This suggests that fMRI can capture large-scale spatial information, but it may be limited in capturing fine details. Although interesting, this paper did not study the electrophysiology-fMRI relationship at the resting state and hence is not very relevant to our study.

      https://pubmed.ncbi.nlm.nih.gov/20439733/ Electrophysiology from a single site can correlate across nearly the entire cerebral cortex.

      We have included the discussion of this paper in the original manuscript.

      https://pubmed.ncbi.nlm.nih.gov/18465799/ The original dynamic BOLD and LFP work from 2008 by Shmuel and Leopold included spatiotemporal dynamics.

      We have included the discussion of this paper in the original manuscript.

      In rodents:

      https://pubmed.ncbi.nlm.nih.gov/34296178/ Better electrophysiological correspondence was found using alternate methods the paper under review does not use.

      This study investigates the electrophysiological correspondence in task-based fMRI, while our study focused on resting state signals.

      https://pubmed.ncbi.nlm.nih.gov/31785420/ Electrophysiological basis of co-activation patterns, similar comparisons to the paper under review.

      We have included the discussion of this paper in the original manuscript.

      https://pubmed.ncbi.nlm.nih.gov/29161352/ Cross-frequency coupling of LFP modulating the BOLD, perhaps more so than raw amplitudes.

      This paper investigated the impact of AMPA microinjections in the VTA and found reduced ventral striatal functional connectivity, correlation between the delta band and BOLD signal, and phase–amplitude coupling of low-frequency LFP and high-frequency LFP, suggesting changes in low-frequency LFP might modulate the BOLD signal.

      Consistent with our study, we also found that low-frequency LFP is negatively coupled with the BOLD signal, but we did not investigate changes in neurovascular coupling with disturbed neural activity using pharmacological methods, and hence, we did not discuss this paper in our study.

      https://pubmed.ncbi.nlm.nih.gov/24071524/ This paper did the same kind of tests comparing LFP-BOLD correlations to BOLD-BOLD correlations as the paper under review.

      This study examined the neural mechanism underpinning dynamic resting-state fMRI, revealing a spatiotemporal coupling of infra-slow neural activity with a quasiperiodic pattern (QPP). While our current investigation centered on stationary resting-state functional connectivity, we acknowledge that dynamic analysis will provide additional value for investigating the relationship between LFP and rsfMRI signals. This warrants more investigation in a future study. This point has been added to the revised manuscript.

      https://pubmed.ncbi.nlm.nih.gov/24904325/ This paper found that different frequencies of electrophysiology (including ones not studied in the reviewed paper) contribute independently to the BOLD signal

      This paper identified phase-amplitude coupling in rats anesthetized with isoflurane but not with dexmedetomidine, indicating that this coupling arises from a special type of neural activity pattern, burst-suppression, which was probably induced by high-dose isoflurane. They conjectured that high- and low-frequency neural activities may independently or differentially influence the BOLD signal. Our study also examined the influence of various LFP frequency bands on the BOLD signal and found an inverse LFP-BOLD relationship between low- and high-frequency LFP powers. We also added more results on the analysis of infraslow LFP signals. Regardless, since the reference study did not examine the spatial relationship of LFP and BOLD activities, we cannot comment on how it may provide insight into our results.

      https://pubmed.ncbi.nlm.nih.gov/26041826/ This paper found electrophysiological correlates within the BOLD signal when using BOLD analysis methods not used in the reviewed paper, and furthermore that some of these correlate with electrophysiological frequencies not studied in the reviewed paper (< 1 Hz).

      We have added more results on the analysis of infraslow LFP signals and acknowledged the value of dynamic rsfMRI analysis in studies of the BOLD-electrophysiology relationship.

      I am not saying the authors need to use all these methods or even cite these papers. As I stated in their review, they merely need to (1) cite some of the most relevant for the proper context, the above list can maybe help (2) remove the claim of an "electrophysiology invisible signal" (3) use terms more commonly used in these papers for the extent of correlation with the electrode, other than "spatial variance."

      We thank the reviewer again for providing a detailed list of reference studies. We have added the related discussion to the revised manuscript as described above.

      Comment 2: The abstract entirely and much of the rest of the paper should be rewritten to be more reasonable. The authors would do well to review some of the past controversies in this area, e.g. Magri et al. J Neurosci. 2012 Jan 25;32(4):1395-407.

      We have made significant revisions to improve the writing of the paper. The reference paper has been added to the revised manuscript.

      Comment 3: This should be re-written and the terminology used here should be chosen more carefully.

      The writing of the manuscript has been improved with more careful choice of terminology.    

      Major method problem:

      Comment 4: At a minimum, the authors should be transforming the uniform distribution of CC results to Z or T values and using randn() instead of rand() in MATLAB.

      Below is the figure illustrating the simulation results by transforming CC values to Z score. Results obtained remain consistent.

      Author response image 2.
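
      For readers unfamiliar with the transformation requested in Comment 4, the sketch below shows one common reading of it: the Fisher r-to-z transform applied to correlation coefficients, together with the difference between uniform noise (MATLAB `rand()`) and Gaussian noise (MATLAB `randn()`). Whether the authors used exactly this transform is an assumption, and the example values are hypothetical.

```python
import math
import random


def fisher_z(r):
    """Fisher r-to-z transform, atanh(r), defined for -1 < r < 1."""
    return math.atanh(r)


random.seed(1)

# Hypothetical correlation coefficients (CC values) from a correlation map.
cc_values = [0.10, 0.35, 0.62, 0.80]
z_values = [fisher_z(r) for r in cc_values]

# Python analogue of MATLAB rand(): uniform on [0, 1).
uniform_noise = [random.random() for _ in range(1000)]

# Python analogue of MATLAB randn(): standard normal.
gaussian_noise = [random.gauss(0.0, 1.0) for _ in range(1000)]
```

      The transform stretches values near ±1, so that sampling variability is closer to Gaussian than it is for raw correlation coefficients, which are bounded in [-1, 1].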

      Minor problems:

      Comment 5: "MR-510 compatible electrodes (MRCM16LP, NeuroNexus Inc)"

      Details of this type of electrode are not readily available. But for studies like this one, further information on materials is critical as this determines the frequency coverage, which is not even across all LFP frequencies for all materials. Most commercially prepared electrodes cannot record <1Hz accurately, and this study includes at least 0.11Hz in some of its analysis.

      The type of electrode used in our current study is a silicon-based micromachined probe. These probes are fabricated using photolithographic techniques to pattern thin layers of conductive materials onto a silicon substrate. This probe is capable of recording the LFP activity within a broad frequency range, starting from 0.1 Hz. We added this information to the revised manuscript.

      Comment 6: Grounding to the cerebellum in theory would remove global conduction from the LFP but also global signal regression is done to the fMRI. Does the LFP-rsfMRI correlation change due to the regression or does only the rsfMRI-rsfMRI correlation change?

      The results obtained with global signal regression were consistent with those obtained without it (see Figs. S4-S5), and therefore, we do not believe our results are affected by this preprocessing step. 

      Comment 7: Avoid colloquial language like "on the other hand" etc.

      We used more appropriate language in the revised manuscript.

      References:

      Bolt, T., Nomi, J.S., Bzdok, D., Salas, J.A., Chang, C., Thomas Yeo, B.T., Uddin, L.Q., Keilholz, S.D., 2022. A parsimonious description of global functional brain organization in three spatiotemporal patterns. Nat Neurosci 25, 1093-1103.

      Cabral, J., Fernandes, F.F., Shemesh, N., 2023. Intrinsic macroscale oscillatory modes driving long range functional connectivity in female rat brains detected by ultrafast fMRI. Nat Commun 14, 375.

      Hacker, C.D., Snyder, A.Z., Pahwa, M., Corbetta, M., Leuthardt, E.C., 2017. Frequency-specific electrophysiologic correlates of resting state fMRI networks. Neuroimage 149, 446-457.

      Kucyi, A., Schrouff, J., Bickel, S., Foster, B.L., Shine, J.M., Parvizi, J., 2018. Intracranial Electrophysiology Reveals Reproducible Intrinsic Functional Connectivity within Human Brain Networks. J Neurosci 38, 4230-4242.

      Li, J.M., Acland, B.T., Brenner, A.S., Bentley, W.J., Snyder, L.H., 2022. Relationships between correlated spikes, oxygen and LFP in the resting-state primate. Neuroimage 247, 118728.

      Ma, Y., Shaik, M.A., Kozberg, M.G., Kim, S.H., Portes, J.P., Timerman, D., Hillman, E.M., 2016. Resting-state hemodynamics are spatiotemporally coupled to synchronized and symmetric neural activity in excitatory neurons. Proc Natl Acad Sci U S A 113, E8463-E8471.

      Ma, Z., Zhang, N., 2018. Temporal transitions of spontaneous brain activity. Elife 7.

      Shi, Z., Wu, R., Yang, P.F., Wang, F., Wu, T.L., Mishra, A., Chen, L.M., Gore, J.C., 2017. High spatial correspondence at a columnar level between activation and resting state fMRI signals and local field potentials. Proc Natl Acad Sci U S A 114, 5253-5258.

      Thompson, G.J., Pan, W.J., Magnuson, M.E., Jaeger, D., Keilholz, S.D., 2014. Quasiperiodic patterns (QPP): large-scale dynamics in resting state fMRI that correlate with local infraslow electrical activity. Neuroimage 84, 1018-1031.

      Uhlirova, H., Kilic, K., Tian, P., Thunemann, M., Desjardins, M., Saisan, P.A., Sakadzic, S., Ness, T.V., Mateo, C., Cheng, Q., Weldy, K.L., Razoux, F., Vandenberghe, M., Cremonesi, J.A., Ferri, C.G., Nizar, K., Sridhar, V.B., Steed, T.C., Abashin, M., Fainman, Y., Masliah, E., Djurovic, S., Andreassen, O.A., Silva, G.A., Boas, D.A., Kleinfeld, D., Buxton, R.B., Einevoll, G.T., Dale, A.M., Devor, A., 2016. Cell type specificity of neurovascular coupling in cerebral cortex. Elife 5.

      Vafaii, H., Mandino, F., Desrosiers-Gregoire, G., O'Connor, D., Markicevic, M., Shen, X., Ge, X., Herman, P., Hyder, F., Papademetris, X., Chakravarty, M., Crair, M.C., Constable, R.T., Lake, E.M.R., Pessoa, L., 2024. Multimodal measures of spontaneous brain activity reveal both common and divergent patterns of cortical functional organization. Nat Commun 15, 229.

  2. Jul 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      In addition to our responses to reviewer suggestions below, a minor bug in the calculation of CAIS was brought to our attention by a reader of our preprint. We have corrected this bug and rerun analyses, whose results became slightly stronger as noise was removed. While we were doing that, someone pointed out to us that our equations were almost the same as Kullback-Leibler divergence, which explains why our metric performed so well. We have made the numerically trivial (see before vs. after image below) mathematical change to use Kullback-Leibler divergence instead, and now have a better story, with a solid basis in information theory, as to why CAIS works.

      Author response image 1.
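
      For reference, a minimal sketch of the Kullback-Leibler divergence between two discrete distributions is given below. It implements only the textbook definition, D_KL(p || q) = Σ p_i ln(p_i / q_i); how the authors weight or aggregate this quantity across codons and amino acids to obtain CAIS is not shown here, and the function name is hypothetical.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions over the same outcomes."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing by convention
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Hypothetical example: observed versus expected frequencies of two synonymous codons.
observed = [0.5, 0.5]
expected = [0.25, 0.75]
divergence = kl_divergence(observed, expected)
```

      The divergence is zero only when the observed and expected distributions coincide and grows as observed usage departs from the expectation, which is the property that makes it a natural measure of departure from a baseline.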

      Unfortunately, we discovered a second bug that caused our PIC correction code to fail to perform the needed correction for phylogenetic confounding. The previously reported correlation between CAIS (or ENC) and body mass no longer survives PIC-correction. We have therefore removed this analysis from the manuscript. Our story now rests more on the theoretical basis of CAIS and ENC, and less on post facto validation, than it did previously. We now also present CAIS and ENC on a more equal footing. ENC results are slightly stronger, while CAIS has the complementary advantage of correcting for amino acid frequencies.

      The work involved in these changes, as well as some of the responses to reviews below, justifies changing the second author into a co-first author, and adding an additional coauthor (Hanon McShea) who discovered the second bug.

      Reviewer #1 (Public Review): 

      In this manuscript, the authors propose a new codon adaptation metric, Codon Adaptation Index of Species (CAIS), which they present as an easily obtainable proxy for effective population size. To permit between-species comparisons, they control for both amino acid frequencies and genomic GC content, which distinguishes their approach from existing ones. Having confirmed that CAIS negatively correlates with vertebrate body mass, as would be expected if small-bodied species with larger effective populations experience more efficient selection on codon usage, they then examine the relationship between CAIS and intrinsic structural disorder in proteins. 

      The idea of a robust species-level measure of codon adaptation is interesting. If CAIS is indeed a reliable proxy for the effectiveness of selection, it could be useful to analyze species without reliable life history- or mutation rate data (which will apply to many of the genomes becoming available in the near future). 

      A key question is whether CAIS, in fact, measures adaptation at the codon level. Unfortunately, CAIS is only validated indirectly by confirming a negative correlation with body mass. As a result, the observations about structural disorder are difficult to evaluate. 

      As discussed in the preamble above, we have replaced the body mass validation with a stronger theoretical basis in information theory.

      A potential problem is that differences in GC between species are not independent of life history. Effective population size can drive compositional differences due to the effects of GC-biased gene conversion (gBGC). As noted by Galtier et al. (2018), genomic GC correlates negatively with body mass in mammals and birds. It would therefore be important to examine how gBGC might affect CAIS, and to what extent it could explain the relationship between CAIS and body mass. 

      Suppose that gBGC drives an increase in GC that is most pronounced at 3rd codon positions in high-recombination regions in small-bodied species. In this case, could observed codon usage depart more strongly from expectations calculated from overall genomic GC in small vertebrates compared to large ones? The authors also report that correcting for local intergenic GC was unsuccessful, based on the lack of a significant negative relationship with body mass (Figure 3D). In principle, this could also be consistent with local GC providing a relatively more appropriate baseline in regions with high recombination rates. Considering these scenarios would clarify what exactly CAIS is capturing.

      Figure 3 (previously Supplementary Figures S5A and S5B) shows that CAIS is negligibly correlated with %GC (not robust to multiple comparisons correction), and ENC not at all. We believe this is evidence against the possibility brought up by the reviewer, i.e. that Ne might affect gBGC (and hence global %GC). This relationship, if present, could act as a confounding effect, but it is not present within our species dataset. 

      Note that we expect our genomic-GC-based codon usage expectations to reflect unchecked gBGC in an average genomic region, independently of whether that species has high or low Ne. Our working model is that non-selective forces, including gBGC as well as conventional mutation biases, vary among species, and that they rather than selection determine each species’ genome-wide %GC. By correcting for genome-wide %GC, CAIS and ENC correct for both mutation bias and gBGC, in order to isolate the effects of selection.

      This argument, based on an average genomic region, is vulnerable to gene-rich genomic regions having differentially higher recombination rates and hence GC-biased gene conversion. However, we do not see the expected positive correlation between |local GC - global GC| and CAIS (see new Figure 5), again suggesting that gene conversion strength is not a confounding factor acting on CAIS.

      Given claims about "exquisitely adapted species", the case for using CAIS as a measure of codon adaptation would also be stronger if a relationship with gene expression could be demonstrated. RSCU is expected to be higher in highly expressed genes. Is there any evidence that the equivalent GC-controlled measure behaves similarly?

      Correlations with gene expression are outside the scope of the current work, which is focused on producing and exploiting a single value of codon adaptation per species. It is indeed possible that our general approach of using Kullback-Leibler divergence to correct for genomic %GC could be useful in future work investigating differences among genes.  

      The manuscript is overall easy to follow, though some additional context may be helpful for the general reader. A more detailed discussion of how this work compares to the approach taken by Galtier et al. (2018), which accounted for GC content and gBGC when examining codon preferences, would be appropriate, for example. In addition, it would have been useful to mention past work that has attempted to explicitly quantify selection on codon usage. 

      One key difference between our work and that of Galtier et al. 2018 is that our approach does not rely on identifying specific codon preferences as a function of species. Our approach might therefore be robust to scenarios where different genes have different codon preferences (see Gingold et al. 2014 https://doi.org/10.1016/j.cell.2014.08.011). At a high level, our results are in broad agreement with those of Galtier et al., 2018, who found that gBGC affected all animal species, regardless of Ne, and who like us, found that the degree of selection on codon usage depended on Ne.

      Reviewer #2 (Public Review): 

      Summary 

      The goal of the authors in this study is to develop a more reliable approach for quantifying codon usage such that it is more comparable across species. Specifically, the authors wish to estimate the degree of adaptive codon usage, which is potentially a general proxy for the strength of selection at the molecular level. To this end, the authors created the Codon Adaptation Index for Species (CAIS) that controls for differences in amino acid usage and GC% across species. Using their new metric, the authors find a previously unobserved negative correlation between the overall adaptiveness of codon usage and body size across 118 vertebrates. As body size is negatively correlated with effective population size and thus the general strength of natural selection, the negative correlation between CAIS and body size is expected. The authors argue this was previously unobserved due to failures of other popular metrics such as Codon Adaptation Index (CAI) and the Effective Number of Codons (ENC) to adequately control for differences in amino acid usage and GC content across species. Most surprisingly, the authors also find a positive relationship between CAIS and the overall "disorderedness" of a species' protein domains. As some of these results are unexpected, which is acknowledged by the authors, I think it would be particularly beneficial to work with some simulated datasets. I think CAIS has the potential to be a valuable tool for those interested in comparing codon adaptation across species in certain situations. However, I have certain theoretical concerns about CAIS as a direct proxy for the efficiency of selection $sN_e$ when the mutation bias changes across species.

      Strengths 

      (1) I appreciate that the authors recognize the potential issues of comparing CAI when amino acid usage varies and correct for this in CAIS. I think this is sometimes an under-appreciated point in the codon usage literature, as CAI is a relative measure of codon usage bias (i.e. only considers synonyms). However, the strength of natural selection on codon usage can potentially vary across amino acids, such that comparing mean CAI between protein regions with different amino acid biases may result in spurious signals of statistical significance (see Cope et al. Biochimica et Biophysica Acta - Biomembranes 2018 for a clear example of this).

      We now cite Cope et al. as an example of how amino acid composition can act as a confounding factor.

      (2) The authors present numerous analyses using both ENC and mean CAI as a comparison to CAIS, helping give a sense of how CAIS corrects for some of the issues with these other metrics. I also enjoyed that they examined the previously unobserved relationship between codon usage bias and body size, which has bugged me ever since I saw Kessler and Dean 2014. The result comparing protein disorder to CAIS was particularly interesting and unexpected.

      Unfortunately, our previous PIC correction code was buggy, and in fact the relationship with body size does not survive PIC correction (although it is strong prior to PIC correction). We have therefore removed it from the paper. However, the more novel result on protein disorder remains strong.

      (3) The CAIS metric presented here is generally applicable to any species that has an annotated genome with protein-coding sequences. 

      Weaknesses 

      (1) The main weakness of this work is that it lacks simulated data to confirm that it works as expected. This would be particularly useful for assessing the relationship between CAIS and the overall effect of protein structure disorder, which the authors acknowledge is an unexpected result. I think simulations could also allow the authors to assess how their metric performs in situations where mutation bias and natural selection act in the same direction vs. opposite directions. Additionally, although I appreciate their comparisons to ENC and mean CAI, the lack of comparison to other popular codon metrics for calculating the overall adaptiveness of a genome (e.g. dos Reis et al.'s $S$ statistic, which is a function of tRNA Adaptation Index (tAI) and ENC) may be more appropriate. Even if results are similar to $S$, CAIS has a noted advantage that it doesn't require identifying tRNA gene copy numbers or abundances, which I think are generally less readily available than genomic GC% and protein-coding sequences. 

      The main limitation of dos Reis’s test in our view is that, like the better versions of CAI, it requires comparable orthologs across species. See also the discussion below regarding the benefits of a proteome-wide approach. We now also note the advantage of not needing tRNA gene copy numbers and abundances. 

      Simulated datasets would be great, but we think they would be a nice addition rather than a must-have, in particular because we are skeptical about whether our understanding of all relevant processes is good enough that simulations would add much to our more heuristic argument along the lines of Figure 2. E.g. the complications of Gingold et al. 2014 cited above are pertinent, but incorporating them would make simulations quite involved. Instead, we now have a stronger theoretical justification for CAIS grounded in information theory. We have significantly expanded the discussion of Figure 2 to give a clearer idea of the conceptual underpinnings of CAIS and ENC.

      The authors mention the selection-mutation-drift equilibrium model, which underlies the basic ideas of this work (e.g. higher $N_e$ results in stronger selection on codon usage), but a more in-depth framing of CAIS in terms of this model is not given. I think this could be valuable, particularly in addressing the question "are we really estimating what we think we're estimating?" 

      Let's take a closer look at the formulation for RSCUS. From here on out, subscripts will only be used to denote the codon and it will be assumed that we are only considering the case of $r = genome$ for some species $s$. 

      \begin{align*} 

      RSCUS_i &= \frac{O_i}{E_i} 

      \end{align*} 

      I think what the authors are attempting to do is "divide out" the effects of mutation bias (as given by $E_i$), such that only the effects of natural selection remain, i.e. deviations from the expected frequency based on mutation bias alone represent adaptive codon usage. Consider Gilchrist et al. MBE 2015, which says that the expected frequency of codon $i$ at selection-mutation-drift equilibrium in gene $g$ for an amino acid with $N_a$ synonymous codons is 

      \begin{align} 

      E_{i,g} &= \frac{e^{-\Delta M_i-\Delta\eta_i\phi_g}}{\sum_{j=1}^{N_a}e^{-\Delta M_j-\Delta\eta_j\phi_g}} 

      \end{align} 

      where $\Delta M$ is the mutation bias, $\Delta\eta$ is the strength of selection scaled by the strength of drift, and $\phi_g$ is the gene expression level of gene $g$. In this case, $\Delta M$ and $\Delta\eta$ reflect the strength and direction of mutation bias and natural selection relative to a reference codon, for which $\Delta M_{ref}, \Delta\eta_{ref} = 0$. Assuming the selection-mutation-drift equilibrium model is generally adequate to model the true codon usage patterns in a genome (as I do and I think the authors do, too), the $E_{i,g}$ could be considered the expected observed frequency of codon $i$ in gene $g$, $E[O_{i,g}]$. 

      Let's re-write the $E_i = \frac{p_i}{\sum_{j=1}^{N_a}p_j}$ in the form of Gilchrist et al., such that it is a function of mutation bias $\Delta M$. For simplicity, we will consider just the two-codon case and assume the amino acid sequence is fixed. Assuming GC% is at equilibrium, the terms $g_r$ and $1 - g_r$ can be written as 

      \begin{align*} 

      g_r &= \frac{\mu_{AT\rightarrow GC}}{\mu_{AT\rightarrow GC} + \mu_{GC\rightarrow AT}} \\
      1 - g_r &= \frac{\mu_{GC\rightarrow AT}}{\mu_{AT\rightarrow GC} + \mu_{GC\rightarrow AT}}

      \end{align*} 

      where $\mu_{x\rightarrow y}$ is the mutation rate from nucleotides $x$ to $y$. As described in Gilchrist et al. MBE 2015 and Shah and Gilchrist PNAS 2011, the mutation bias $\Delta M_{NNA,NNG} = \log\left(\frac{\mu_{AT\rightarrow GC}}{\mu_{GC\rightarrow AT}}\right)$. This can be expressed in terms of the equilibrium GC content by recognizing that 

      \begin{align*} 

      \frac{g_r}{1-g_r} &= \frac{\mu_{AT\rightarrow GC}}{\mu_{GC\rightarrow AT}} \\
      \implies \frac{g_r}{1-g_r} &= e^{\Delta M} 

      \end{align*} 

      As we are assuming the amino acid sequence is fixed, the probability of observing a synonymous codon $i$ at an amino acid becomes just a Bernoulli process. 

      \begin{align*} 

      p_i &= g_r^x(1-g_r)^{(1-x)} 

      \end{align*} 

      If we do this, then 

      \begin{align} 

      E_{NNA} &= \frac{p_{NNA}}{p_{NNA} + p_{NNG}} \\
      &= \frac{1-g_r}{g_r + (1-g_r)} \\
      &= \frac{1}{\frac{g_r}{1-g_r} + 1} \\
      &= \frac{1}{e^{\Delta M} + 1} \\
      &= \frac{e^{-\Delta M}}{1 + e^{-\Delta M}} 

      \end{align} 

      Recall that in the Gilchrist et al. framework, the reference codon has $\Delta M_{NNG,NNG} = 0 \implies e^{-\Delta M_{NNG,NNG}} = 1$. Thus, we have recovered the Gilchrist et al. model from the formulation of $E_i$ under the assumption that natural selection has no impact on codon usage and codon NNG is the pre-defined reference codon. To see this, plug in 0 for $\Delta\eta$ in equation (1). 
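
      As a quick numerical check of this step, the following minimal sketch compares the GC-based expectation $E_{NNA} = 1 - g_r$ with the logistic form in $\Delta M$ for the two-codon case. The function names and the example GC values are illustrative assumptions, not taken from the manuscript or its code.

```python
import math

def delta_m_from_gc(g_r):
    """Mutation bias of NNA relative to the reference codon NNG: log(g_r / (1 - g_r))."""
    return math.log(g_r / (1.0 - g_r))

def expected_freq_nna_from_gc(g_r):
    """E_NNA = p_NNA / (p_NNA + p_NNG), with p_NNA = 1 - g_r and p_NNG = g_r."""
    p_nna, p_nng = 1.0 - g_r, g_r
    return p_nna / (p_nna + p_nng)

def expected_freq_nna_from_delta_m(delta_m):
    """Logistic form of the same quantity: exp(-dM) / (1 + exp(-dM))."""
    return math.exp(-delta_m) / (1.0 + math.exp(-delta_m))

# Hypothetical genome-wide GC fractions, chosen only for illustration.
for g_r in (0.30, 0.41, 0.60):
    dm = delta_m_from_gc(g_r)
    from_gc = expected_freq_nna_from_gc(g_r)
    from_dm = expected_freq_nna_from_delta_m(dm)
    # The two routes to E_NNA should agree up to floating-point error.
    assert math.isclose(from_gc, from_dm)
    print(f"g_r={g_r:.2f}  dM={dm:+.3f}  E_NNA={from_gc:.3f}")
```

      Both routes give the same value for any $0 < g_r < 1$, which is what the algebra above asserts.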

      We can then calculate the expected RSCUS using equation (1) (using notation $E[O_i]$) and equation (6) for the two-codon case. For simplicity, assume we are only considering a gene of average expression (defined as $\phi_g = 1$). 

      Assume in this case that NNG is the reference codon ($\Delta M_{NNG},\Delta\eta_{NNG} = 0$). 

      \begin{align} 

      E[RSCUS_{NNA}] &= \frac{E[O_{NNA}]}{E_{NNA}} \\
      &= \frac{e^{-\Delta\eta_{NNA}}(e^{-\Delta M_{NNA}}+e^{-\Delta M_{NNG}})}{e^{-\Delta M_{NNA}-\Delta\eta_{NNA}} + e^{-\Delta M_{NNG}-\Delta\eta_{NNG}}} \\
      &= \frac{e^{-\Delta M_{NNA} - \Delta\eta_{NNA}} + e^{-\Delta M_{NNG} - \Delta\eta_{NNA}}}{e^{-\Delta M_{NNA} - \Delta\eta_{NNA}} + e^{-\Delta M_{NNG} - \Delta\eta_{NNG}}} \\
      &= \frac{e^{-\Delta M_{NNA} - \Delta\eta_{NNA}} + e^{- \Delta\eta_{NNA}}}{e^{-\Delta M_{NNA} - \Delta\eta_{NNA}} + 1} 

      \end{align} 

      This shows that the expected value of RSCUS for a two-codon amino acid is expected to increase as the strength of selection $\Delta\eta$ increases, which is desired. Note that $\Delta\eta$ in Gilchrist et al. is formulated in terms of selection *against* a codon relative to the reference, such that a negative value represents that a codon is favored relative to the reference. If $\Delta\eta = 0$ (i.e. selection does not favor either codon), then $E[RSCUS] = 1$. Also note that the expected RSCUS does not remain independent of the mutation bias. This means that even if $sN_e$ (i.e. the strength of natural selection) does not change between species, changes to the strength and direction of mutation bias across species could impact RSCUS. Assuming my math is right, I think one needs to be cautious when interpreting CAIS as representative of the differences in the efficiency of selection across species except under very particular circumstances. One such case could be when it is known that mutation bias varies little across the species of interest. Looking at the species used in this manuscript, most of them have a GC content ranging around 0.41, so I suspect their results are okay. 

      Although I have not done so, I am sure this could be extended to the 4 and 6 codon amino acids. 
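
      The two-codon argument can also be explored numerically. The sketch below evaluates the expected RSCUS directly as the ratio of equation (1) to its mutation-only counterpart, assuming NNG is the reference codon and $\phi_g = 1$ by default. The function names and parameter values are hypothetical and chosen only for illustration; this is not the authors' CAIS code.

```python
import math

def expected_observed_freq_nna(delta_m, delta_eta, phi=1.0):
    """Equation (1) for two codons, with NNG as reference (dM = d_eta = 0)."""
    w = math.exp(-delta_m - delta_eta * phi)  # weight of NNA; the weight of NNG is 1
    return w / (w + 1.0)

def mutation_only_freq_nna(delta_m):
    """Expected frequency of NNA when selection is absent (d_eta = 0)."""
    return expected_observed_freq_nna(delta_m, 0.0)

def expected_rscus_nna(delta_m, delta_eta, phi=1.0):
    """Expected RSCUS of NNA: observed expectation over mutation-only expectation."""
    return expected_observed_freq_nna(delta_m, delta_eta, phi) / mutation_only_freq_nna(delta_m)

# Hypothetical values: d_eta < 0 means NNA is favored relative to NNG.
for delta_m in (-0.5, 0.0, 0.5):
    for delta_eta in (0.0, -0.5, -1.0):
        value = expected_rscus_nna(delta_m, delta_eta)
        print(f"dM={delta_m:+.1f}  d_eta={delta_eta:+.1f}  E[RSCUS_NNA]={value:.3f}")
```

      Under these assumptions, the expected RSCUS equals 1 whenever $\Delta\eta = 0$, rises as selection increasingly favors NNA, and differs between values of $\Delta M$ at a fixed non-zero $\Delta\eta$, which is the dependence on mutation bias noted above.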

      We thank Reviewer 2 for explicitly laying out the math that was implicit in our Figures 1 and 2. While we keep our more heuristic presentation, our revised manuscript now more clearly acknowledges that the per-site codon adaptation bias depicted in Figure 1 has limited sensitivity to s*Ne. The reason we believe our approach worked despite this is that we think the phenomenon is driven by what is shown in Figure 2. That is, Ne makes a difference by determining the proteome-wide fraction of codons subject to significant codon adaptation, rather than by determining the strength of codon adaptation at any particular site or gene. We have made multiple changes to the text to make this point clearer.

      Another minor weakness of this work is that although the method is generally applicable to any species with an annotated genome and the code is publicly available, the code itself contains hard-coded values for GC% and amino acid frequencies across the 118 vertebrates. The lack of a more flexible tool may make it difficult for less computationally-experienced researchers to take advantage of this method. 

      Genome-wide %GC values are hard-coded because they were taken from the previous study of James et al. (2023) https://doi.org/10.1093/molbev/msad073. As summarized in the manuscript, genome-wide %GC was a byproduct of a scan of all six reading frames across genic and intergenic sequences available from NCBI with access dates between May and July 2019. The more complicated code used to calculate the intergenic %GC and the code used to calculate amino acid frequencies are located at https://github.com/MaselLab/CodonAdaptation-Index-of-Species. Luckily, someone else just wrote a simpler end-to-end pipeline for us, on the basis of our preprint. We now note this in the Acknowledgements, and link to it: https://github.com/gavinmdouglas/handy_pop_gen/blob/main/CAIS.py.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      Summary: The global decline of amphibians is primarily attributed to deadly disease outbreaks caused by the chytrid fungus, Batrachochytrium dendrobatidis (Bd). It is unclear whether and how skin-resident immune cells defend against Bd. Although it is well known that mammalian mast cells are crucial immune sentinels in the skin and play a pivotal role in immune recognition of pathogens and orchestrating subsequent immune responses, the roles of amphibian mast cells during Bd infections are largely unknown. The current study developed a novel way to enrich X. laevis skin mast cells by injecting the skin with recombinant stem cell factor (SCF), a KIT ligand required for mast cell differentiation and survival. The investigators found an enrichment of skin mast cells provides X. laevis substantial protection against Bd and mitigates the inflammation-related skin damage resulting from Bd infection. Additionally, the augmentation of mast cells leads to increased mucin content within cutaneous mucus glands and shields frogs from the alterations to their skin microbiomes caused by Bd. 

      Strengths: This study underscores the significance of amphibian skin-resident immune cells in defenses against Bd and introduces a novel approach to examining interactions between amphibian hosts and fungal pathogens. 

      We thank the reviewer for recognizing the significance and the novelty of our work.

      Weaknesses: The main weakness of the study is the lack of functional analysis of X. laevis mast cells. Upon activation, mast cells have the characteristic feature of degranulation to release histamine, serotonin, proteases, cytokines, and chemokines, etc. The study should determine whether X. laevis mast cells can be degranulated by two commonly used mast cell activators IgE and compound 48/80 for IgE-dependent and -independent pathways. This can be easily done in vitro. It is also important to assess whether in vivo these mast cells are degranulated upon Bd infection using avidin staining to visualize vesicle releases from mast cells. Figure 3 only showed rSCF injection caused an increase in mast cells in naïve skin. They need to present whether Bd infection can induce mast cell increase and rSCF injection under Bd infection causes a mast cell increase in the skin. In addition, it is unclear how the enrichment of mast cells provides the protection against Bd infection and alterations to skin microbiomes after infection. It is important to determine whether skin mast cells release any contents mentioned above. 

      We would like to thank the reviewer for taking the time to review our work and providing us with valuable feedback.

      Please note that, as indicated in our previous rebuttal to reviewers, amphibians do not possess the IgE antibody isotype (1).

      To our knowledge, there are no published works that apply the approaches used to study mammalian mast cell degranulation to amphibian mast cells. While there are commercially available kits and reagents for examining mammalian mast cell granule content, most of these do not cross-react with amphibian counterparts. This is especially true of cytokines and chemokines, which diverged quickly with evolution and thus do not share substantial protein sequence identity across species as diverged as frogs and mammals. We would also like to highlight the fact that several studies suggest that amphibian mast cells lack histamine (2, 3, 4, 5) and serotonin (2, 6). While following up on these findings would be possible, we would like to respectfully emphasize that adopting approaches used in mammalian research to comparative immunology work is not always straightforward.

      As we highlight in our manuscript, frog mast cells upregulate their expression of interleukin-4 (IL4), a hallmark cytokine associated with mammalian mast cells (7). The additional findings presented in our revised manuscript indicate that mast cells respond to Bd by upregulating IL4 expression in vitro and in vivo. Together, this suggests that IL4 may be a central means by which frog mast cells confer protection against Bd, by counteracting Bd-elicited inflammation, including minimizing neutrophil infiltration, maintaining skin integrity, and promoting cutaneous mucus production. These additional results are presented in Figure 8 and are described in the results and discussion sections of our revised manuscript.

      Our attempts to elicit degranulation of frog mast cells using compound 48/80 have so far not been successful. This may reflect technical issues with assays optimized for mammalian mast cells or biological differences between frog and mammalian mast cells, such as species differences in mas-related G-protein coupled receptors, through which compound 48/80 acts (8). We will continue to explore means to study frog mast cell degranulation both in vitro and in vivo but also respectfully point out that while degranulation is a feature commonly associated with mammalian mast cells, this is not the only means by which mammalian mast cells confer their immunological effects. Indeed, our studies suggest that frog mast cell IL4 production may be a key means by which these cells offer anti-Bd protection.

      Please note that we successfully adopted an avidin staining approach to visualize mast cell heparin content in vitro and to evaluate cutaneous mast cell numbers in vivo in control and mast cell-enriched, mock- and Bd-infected animals. This additional work is depicted in Figure 4 and addressed in the results and discussion sections of our revised manuscript.

      Reviewer #2 (Public Review):

      Summary: In this study, Hauser et al investigate the role of amphibian (Xenopus laevis) mast cells in cutaneous immune responses to the ecologically important pathogen Batrachochytrium dendrobatidis (Bd) using novel methods of in vitro differentiation of bone marrow-derived mast cells and in vivo expansion of skin mast cell populations. They find that bone marrow-derived myeloid precursors cultured in the presence of recombinant X. laevis Stem Cell Factor (rSCF) differentiate into cells that display hallmark characteristics of mast cells. They inject their novel (r)SCF reagent in the skin of X. laevis and find that this stimulates expansion of cutaneous mast cell populations in vivo. They then apply this model of cutaneous mast cell expansion in the setting of Bd infection and find that mast cell expansion attenuates skin burden of Bd zoospores and pathologic features including epithelial thickness and improves protective mucus production and transcriptional markers of barrier function. Utilizing their prior expertise with expanding neutrophil populations in X. laevis, the authors compare mast cell expansion using (r)SCF to neutrophil expansion using recombinant colony stimulating factor 3 (rCSF3) and find that neutrophil expansion in Bd infection leads to greater burden of zoospores and worse skin pathology. Combining these two observations, they demonstrate that mast cell expansion using rSCF attenuates cutaneous neutrophilic infiltration. They further show that mast cell expansion correlates to cutaneous IL-4 expression, and that treatment with exogenous rIL-4 reduces neutrophilic infiltration and restores markers of epithelial health, offering a mechanism by which mast cell expansion protects from Bd infection. 

      Strengths: The authors report a novel method of expanding amphibian mast cells utilizing their custom-made rSCF reagent. They rigorously characterize expanded mast cells in vitro and in vivo using histologic, morphologic, transcriptional, and functional assays. This establishes solid footing with which to then study the role of rSCF-stimulated mast cell expansion in the Bd infection model. This appears to be the first demonstration of exogenous use of rSCF in amphibians to expand mast cell populations and may set a foundation for future mechanistic studies of mast cells in the X. laevis model organism. Building on prior work, they are able to contrast mast cell expansion with their neutrophil expansion model, allowing them to infer a mechanistic link between mast cell expansion and IL-4 production and subsequent suppression of neutrophil infiltration and cutaneous dysbiosis. 

      We thank the reviewer for recognizing the rigorousness and utility of the studies presented in our manuscript.

      Weaknesses: The main weaknesses derive from technical limitations inherent to the Xenopus model at this time. For example, in mice a mechanistic study would be expected to use IL-4 knockouts, preferably mast cell-specific, to prove the link between mast cell expansion and IL-4 production being necessary and sufficient to suppress neutrophils. However, the novel reagents in this manuscript present a compelling technical advance and a step forward in the tools available to study amphibian biology. 

      We agree with the reviewer that an IL4 knock-out animal model would be a great way to support our findings. Unfortunately, working with a non-mammalian model such as X. laevis poses limitations that include lack of knock-out lines for immunology research. Moreover, as mentioned in our manuscript, we do not believe that IL4 is the sole mast cell-produced component responsible for the conferred antifungal protection. We thank the reviewer for acknowledging the limitations of our model system and recognizing the novelty, technical advances, and merits of the work presented in our manuscript.

      In addition to their discussion, one open question from the revised manuscript is how a single treatment with rSCF leads to a peak in mast cell numbers and then decline to baseline in mock-infected frogs, while Bd infection either sustains rSCF-boosted mast cells or leads to steady mast cell increase over time in control-treated frogs. Whether this is mediated by endogenous SCF or some other factor remains unexplored.

      This is an interesting question that we hope to explore in future studies. We did not see significant differences in skin SCF gene expression at 21 days post Bd infection. This does not rule out the possibility that the observed Bd-mediated effects on frog skin mast cell composition are due to changes in skin SCF gene expression at earlier infection times, alone or in combination with other host- or pathogen-derived factors. We know that other factors are responsible for homing/retention of antimicrobial and immunosuppressive granulocyte subsets within frog skin (9) and we postulate that some of these may be distinct mast cell types. Additionally, Bd is known to produce a myriad of immunomodulatory factors (10), which may well also directly affect frog skin mast cell composition. Mammalian mast cells are heterogeneous and are homed or recruited into tissues by an extensive array of host- as well as microbiome-derived components (11, 12). Undoubtedly, the frog skin mast cell composition is likewise complex, dynamic, and contingent on a plethora of host-, cutaneous microbial flora- and, in this case, also Bd-produced factors.

      References

      (1) Flajnik, M.F. A cold-blooded view of adaptive immunity. Nat Rev Immunol 18, 438-453 (2018).

      (2) Mulero, I., Sepulcre, M.P., Meseguer, J., Garcia-Ayala, A. & Mulero, V. Histamine is stored in mast cells of most evolutionarily advanced fish and regulates the fish inflammatory response. Proc Natl Acad Sci U S A 104, 19434-19439 (2007).

      (3) Reite, O.B. A phylogenetical approach to the functional significance of tissue mast cell histamine. Nature 206, 1334-1336 (1965).

      (4) Reite, O.B. Comparative physiology of histamine. Physiol Rev 52, 778-819 (1972).

      (5) Takaya, K., Fujita, T. & Endo, K. Mast cells free of histamine in Rana catasbiana. Nature 215, 776-777 (1967).

      (6) Galli, S.J. New insights into "the riddle of the mast cells": microenvironmental regulation of mast cell development and phenotypic heterogeneity. Lab Invest 62, 5-33 (1990).

      (7) Babina, M., Guhl, S., Artuc, M. & Zuberbier, T. IL-4 and human skin mast cells revisited: reinforcement of a pro-allergic phenotype upon prolonged exposure. Archives of dermatological research 308, 665-670 (2016).

      (8) Hermans, M.A.W. et al. Human Mast Cell Line HMC1 Expresses Functional Mas-Related G-Protein Coupled Receptor 2. Front Immunol 12, 625284 (2021).

      (9) Hauser, K. et al. Discovery of granulocyte-lineage cells in the skin of the amphibian Xenopus laevis. FACETS 5, 571 (2020).

      (10) Rollins-Smith, L.A. & Le Sage, E.H. Batrachochytrium fungi: stealth invaders in amphibian skin. Curr Opin Microbiol 61, 124-132 (2021).

      (11) Halova, I., Draberova, L. & Draber, P. Mast cell chemotaxis - chemoattractants and signaling pathways. Front Immunol 3, 119 (2012).

      (12) West, P.W. & Bulfone-Paus, S. Mast cell tissue heterogeneity and specificity of immune cell recruitment. Front Immunol 13, 932090 (2022).


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The global decline of amphibians is primarily attributed to deadly disease outbreaks caused by the chytrid fungus, Batrachochytrium dendrobatidis (Bd). It is unclear whether and how skin-resident immune cells defend against Bd. Although it is well known that mammalian mast cells are crucial immune sentinels in the skin and play a pivotal role in the immune recognition of pathogens and orchestrating subsequent immune responses, the roles of amphibian mast cells during Bd infections are largely unknown. The current study developed a novel way to enrich X. laevis skin mast cells by injecting the skin with recombinant stem cell factor (SCF), a KIT ligand required for mast cell differentiation and survival. The investigators found an enrichment of skin mast cells provides X. laevis substantial protection against Bd and mitigates the inflammation-related skin damage resulting from Bd infection. Additionally, the augmentation of mast cells leads to increased mucin content within cutaneous mucus glands and shields frogs from the alterations to their skin microbiomes caused by Bd.

      Strengths:

      This study underscores the significance of amphibian skin-resident immune cells in defenses against Bd and introduces a novel approach to examining interactions between amphibian hosts and fungal pathogens. 

      We thank the reviewer for acknowledging the novelty and importance of the work presented in our manuscript.

      Weaknesses:

      The main weakness of the study is the lack of functional analysis of X. laevis mast cells. Upon activation, mast cells have the characteristic feature of degranulation to release histamine, serotonin, proteases, cytokines, and chemokines, etc. The study should determine whether X. laevis mast cells can be degranulated by two commonly used mast cell activators IgE and compound 48/80 for IgE-dependent and independent pathways. This can be easily done in vitro. It is also important to assess whether in vivo these mast cells are degranulated upon Bd infection using avidin staining to visualize vesicle releases from mast cells. Figure 3 only showed rSCF injection caused an increase in mast cells in naïve skin. They need to present whether Bd infection can induce mast cell increase and rSCF injection under Bd infection causes a mast cell increase in the skin. In addition, it is unclear how the enrichment of mast cells provides protection against Bd infection and alterations to skin microbiomes after infection. It is important to determine whether skin mast cells release any contents mentioned above. 

      We would like to thank the reviewer for taking the time to review our work and providing us with valuable feedback. We feel that we have successfully incorporated the reviewer’s suggestions into our revised manuscript, thereby improving this work.

      Please note that amphibians do not possess the IgE antibody isotype (1).

      To our knowledge, there has been no published work adapting the approaches used to study mammalian mast cell degranulation to amphibian mast cells. While there are commercially available kits and reagents for examining mammalian mast cell granule content, most of these reagents do not cross-react with amphibian counterparts. This is especially true of cytokines and chemokines, which diverged quickly with evolution and thus do not share substantial protein sequence identity across species as diverged as frogs and mammals. Additionally, several studies suggest that amphibian mast cells lack histamine (2, 3, 4, 5) and serotonin (2, 6). Respectfully, while following up on these findings is possible, we would not consider adopting approaches used in mammalian research to comparative immunology work as easy.

      As noted in our manuscript, frog mast cells upregulate their expression of interleukin-4 (IL4), which is a hallmark cytokine associated with mammalian mast cells (7). The additional findings presented in our revised manuscript indicate that mast cells respond to Bd by upregulating IL4 expression in vitro and in vivo. In turn, our work indicates that IL4 may be a central means by which frog mast cells confer protection against Bd, by counteracting Bd-elicited inflammation, including minimizing neutrophil infiltration, maintaining skin integrity, and promoting mucus production by skin mucus glands. These additional findings are presented in Figure 8 of our revised manuscript and are described in the results and discussion sections of the paper.

      Our attempts to elicit degranulation of frog mast cells using compound 48/80 have so far not been successful. This may reflect technical issues with assays optimized for mammalian mast cells or biological differences between frog and mammalian mast cells, such as species differences in mas-related G-protein coupled receptors, through which compound 48/80 acts (8). We will continue to explore means to study frog mast cell degranulation both in vitro and in vivo but would also like to respectfully point out that while mast cell degranulation is a feature most associated with mammalian mast cells, this is not the only means by which mammalian mast cells confer their immunological effects. Indeed, our additional studies suggest that mast cell IL4 production may be a key means by which these cells offer anti-Bd protection.

      Please find that we have adopted an avidin-staining approach to visualize mast cell heparin content in vitro and to evaluate mast cell numbers in vivo in the skins of control and mast cell-enriched, mock- and Bd-infected animals. This additional work is depicted in Figure 4 of our revised manuscript and addressed in the results and discussion sections of our revised paper.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Hauser et al investigate the role of amphibian (Xenopus laevis) mast cells in cutaneous immune responses to the ecologically important pathogen Batrachochytrium dendrobatidis (Bd) using novel methods of in vitro differentiation of bone marrow-derived mast cells and in vivo expansion of skin mast cell populations. They find that bone marrow-derived myeloid precursors cultured in the presence of recombinant X. laevis Stem Cell Factor (rSCF) differentiate into cells that display hallmark characteristics of mast cells. They inject their novel (r)SCF reagent into the skin of X. laevis and find that this stimulates the expansion of cutaneous mast cell populations in vivo. They then apply this model of cutaneous mast cell expansion in the setting of Bd infection and find that mast cell expansion attenuates the skin burden of Bd zoospores and pathologic features including epithelial thickness and improves protective mucus production and transcriptional markers of barrier function. Utilizing their prior expertise with expanding neutrophil populations in X. laevis, the authors compare mast cell expansion using (r)SCF to neutrophil expansion using recombinant colony-stimulating factor 3 (rCSF3) and find that neutrophil expansion in Bd infection leads to greater burden of zoospores and worse skin pathology. 

      Strengths:

      The authors report a novel method of expanding amphibian mast cells utilizing their custom-made rSCF reagent. They rigorously characterize expanded mast cells in vitro and in vivo using histologic, morphologic, transcriptional, and functional assays. This establishes solid footing with which to then study the role of rSCF-stimulated mast cell expansion in the Bd infection model. This appears to be the first demonstration of the exogenous use of rSCF in amphibians to expand mast cell populations and may set a foundation for future mechanistic studies of mast cells in the X. laevis model organism. 

      We thank the reviewer for recognizing the breadth and extent of the undertaking that culminated in this manuscript. Indeed, this manuscript would not have been possible without considerable reagent development and adaptation of techniques that had previously not been used for amphibian immunity research. In line with the reviewer’s sentiment, to our knowledge this is the first report of using molecular approaches to augment amphibian mast cells, which we hope will pave the way for new areas of research within the fields of comparative immunology and amphibian disease biology.

      Weaknesses:

      The conclusions regarding the role of mast cell expansion in controlling Bd infection would be stronger with a more rigorous evaluation of the model, as there are some key gaps and remaining questions regarding the data. For example: 

      (1) Granulocyte expansion is carefully quantified in the initial time courses of rSCF and rCSF3 injections, but similar quantification is not provided in the disease models (Figures 3E, 4G, 5D-G). A key implication of the opposing effects of mast cell vs neutrophil expansion is that mast cells may suppress neutrophil recruitment or function. Alternatively, mast cells also express notable levels of csfr3 (Figure 2) and previous work from this group (Hauser et al, Facets 2020) showed rG-CSF-stimulated peritoneal granulocytes express mast cell markers including kit and tpsab1, raising the question of what effect rCSF3 might have on mast cell populations in the skin. Considering these points, it would be helpful if both mast cells and neutrophils were quantified histologically (based on Figure 1, they can be readily distinguished by SE or Giemsa stain) in the Bd infection models. 

      We thank the reviewer for this insightful suggestion. We successfully adopted an in situ hybridization approach to evaluate neutrophil numbers in the skins of control and mast cell-enriched, mock- and Bd-infected animals based on expression of the neutrophil marker, myeloperoxidase (MPO) (9). These results are presented in Figures 6 and 8 of our revised manuscript and addressed in the appropriate sections of our revised paper.

      Our findings suggest that rSCF administration results in the accumulation of mast cells that are polarized such that they ablate the inflammatory response elicited by Bd infection, for example through IL4 production. Mammalian mast cells, including peritonea-resident mast cells, express csf3r (10, 11). For this reason, we used MPO expression to visualize neutrophil skin infiltration in Figures 6 and 8 of our revised work. While the X. laevis animal model does not permit nearly the degree of immune cell resolution afforded by mammalian animal models, we do know that the adult X. laevis peritonea contain a myriad of immune cell subsets. We anticipate that the high kit expression reported by Hauser et al., 2020 in the rCSF3-recruited peritoneal leukocytes reflects the presence of mast cells therein.

      We have used avidin staining and MPO in situ hybridization to respectively visualize and enumerate mast cells and neutrophils in the skin of control and mast cell-enriched, mock- and Bd-infected animals. Indeed, our results show interesting, experimental condition-dependent changes in both the skin neutrophil and mast cell numbers. The results of these additional studies are presented in Figures 4, 6 and 8 of the revised manuscript and addressed in the results and discussion sections of our revised paper.

      (2) Epithelial thickness and inflammation in Bd infection are reported to be reduced by rSCF treatment (Figure 3E, 5A-B) or increased by rCSF3 treatment (Figure 4G) but quantification of these critical readouts is not shown.

      We thank the reviewer for this suggestion. We scored epithelial thickness under the distinct conditions described in our manuscript and presented the quantified data in Figures 5 and 8 of the revised paper.

      (3) Critical time points in the Bd model are incompletely characterized. Mast cell expansion decreases zoospore burden at 21 dpi, while there is no difference at 7 dpi (Figure 3E). Conversely, neutrophil expansion increases zoospore burden at 7 dpi, but no corresponding 21 dpi data is shown for comparison (Figure 4G). Microbiota analysis is performed at a third time point,10 dpi (Figure 5D-G), making it difficult to compare with the data from the 7 dpi and 21 dpi time points. Reporting consistent readouts at these three time points is important to draw solid conclusions about the relationship of mast cell expansion to Bd infection and shifts in microbiota.

      We thank the reviewer for noting this discrepancy. Please find that we have repeated our mast cell-enrichment, Bd-challenge studies, examining days 10 and 21 post infection. Our new findings indicate that compared to control animals, mast cell-enrichment does result in significant reduction in Bd loads at both 10 and 21 dpi. The difference in Bd loads between r-ctrl and rSCF-treated animals at 10 dpi corroborates the other parameters that are altered between the two treatment groups at this experimental time point.

      Our question regarding the roles of inflammatory granulocytes/neutrophils during Bd infections was that of ‘how’ rather than ‘when’ these cells affect Bd infections. Thus, and because the central focus of this work was mast cells and not other granulocyte subsets, when we saw that rCSF3-recruited granulocytes adversely affect Bd infections at 7 days, we did not pursue the kinetics of these observations further. We plan to explore the roles of inflammatory mediators and immune cell subsets during the course of Bd infections but feel that these future studies are more peripheral to the central thesis of the present manuscript regarding the roles of frog mast cells during Bd infections.

      (4) Although the effect of rSCF treatment on Bd zoospores is significant at 21 dpi (Figure 3E), bacterial microbiota changes at 21 dpi are not (Figure S3B-C). This discrepancy, how it relates to the bacterial microbiota changes at 10 dpi, and why 7, 10, and 21 dpi time points were chosen for these different readouts (Figure 5F-G), is not discussed.

      Please find that our additional studies indicate that compared to control animals, frog skin mast cell-enrichment results in significant reduction in Bd loads at 10 dpi. This corroborates our other findings, including the observation that at 10 dpi, control animals exhibit reduced microbial richness whereas mast cell-enriched frogs were protected from this disruption of their microbiome. The amphibian microbiome serves as a major barrier to these fungal infections (12) and we anticipate that Bd-mediated disruption of microbial richness facilitates host skin colonization by this pathogen. In turn, we anticipate that frog mast cells are conferring the observed anti-Bd protection in part by preventing microbial disassembly and thus interfering with optimal Bd colonization and growth on frog skins. Please find that we acknowledge and discuss these notions in our revised manuscript.

      (5) The time course of rSCF or rCSF3 treatments relative to Bd infection in the experiments is not clear. Were the treatments given 12 hours prior to the final analysis point to maximize the effect? For example, in Figure 3E, were rSCF injections given at 6.5 dpi and 20.5 dpi? Or were treatments administered on day 0 of the infection model? If the latter, how do the authors explain the effects at 7 dpi or 21 dpi given mast cell and neutrophil numbers return to baseline within 24 hours after rSCF or rCSF3 treatment, respectively?

      Please find that in our revised manuscript, we underlined the kinetics of our animal treatments and Bd-infections. In brief, for mast cell-enrichment, animals were injected with r-ctrl or rSCF, challenged 12 hours later with Bd and examined after 10 (per reviewers’ suggestions) and 21 days of infection. For neutrophil enrichment, animals were injected with r-ctrl or rCSF3, challenged 12 hours later with Bd and examined after 7 days of infection.

      The title of the manuscript may be mildly overstated. Although Bd infection can indeed be deadly, mortality was not a readout in this study, and it is not clear from the data reported that expanding skin mast cells would ultimately prevent progression to death in Bd infections.

      We acknowledge this point. The revised manuscript will be titled: “Amphibian mast cells: barriers to chytrid fungus infections”.

      Reviewer #3 (Public Review):

      Summary:

      Hauser et al. provide an exceptional study describing the role of resident mast cells in amphibian epidermis that produce anti-inflammatory cytokines that prevent Batrachochytrium dendrobatidis (Bd) infection from causing harmful inflammation, and also protect frogs from changes in skin microbiomes and loss of mucin in glands and loss of mucus integrity that otherwise cause changes to their skin microbiomes. Neutrophils, in contrast, were not protective against Bd infection. Beyond the beautiful cytology and transcriptional profiling, the authors utilized elegant cell enrichment experiments to enrich mast cells by recombinant stem cell factor, or to enrich neutrophils by recombinant colony-stimulating factor-3, and examined respective infection outcomes in Xenopus.

      Strengths:

      Through the use of recombinant IL4, the authors were able to test and eliminate the hypothesis that mast cell production of IL4 was the mechanism of host protection from Bd infection. Instead, impacts on the mucus glands and interaction with the skin microbiome are implicated as the protective mechanism. These results will press disease ecologists to examine the relative importance of this immune defense among species, the influence of mast cells on the skin microbiome and mucosal function, and open the potential for modulating mucosal defense.

      We thank the reviewer for recognizing the utility of the work presented in our manuscript.

      Weaknesses:

      A reduction of bacterial diversity upon infection, as described at the end of the results section, may not always be an "adverse effect," particularly given that anti-Bd function of the microbiome increased. Some authors (see Letourneau et al. 2022 ISME, or Woodhams et al. 2023 DCI) consider these short-term alterations as encoding ecological memory, such that continued exposure to a pathogen would encounter an enriched microbial defense. Regardless, mast cell-initiated protection of the mucus layer may negate the need for this microbial memory defense.

      We thank the reviewer for their insightful comment. We have revised our discussion to include this notion.

      While the description of the mast cell location in the epidermal skin layer in amphibians is novel, it is not known how representative these results are across species ranging in chytridiomycosis susceptibility. No management applications are provided such as methods to increase this defense without the use of recombinant stem cell factor, and more discussion is needed on how the mast cell component (abundance, distribution in the skin) of the epidermis develops or is regulated.

      We thank the reviewer for this suggestion. Please find that we have added a paragraph to our revised manuscript to address possible source(s) of skin mast cells and a statement acknowledging that greater understanding of mast cell biology across distinct amphibian species may be used to develop future strategies for management of amphibian diseases.

      We are very thankful to the reviewer for this excellent suggestion but would like to point out that the work presented in our manuscript was driven by comparative immunology questions more than by conservation biology. As such, and considering just how little is known about mast cells outside of mammals, we chose not to speculate too much into possible utilities of altering amphibian skin mast cell composition and instead to focus our discussion on the immediate takeaways of the work presented by our paper.

      References

      (1) Flajnik, M.F. A cold-blooded view of adaptive immunity. Nat Rev Immunol 18, 438-453 (2018).

      (2) Mulero, I., Sepulcre, M.P., Meseguer, J., Garcia-Ayala, A. & Mulero, V. Histamine is stored in mast cells of most evolutionarily advanced fish and regulates the fish inflammatory response. Proc Natl Acad Sci U S A 104, 19434-19439 (2007).

      (3) Reite, O.B. A phylogenetical approach to the functional significance of tissue mast cell histamine. Nature 206, 1334-1336 (1965).

      (4) Reite, O.B. Comparative physiology of histamine. Physiol Rev 52, 778-819 (1972).

      (5) Takaya, K., Fujita, T. & Endo, K. Mast cells free of histamine in Rana catasbiana. Nature 215, 776-777 (1967).

      (6) Galli, S.J. New insights into "the riddle of the mast cells": microenvironmental regulation of mast cell development and phenotypic heterogeneity. Lab Invest 62, 5-33 (1990).

      (7) Babina, M., Guhl, S., Artuc, M. & Zuberbier, T. IL-4 and human skin mast cells revisited: reinforcement of a pro-allergic phenotype upon prolonged exposure. Archives of dermatological research 308, 665-670 (2016).

      (8) Hermans, M.A.W. et al. Human Mast Cell Line HMC1 Expresses Functional Mas-Related G-Protein Coupled Receptor 2. Front Immunol 12, 625284 (2021).

      (9) Buchan, K.D. et al. A transgenic zebrafish line for in vivo visualisation of neutrophil myeloperoxidase. PLoS One 14, e0215592 (2019).

      (10) Aponte-Lopez, A., Enciso, J., Munoz-Cruz, S. & Fuentes-Panana, E.M. An In Vitro Model of Mast Cell Recruitment and Activation by Breast Cancer Cells Supports Anti-Tumoral Responses. Int J Mol Sci 21 (2020).

      (11) Jamur, M.C. et al. Mast cell repopulation of the peritoneal cavity: contribution of mast cell progenitors versus bone marrow derived committed mast cell precursors. BMC Immunol 11, 32 (2010).

      (12) Walke, J.B. & Belden, L.K. Harnessing the Microbiome to Prevent Fungal Infections: Lessons from Amphibians. PLoS Pathog 12, e1005796 (2016).

      Reviewer #2: (Recommendations For The Authors): 

      We thank the reviewer for their excellent suggestions, their time reviewing this work and their help with this manuscript.

      While we were not able to incorporate some of these changes, please find that we have significantly altered our manuscript in accordance with the reviewer’s suggestions from their public review. We feel that we have substantially altered our paper, including providing considerable additional data, supporting the key findings therein.

      (1) The heatmap in Figure 1I appears to be scaled data, similar to Figure 4A, in which case the indicated scale numbers are not correct (e.g. they should be -2 to 2, or -3 to 3) 

      Thank you for the suggestion. Please find that we have changed this figure accordingly.
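
      As context for the reviewer's point about the scale bar, here is a minimal sketch of row-wise z-scoring. It assumes the heatmap shows per-gene z-scores, which is the reviewer's reading and is not confirmed in the response; the count values and the function name `zscore_rows` are hypothetical. With only a few samples per gene, scaled values necessarily fall within a narrow symmetric range around zero, which is why a legend such as -2 to 2 is expected.

```python
from statistics import mean, stdev


def zscore_rows(matrix):
    """Scale each row (gene) to mean 0 and sample standard deviation 1."""
    scaled = []
    for row in matrix:
        mu = mean(row)
        sd = stdev(row)
        if sd == 0:
            # A gene with identical values in all samples has no variation to scale.
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(x - mu) / sd for x in row])
    return scaled


# Hypothetical normalized counts: rows are genes, columns are samples.
counts = [
    [10.0, 12.0, 200.0, 220.0],
    [5000.0, 5200.0, 300.0, 280.0],
]
scaled = zscore_rows(counts)
```

      After scaling, genes with very different absolute expression become directly comparable, but the legend must then report z-score units rather than counts.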

      (2) For Figure 1, additional curated gene lists might better illustrate the difference in cell types, e.g. include the data for a panel of mast cell genes in a heatmap (mcpt1, tpsab1, etc.) and another panel of curated neutrophil genes (e.g. lyz) in a heatmap. If the authors still have leftover RNA, qPCR verification of some of the critical genes (e.g. kit) would add to the rigor of the analysis, as this study is the foundation of a new method for culturing amphibian mast cells. 

      We thank the reviewer for this suggestion. Unfortunately, we do not have leftover RNA/cDNA and we have not been able to locate mcpt1 or tpsab1 in our DEGs. We anticipate that this issue may stem from the suboptimal annotation of the Xenopus laevis genome. We agree that curating more mast cell/neutrophil genes would be ideal but feel that we have adequately highlighted those genes that are differentially expressed between the two populations in our analysis.

      (3) The presentation of counts in Figure 2 is a bit hard to interpret. Although it is mentioned that everything is statistically significant, explicitly showing statistics for each gene would be better. One possibility would be to use a volcano plot (p-value vs log2 fold change) and highlight the genes shown in Figure 2, potentially with an accompanying heat map to show replicate variability. 

      We thank the reviewer for this suggestion. We entertained presenting the data as volcano plots or heat maps, but in the end felt that the bar graphs better conveyed the information that we are hoping to get across. Please note that the error bars in the bar graph depict the replicate variability. Please also note that to highlight that all the depicted genes were differentially expressed, we italicized the statement in the corresponding figure legend: “All depicted genes were significantly differentially expressed between the two populations”.

      (4) Narratively, it might make more sense to put Figure 4A-C with Figure 3. 

      We thank the reviewer for this suggestion. Please find that we significantly revised most of our figures to better convey the content therein. We combined the content of Figure 4A-C with Figure 5A-C and added data on epidermal thickness under different conditions into this figure; Figure 5 of our revised manuscript.

      (5) If possible, complementing the skin RNA-seq from rSCF treatment in Bd infection with skin RNA-seq from rCSF3 treatment to compare effects on transcriptional programs of barrier function, etc would elevate this study and add additional insights into cutaneous inflammation in the setting of Bd infection. 

      We thank the reviewer for this suggestion. We anticipate that the skin inflammation caused by Bd infection is not due solely to neutrophil infiltration and artificially altering the frog skin neutrophil content would thus not recapitulate chytridiomycosis progression. We completely agree that it would be valuable to examine barrier functions in control and mast cell-enriched, Bd-infected frogs. This is something that we hope to pursue further in future studies but feel that together with our additional findings, we are presenting a significant amount of data to constitute a stand-alone story.

      (6) In Figure S1A, analyzing only 3 AMP genes by qPCR is perhaps too focused. As a control, it would be useful to also test some genes known to be functionally important in neutrophil anti-microbial responses, e.g. lyz. Expanding on this experiment by performing RNA-seq on Bd-treated, bone-marrow-derived mast cells and neutrophils would be a great addition to the manuscript and an important resource for future studies in the field. The fact that the use of rSCF (or rCSF3) enables the differentiation of these cells in large numbers of pure populations presents this unique opportunity. Although IL-4 did not end up affecting mucus production, clues to the mediator(s) of this mast cell-dependent effect may be found with unbiased RNA-seq after exposure to Bd. 

      We thank the reviewer for this suggestion but would like to point out that our manuscript is focused on mast cells rather than neutrophils. We also believe that in vitro exposure of leukocytes to Bd is not the most physiologically relevant model of what would happen to skin-resident and incoming immune cell subsets, since Bd primarily infects top-most keratinocytes. We anticipate that rather than coming into direct contact with the fungus, cells like mast cells and neutrophils are responding to Bd-produced and infected cell-produced products. For this reason, we did not perform RNA-seq analysis of in vitro derived mast cells or neutrophils stimulated with Bd. As we develop more X. laevis-specific reagents, we hope to revisit the question of infected skin mast cell and neutrophil gene expression profiles but are not in a position to ask these questions at this time.

      This work is also guided by a finite budget, and we feel that together with our significant additional findings described in our revised manuscript, we are presenting a substantial amount of work to constitute a stand-alone story and manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      The following are minor edits needed in the text and figure legends: 

      Standardize terms such as IL4 instead of il4 or ril4 vs rIL4 throughout. Also, r-SCF vs rSCF. 

      Thank you. Please find that we have standardized such terms throughout our revised manuscript. Please note that we are adhering to the convention that gene names are in lower case, protein names are in upper case and recombinant protein names are preceded by an ‘r’.

      Pg 9 Change "In contract" to "In contrast". 

      Thank you and changed accordingly.

      Fig 4 - Perhaps indicate if results in addition to 7dpi are also available. 

      Please find that we analyzed Bd loads in control and mast cell-enriched, infected frogs after 10 dpi. This data is presented in Figures 3 and 4 of our revised manuscript.

      Similarly in Fig. 5, are results other than 10dpi available in the supplement? 

      Please find that the results from the microbiome studies are presented in supplemental figure 3 (Fig. S3). Please note that the results presented in original manuscript Fig. 5A-C - revised manuscript Fig. 5B-E depict data for 21 dpi, which is the longest examined infection timepoint. We present data from 1 and 10 dpi in Fig. 4 of our revised manuscript.

      Indicate why these days were chosen in the methods. 

      Please find that we indicated why the experimental timepoints were chosen, in the methods section of our revised manuscript.

      Fig S1 legend has errors in describing which panels are for which asterisks. 

      Fig. S3 legend indicates panels F and G. 

      Thank you. Please find that we revised our supplemental figures and amended the corresponding figure legends.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study entitled "Rifampicin tolerance and growth fitness among isoniazid-resistant clinical Mycobacterium tuberculosis isolates: an in-vitro longitudinal study" by Vijay et al. provides valuable insights into the association of rifampicin tolerance and growth fitness with isoniazid resistance among clinical isolates of M. tuberculosis. Antibiotic tolerance in M. tuberculosis is an important topic since it contributes to the lengthy and complicated treatment required to cure tuberculosis disease and may portend the emergence of antibiotic resistance. The authors found that rifampicin tolerance was correlated with bacterial growth, rifampicin minimum inhibitory concentrations, and isoniazid-resistance mutations.

      Strengths:

      The large number of clinical isolates evaluated and their longitudinal nature during treatment for TB (including exposure to rifampin) are strengths of the study.

      Weaknesses:

      Some of the methodologies are not well explained or justified and the association of antibiotic tolerance with growth rate is not a novel finding. In addition, the molecular mechanisms underlying rifampicin tolerance only in rapidly growing isoniazid-resistant isolates have not been elucidated and the potential implications of these findings for clinical management are not immediately apparent.

      We thank the reviewer for the comments. We have modified the methods section and Figure 1 to clarify the method, as suggested by the reviewer.

      Although we agree that previous studies have shown the association of slow growth rate with antibiotic tolerance, ours is the most comprehensive assessment of rifampicin tolerance among clinical isolates, to our knowledge. In particular, we show that the degree of tolerance in clinical isolates can vary over several orders of magnitude, which had not been previously documented or appreciated. Furthermore, the association of high tolerance among IR isolates is a new finding, and given the potential for tolerance to increase risk of de novo drug resistance, our study suggests that IR isolates with high rifampicin tolerance may present a risk for development of MDR-TB.

      In addition, we have analysed the longitudinal isolates and the genetic variants emerging in them that are associated with an increase in rifampicin tolerance. This analysis reveals multiple possible pathways to increased rifampicin tolerance among clinical M. tuberculosis isolates. A possible clinical implication is that high rifampicin tolerance combined with isoniazid resistance is a risk factor for tuberculosis treatment failure. This study helps to develop further clinical studies to evaluate the role of rifampicin tolerance in IR isolates and treatment outcome. We have focused on these aspects in the discussion of the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study by Vijay and colleagues addresses a clinically important, and often overlooked, aspect of TB treatment. Detecting variations in the level of antibiotic tolerance amongst otherwise antibiotic-susceptible isolates is difficult to do routinely, and is consequently not performed. The authors present a convincing argument that there is indeed significant variation in the susceptibility of isoniazid-resistant strains to killing by rifampicin, in some cases at the same tolerance levels as bona fide resistant strains. On the whole, the study is easy to follow and the results are justified. This work should be of interest to the wider TB community at both a clinical and basic level.

      Weaknesses:

      The manuscript is long, repetitive in places, and the figures could use some amending to improve clarity (this could be a me-specific issue as they look ok on my screen, yet the colour is poor when printed).

      We thank the reviewer for the comments. We have modified the revised manuscript as per the reviewer's suggestions.

      It would have been great to have seen some correlation between increased rifampicin tolerance and treatment outcome, although I'm not sure if this data is available to the researchers. I agree with the researchers the use of a single media condition is a limitation. However, this is true of a lot of studies. Rifampicin tolerance and treatment outcome analysis.

      We agree with the reviewer that correlation between rifampicin tolerance and treatment outcome is important. This needs to be performed in future studies with better design to correlate rifampicin tolerance with treatment progression or outcome data.  

      Reviewer #3 (Public Review):

      Summary:

      The authors have initiated studies to understand the molecular mechanisms underlying the development of multi-drug resistance in clinical Mtb strains. They demonstrate the association of isoniazid-resistant isolates by rifampicin treatment supporting the idea that selection of MDR is a microenvironment phenomenon and involves a group of isolates.

      Strengths:

      The methods used in this study are robust and the results support the authors' claims to a major extent.

      Weaknesses:

      The manuscript needs a thorough vetting of the language. At present, the language makes it very difficult to comprehend the methodology and results.

      We thank the reviewer for the comments. We have revised the manuscript as per the reviewer’s suggestions.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) Methods: The authors attempt to differentiate between "fast"- and "slow"-growing bacteria in order to determine if the growth rate is associated with rifampicin tolerance. This is accomplished by assessing growth on solid agar at 15 and 60 days post-incubation, respectively. However, mycobacterial growth rate is not a binary phenomenon but rather a continuous variable. Moreover, it is not clear why 15 and 60 days were selected. Also, instead of a "slow growth" phenotype, the 60-day time point might simply reflect a longer lag phase. Were the plates examined at any interval time points? It would be interesting to know whether colony growth was delayed overall in the populations observed only at 60 days, or simply if the appearance of microcolonies visible to the naked eye was delayed (with normal growth afterwards).

      We thank the reviewer for the comments. We want to clarify that we did not use agar plates but the most-probable number (MPN) method to determine the survival fraction after antibiotic treatment. We have clarified this in the revised manuscript and in the revised Figure 1. The MPN method is a binary measure (growth/no growth) and therefore cannot differentiate between a long lag time and other mechanisms. In our original analysis, we included an intermediate time point of 30 days, but these data (included as supp fig. 1) cannot address the issue of lag phase directly. Since the 30-day time point did not add to the overall analysis and interpretation, we had not included these data in the original submission.
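
      To illustrate why MPN is a binary readout, the sketch below estimates cell density from growth/no-growth counts across a dilution series by maximum likelihood, then expresses survival as a log10 ratio of treated to untreated estimates. This is the textbook MPN calculation, not necessarily the exact procedure used in the study: the dilution volumes, replicate numbers, well counts, and the function names `mpn_estimate` and `log10_survival` are all assumptions made for illustration.

```python
import math


def mpn_estimate(volumes, n_tubes, n_positive):
    """Maximum-likelihood MPN (cells per unit volume) from a dilution series.

    volumes[i]    : amount of original sample in each tube at dilution i
    n_tubes[i]    : number of replicate tubes at dilution i
    n_positive[i] : number of those tubes that showed growth
    """
    if not (len(volumes) == len(n_tubes) == len(n_positive)):
        raise ValueError("inputs must have the same length")
    if all(p == 0 for p in n_positive):
        return 0.0
    if all(p == n for p, n in zip(n_positive, n_tubes)):
        raise ValueError("all tubes positive: the estimate is unbounded")

    total = sum(v * n for v, n in zip(volumes, n_tubes))

    def score(lam):
        # Likelihood equation: sum_i p_i * v_i / (1 - exp(-lam * v_i)) = sum_i n_i * v_i
        lhs = sum(
            p * v / (-math.expm1(-lam * v))
            for v, p in zip(volumes, n_positive)
            if p > 0
        )
        return lhs - total

    # score() decreases monotonically in lam, so bracket the root and bisect.
    lo, hi = 1e-12, 1.0
    while score(hi) > 0:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def log10_survival(mpn_treated, mpn_untreated):
    """Survival fraction after treatment, on a log10 scale."""
    return math.log10(mpn_treated / mpn_untreated)


# Hypothetical growth/no-growth counts (3 replicates per dilution, volumes in mL).
untreated = mpn_estimate([1e-3, 1e-4, 1e-5], [3, 3, 3], [3, 2, 0])
treated = mpn_estimate([1e-1, 1e-2, 1e-3], [3, 3, 3], [3, 1, 0])
survival = log10_survival(treated, untreated)
```

      Because each tube contributes only a yes/no outcome at the time it is read, a tube scored negative at 15 days and positive at 60 days is consistent with either slow growth or a long lag, which is the limitation acknowledged above.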

      (2) Methods/Results/Discussion: Some important clinical information is missing-how were the patients treated who had IR isolates? Did they receive the standard regimen for DS TB or was another drug substituted for isoniazid? Exposure to different drugs could affect the rifampicin-tolerant populations during the intensive phase (Figure 5).

      Thank you for this comment. We have included the information regarding the treatment regimen in the revised manuscript.

      Were there differences in microbiological (sputum culture conversion rate at 8 weeks or time to culture negativity) or clinical outcomes based on isoniazid susceptibility? Perhaps more importantly, were there differences in microbiological/clinical outcomes based on the proportion of bacterial subpopulations with rifampicin tolerance for a particular isolate? There should be more discussion on the potential clinical implications of the study's findings.

      We agree with the reviewer that correlation between rifampicin tolerance and treatment progression or outcome is important. This needs to be performed in future studies with better design to correlate rifampicin tolerance with treatment progression or outcome data.  

      (3) Results (Figure 3A): Although an interesting finding, the increased rifampicin tolerance observed only in the "rapidly" growing populations of isoniazid-resistant isolates (IR) vs. isoniazid-susceptible (IS) isolates is not explained. In contrast, equally, increased rifampicin tolerance is seen in the "slowly" growing populations of both IR and IS isolates. It would be interesting to know if these slowly growing populations show specific tolerance to rifampicin or if, as expected, slow growth confers tolerance to a range of different bactericidal antibiotics.

      We thank the reviewer for the suggestions. We agree that these will be interesting to investigate in a future study, but they are outside the scope of the current study.

      (4) Results (Figure 3B): The basis for the classification into tertiles is not clear and appears somewhat arbitrary-does this represent the survival of a particular isolate following rifampicin exposure relative to the other isolates based on isoniazid susceptibility (IS or IR) or the % growth relative to other populations for the same isolate? Figure 3B is missing a y-axis label. Is it a log10 MPN ratio?

      We thank the reviewer for pointing this out. We want to clarify that, for the classification into tertiles, we first pooled both groups of isolates, isoniazid susceptible (IS) and isoniazid resistant (IR), into a single population. We then categorized this unified population into three distinct groups (low, medium, and high) based on survival fraction following rifampicin treatment. Consequently, the 'low,' 'medium,' and 'high' tertiles represent the survival of each isolate following rifampicin exposure relative to the total number of isolates, combining both IS and IR isolates.

      For clarity, we provide a breakdown of the criteria for each tertile:

      +Low tertile: Consists of isolates with the lowest survival fraction (bottom 25%).

      +Medium tertile: Encompasses isolates with survival fractions that fall between the bottom 25% and the top 25%.

      +High tertile: Comprises isolates with the highest survival fractions (top 25%).

      We have modified this in the revised manuscript to clarify.
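
      The pooling and grouping described above can be sketched as follows. This is an illustrative reading only: the isolate names and survival values are hypothetical, the function name `classify_by_survival` is ours, and the cutoffs default to the 25% and 75% boundaries stated in the breakdown. The cutoffs are exposed as parameters because groups of equal size (strict tertiles) would instead use one-third boundaries.

```python
def classify_by_survival(survival, lower=0.25, upper=0.75):
    """Label pooled isolates as low, medium, or high by ranked survival fraction.

    survival : dict mapping isolate id -> survival fraction after rifampicin,
               with IS and IR isolates pooled into one population.
    lower    : isolates ranked below this fraction of the population are "low".
    upper    : isolates ranked at or above this fraction are "high".
    """
    ranked = sorted(survival, key=survival.get)
    n = len(ranked)
    labels = {}
    for rank, isolate in enumerate(ranked):
        position = rank / n  # fraction of the pooled isolates ranked lower
        if position < lower:
            labels[isolate] = "low"
        elif position >= upper:
            labels[isolate] = "high"
        else:
            labels[isolate] = "medium"
    return labels


# Hypothetical log10 survival fractions for a pooled set of IS and IR isolates.
pooled = {
    "IS_01": -4.1, "IS_02": -3.2, "IS_03": -2.5, "IS_04": -1.9,
    "IR_01": -3.8, "IR_02": -2.2, "IR_03": -1.1, "IR_04": -0.4,
}
groups = classify_by_survival(pooled)
```

      Because the ranking is done on the pooled population, an isolate's group reflects its survival relative to all isolates rather than only to others with the same isoniazid susceptibility.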

      We have also modified Figure 3B to correct the y-axis label.

      (5) Results (lines 185-186): For correlating relative growth in the absence of antibiotics, 19 clinical isolates "outliers" were removed without explanation.

      We have added an explanation for the “outliers”, which were removed earlier because they deviated from a normal distribution. We have also provided supplementary figure 3, which includes these outliers.

      (6) Results (lines 203-211): The authors attempted to investigate a potential association between the mechanism of M. tuberculosis isoniazid resistance and the degree of rifampicin tolerance. However, the vast majority of IR clinical isolates (n=71) had a katG_S315X mutation and only 8 isolates had alternative mutations (inhA_I21T and fabG1_C-15X). Given the wide range of rifampicin tolerance observed within these isoniazid-resistant isolates, they concluded that other genetic or epigenetic determinants must be playing a role. WGS of longitudinally collected isolates from the same patients during TB treatment yielded non-synonymous SNPs in a list of genes previously reported to be associated with persistence, tolerance, and mycobacterial survival. However, precise mechanisms (including, e.g., expression of efflux pumps) are not investigated.

      We thank the reviewer for summarising the findings. Yes, we agree that investigating the precise mechanism of rifampicin tolerance is beyond the scope of the current work.

      Minor comments:

      (1) Abstract (line 41): The nonstandard abbreviations "IR" and "IS" have not been introduced prior to this usage.

      We have modified this in the abstract.

      (2) Introduction (line 60): Insert "phenomena" or "mechanisms" after "two".

      We have modified this in the introduction.

      (3) Introduction (lines 66-69): This sentence is confusing, especially the second part ("supporting this studies...").

      We have modified the lines to clarify.

      (4) Introduction (line 84): In the current text, it appears as if "IR" is the abbreviation for "isoniazid". Therefore, I recommend changing "resistance to isoniazid" to "isoniazid resistance".

      We have modified this in the revised manuscript.

      (5) Results (line 141): Insert "the" before "rest".

      We have modified this in the revised manuscript.

      (6) Results (line 187): Replace "did not had" with "did not have".

      We have modified this in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Abstract:

      The abstract is long and repetitive. It needs reworking and shortening to improve clarity and highlight the main takeaway message.

      We thank the reviewer for the suggestions and have modified this in the revised manuscript.

      The introduction is interesting and contains relevant information. However, it is long and takes a while to get to the point of the study. It needs re-writing to emphasise key prior results and the purpose of this study.

      We thank the reviewer for the suggestions and have modified this in the revised manuscript.

      Results:

      As the study relies predominately on the use of MPN, I think a simple schematic of how the experiment is performed would be informative. Could this be added to Figure 1?

      We have revised Figure 1 in the manuscript to include the schematic representation.

      Some of the differences in MDK90, whilst they may be significant, are small so it would at least provide context as to the relevance of these differences. This may also alleviate my confusion as to how the authors can measure the time required to achieve MDK90 as 1.23-1.31 days when the first time point that is taken is day 2 (the data in Figure 2). They have Fig. S6 but this is small and hard to follow.

      We thank the reviewer for this suggestion. We have modified this in the revised manuscript and in Figure S6.
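
      One way an MDK90 value earlier than the first sampled time point can arise is by interpolating the kill curve on a log scale between day 0 and day 2. The sketch below illustrates that idea only; whether the authors used this procedure is an assumption, and the function name `mdk90_by_interpolation` and the example survival values are hypothetical.

```python
import numpy as np

def mdk90_by_interpolation(days, survival_fraction):
    """Estimate the time at which survival first drops to 10% (90% kill).

    Assumption: log10(survival) is interpolated linearly between
    consecutive sampling days. Returns None if 90% kill is never reached.
    """
    days = np.asarray(days, dtype=float)
    log_sf = np.log10(np.asarray(survival_fraction, dtype=float))
    target = -1.0  # log10(0.1), i.e. 90% of the population killed
    for i in range(1, len(days)):
        if log_sf[i] <= target < log_sf[i - 1]:
            # Fraction of the interval needed to reach the target level.
            frac = (log_sf[i - 1] - target) / (log_sf[i - 1] - log_sf[i])
            return days[i - 1] + frac * (days[i] - days[i - 1])
    return None

# Hypothetical kill curve sampled on days 0, 2 and 5 (illustration only).
estimate = mdk90_by_interpolation([0, 2, 5], [1.0, 10 ** -1.6, 1e-3])
print(estimate)
```

      In this hypothetical curve survival has already fallen below 10% by day 2, so the interpolated MDK90 lies between day 0 and day 2.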

      Figure 2:

      Would be helpful to have -1 on the Y axis.

      The grey dots don't print very well (Might be my printer)

      We have modified this in Figure 2 of the revised manuscript.

      Line 142: The authors note a difference in RIF tolerance at day 15 that disappeared by day 60. I assume they are referring to the day 5 timepoint although this isn't clear as written.

      Yes, it is referring to the day 5 time point and we have clarified this in the revised manuscript.

      The section starting at line 148 (fig 3) is interesting, but it is difficult to read and follow what the difference is between this data and the prior data in Figure 2. It also wasn't until about line 165 that the purpose became clear. Overall the conclusions are sound and interesting.

      We have modified this in the revised manuscript.

      Line 154: What are the early and late time recovery time points?

      Is Figure 3A the same data as Figure 2?

      We have clarified this in the revised manuscript. Figure 3A shows the same data as Figure 2.

      I found Figure 6 hard to follow. I'm not sure how better to present this data, but it should be improved. Some further clarification in the text would be helpful.

      We thank the reviewer for the suggestions. We have added more explanation in the text to clarify figure 6.

      Conclusions:

      The conclusions are sound, based on the data presented. The clinical relevance is highlighted, yet appropriately phrased to not be too far-reaching.

      Again, I think the conclusions could be condensed considerably. It is repetitive in places, which distills the main outcomes of this otherwise interesting and important study. The authors appropriately highlight some of the limitations of their study.

      We thank the reviewer for these comments and have modified this in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      The manuscript "Rifampicin tolerance and growth fitness among isoniazid-resistant clinical Mycobacterium tuberculosis isolates: an in-vitro longitudinal study" by Srinivasan et al. details the identification/development of isoniazid-resistant strains in clinical isolates following treatment with rifampicin. This is an important aspect of understanding MDR development in TB strains. The results are promising and gel well with the hypothesis. However, the manuscript requires a thorough language modification. While the overall idea is clear, the methodology does not come out clearly.

      Specific comments:

      (1) It is not clear whether rifampicin treatments were given for 2 and 5 days before kill curves or for 15 and 60 days? The methodology needs to be phrased clearly. Why was this time interval of 15 days and 60 days taken? Is there a rationale for this?

      We thank the reviewer for the suggestions. We have modified the methods and Figure 1 to clarify this in the revised manuscript.

      (2) A concentration of 2 µg/mL was used for in vitro culture in this study. While the authors themselves indicate that this is well above the MIC, this might represent a non-natural dose and hence may force the evolution of strains. What will be the scenario in the natural course of antibiotic treatment (dose at MIC or less than MIC)?

      We observed no significant emergence of resistance up to day 5; resistance emerges only after day 5. We therefore avoided determining the survival fraction after resistance had emerged, so the kill curve mostly represents the tolerant sub-population. Pharmacokinetic studies of rifampicin dosing suggest that peak concentrations of >2-32 µg/mL are typical for standard doses of the drug; we therefore believe the chosen concentration of 2 µg/mL to be physiologically relevant.

      (3) As described in line 155, the survival spanned a broad distribution, across a million times in difference. This is rather surprising that 5 days of rifampicin treatment would lead to such a spread in resistance patterns. Did the authors study the different populations to understand this phenomenon? This is important given the scale of resistance developed in this short time.

      We want to clarify that the broad range of survival fractions reflects differences in rifampicin-tolerant sub-populations, not in resistant sub-populations, because survival is determined after rifampicin treatment in rifampicin-free media. This has been clarified in the revised Figure 1.

      Overall, the manuscript is a detailed study with new insights into the development of multi-drug resistance by Mtb. A thorough vetting for language is essential for a greater impact of the study.

      We thank the reviewer and have attempted to improve the clarity of the language to increase the potential impact of our findings.

    1. Author response:

      The following is the authors' response to the current reviews.

      Reviewer #1 (Public Review):

      I'll begin by summarizing what I understand from the results presented, and where relevant how my understanding seems to differ from the authors' claims. I'll then make specific comments with respect to points raised in my previous review (below), using the same numbering. Because this is a revision I'll try to restrict comments here to the changes made, which provide some clarification, but leave many issues incompletely addressed.

      As I understand it the main new result here is that certain recurrent network architectures promote emergence of coordinated grid firing patterns in a model previously introduced by Kropff and Treves (Hippocampus, 2008). The previous work very nicely showed that single neurons that receive stable spatial input could 'learn' to generate grid representations by combining a plasticity rule with firing rate adaptation. The previous study also showed that when multiple neurons were synaptically connected their grid representations could develop a shared orientation, although with the recurrent connectivity previously used this substantially reduced the grid scores of many of the neurons. The advance here is to show that if the initial recurrent connectivity is consistent with that of a line attractor then the network does a much better job of establishing grid firing patterns with shared orientation.

      Beyond this point, things become potentially confusing. As I understand it now, the important influence of the recurrent dynamics is in establishing the shared orientation and not in its online generation. This is clear from Figure S3, but not from an initial read of the abstract or main text. This result is consistent with Kropff and Treves' initial suggestion that 'a strong collateral connection... from neuron A to neuron B... favors the two neurons to have close-by fields... Summing all possible contributions would result in a field for neuron B that is a ring around the field of neuron A.' This should be the case for the recurrent connections now considered, but the evidence provided doesn't convincingly show that attractor dynamics of the circuit are a necessary condition for this to arise. My general suggestion for the authors is to remove these kinds of claims and to keep their interpretations more closely aligned with what the results show.

      We would like to clarify that the simple (flexible) attractor is a weaker condition than the ones previously used to align grid cells. However, by no means do we claim that it is a necessary condition for grid maps to align. Other architectures, certainly more complex ones but perhaps even simpler ones, can align grid maps in our model.

      Major (numbered according to previous review)

      (1) Does the network maintain attractor dynamics after training? Results now show that 'in a trained network without feedforward Hebbian learning the removal of recurrent collaterals results in a slight increase in gridness and spacing'. This clearly implies that the recurrent collaterals are not required for online generation of the grid patterns. This point needs to be abundantly clear in the abstract and main text so the reader can appreciate that the recurrent dynamics are important specifically during learning.

      We respectfully disagree with the interpretation of this result. In this model cells self-organize to produce aligned grid maps. In such systems it makes sense to characterize the equilibrium states of the system. We turned learning off in Figure S3 to show that the recurrent connections have a contractive effect on grid spacing. But artificially turning off learning means that one can no longer make claims about the equilibrium states of the system, since it can no longer evolve freely. In a functional network, if the recurrent attractor is removed, the system will evolve towards poor gridness and no alignment no matter what the starting point is, as also shown in Figure S3. Several experimental results invite us to think of grid cells as the equilibrium solution of a series of constraints that is ready to change at any time: Barry et al, 2012; Yoon et al, 2013; Carpenter et al, 2015; Krupic et al, 2015; Krupic et al, 2018; Jayakumar et al, 2019.

      One point in which we perhaps agree with the reviewer is that information about the hexagonal maps is kept in the feedforward weights, while behavior and the recurrent collaterals act as constraints of which these feedforward weights are the equilibrium solution.

      (2) Additional controls for Figure 2 to test that it is connectivity rather than attractor dynamics (e.g. drawing weights from Gaussian or exponential distributions). The authors provide one additional control based on shuffling weights. However, this is far from exhaustive and it seems difficult on this basis to conclude that it is specifically the attractor dynamics that drive the emergence of coordinated grid firing.

      Again, we do not claim that this is the only way in which grid maps can be aligned, but it is the simplest one proposed so far. We were asked if it was the specific combination of input weights to a cell rather than the organization provided by the attractor which resulted in aligned maps. By shuffling the inputs to a cell we keep the combination of inputs invariant but lose the attractor architecture. Since grid maps in this new situation are not aligned, we can safely conclude that it is not the combination of inputs per se, but the specific organization of these inputs that allows grid alignment. It is not fully clear to us what ‘exhaustive’ means in this context.
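
      The logic of the shuffling control can be illustrated with a toy connectivity matrix. This is a sketch under stated assumptions, not the model's actual connectivity: the binary nearest-neighbour weights, the convention that row i holds the inputs to cell i, and the helper names `ring_weights`, `shuffle_inputs_per_cell` and `neighbour_weight_fraction` are all hypothetical.

```python
import numpy as np

def ring_weights(n, width=2):
    """Toy 1D ring connectivity: each cell receives input from its
    `width` nearest neighbours on either side (assumed binary weights)."""
    W = np.zeros((n, n))
    for i in range(n):
        for d in range(1, width + 1):
            W[i, (i + d) % n] = 1.0
            W[i, (i - d) % n] = 1.0
    return W

def shuffle_inputs_per_cell(W, rng):
    """Permute the incoming weights of every cell independently.
    Each cell keeps the same set of weight values, but they now come
    from random partners (possibly including itself)."""
    W_shuffled = np.empty_like(W)
    for i in range(W.shape[0]):
        W_shuffled[i] = rng.permutation(W[i])
    return W_shuffled

def neighbour_weight_fraction(W, width=2):
    """Fraction of total weight that connects cells within `width`
    steps of each other on the ring."""
    n = W.shape[0]
    idx = np.arange(n)
    dist = np.abs(idx[:, None] - idx[None, :])
    dist = np.minimum(dist, n - dist)  # distance measured around the ring
    mask = (dist >= 1) & (dist <= width)
    return W[mask].sum() / W.sum()

rng = np.random.default_rng(0)
W = ring_weights(100)
W_shuf = shuffle_inputs_per_cell(W, rng)
print(neighbour_weight_fraction(W), neighbour_weight_fraction(W_shuf))
```

      In this toy example the set of weights entering each cell is unchanged by the shuffle, while the neighbour structure that defines the ring is almost entirely lost.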

      (3) What happens if recurrent connections are turned off? The new data clearly show that the recurrent connections are not required for online grid firing, but this is not clear from the abstract and is hard to appreciate from the main text.

      This point is related to (1). Absent this constraint, Figure S3 shows that the system evolves toward larger spacing, with poorer gridness and no alignment.

      (4) This is addressed, although the legend to Fig. S2D could provide an explanation / definition for the y-axis values.

      We have now added: Mean input fields are the sum of all inputs of a given kind entering a neuron at a given moment in time, averaged across cells and time.
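
      The definition above can be written out as a short computation. This is a sketch of the stated definition only; the array shapes, the linear weighted-sum form of the input, and the name `mean_input_field` are assumptions for illustration and are not taken from the model code.

```python
import numpy as np

def mean_input_field(weights, presyn_activity):
    """Mean input field for one kind of input (e.g. feedforward or recurrent).

    weights:         array (n_cells, n_inputs), weight from each input to each cell
    presyn_activity: array (n_timesteps, n_inputs), activity of the inputs over time

    The input field of a cell at a given time is the weighted sum of its
    inputs; the result is averaged across cells and time.
    """
    input_field = presyn_activity @ weights.T  # shape (n_timesteps, n_cells)
    return input_field.mean()

# Toy example: 3 cells, 4 inputs, 5 time steps, all weights and activities 1.
value = mean_input_field(np.ones((3, 4)), np.ones((5, 4)))
print(value)
```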

      (5) Given the 2D structure of the network input it perhaps isn't surprising that the network generates 2D representations and this may have little to do with its 1D connectivity. The finding that the networks maintain coordinated grids when recurrent connections are switched off supports my initial concern and the authors' explanation, to me at least, remains confusing. I think it would be helpful to consider that the connectivity is specifically important for establishing the coordinated grid firing, but that the online network does not require attractor dynamics to generate coordinated grid firing.

      This point is related to (1) and (3). We agree with the reviewer that the input lies within a 2D manifold, but this is not something that the network has to find out because it receives one datapoint of information at a time. This alone is not enough to form aligned grid cells, since each grid cell can find a roughly equivalent equilibrium in a different direction. It is only the constraint imposed by the recurrent collaterals that aligns grid maps, and, as we show, this constraint does not need to be constructed ad hoc to work on 2D, as previously thought. When recurrent connections are switched off, the system evolves toward unaligned grid maps, with larger spacing and lower gridness. Regarding the results obtained after modifying the network and turning off learning, we think they have a very limited scope (in this case showing the contractive effect of recurrent collaterals on grid spacing), given that the system is artificially being kept out of its natural equilibrium.

      (6) Clarity of the introduction. This is somewhat clearer, but I wonder if it would be hard for someone not familiar with the literature to accurately appreciate the key points.

      We have made our best effort to improve the clarity of the introduction.

      (7) Remapping. I'm not sure why this is ill posed. It seems the proposed model can not account for remapping results (e.g. Fyhn et al. 2007). Perhaps the authors could just clearly state this as a limitation of the model (or show that it can do this).

      We view our model as perfectly consistent with Fyhn et al, 2007. Remapping is not triggered by the network itself, though, but rather by a re-arrangement of the inputs requiring the network to learn new associations. Different simulations of the same model with identical parameters can be interpreted as remapping experiments.

      Reviewer #3 (Public Review):

      Summary:

      The paper proposes an alternative to the attractor hypothesis, as an explanation for the fact that grid cell population activity patterns (within a module) span a toroidal manifold. The proposal is based on a class of models that were extensively studied in the past, in which grid cells are driven by synaptic inputs from place cells in the hippocampus. The synapses are updated according to a Hebbian plasticity rule. Combined with an adaptation mechanism, this leads to patterning of the inputs from place cells to grid cells such that the spatial activity patterns are organized as an array of localized firing fields with hexagonal order. I refer to these models below as feedforward models.

      It has already been shown by Si, Kropff, and Treves in 2012 that recurrent connections between grid cells can lead to alignment of their spatial response patterns. This idea was revisited by Urdapilleta, Si, and Treves in 2017. Thus, it should already be clear that in such models, the population activity pattern spans a manifold with toroidal topology. The main new contributions in the present paper are (i) in considering a form of recurrent connectivity that was not directly addressed before. (ii) in applying topological analysis to simulations of the model. (iii) in interpreting the results as a potential explanation for the observations of Gardner et al.

      We wanted to note that we do not see this paper as proposing an alternative to the attractor hypothesis, given that we use attractor networks, but rather as an exploration of possibilities not yet visited by this hypothesis.

      Strengths:

      The exploration of learning in a feedforward model, when recurrent connectivity in the grid cell layer is structured in a ring topology, is interesting. The insight that this not only aligns the grid cells in a common direction but also creates a correspondence between their intrinsic coordinate (in terms of the ring-like recurrent connectivity) and their tuning on the torus is interesting as well, and the paper as a whole may influence future theoretical thinking on the mechanisms giving rise to the properties of grid cells.

      Weaknesses:

      (1) In Si, Kropff and Treves (2012) recurrent connectivity was dependent on the head direction tuning, in addition to the location on a 2d plane, and therefore involved a ring structure. Urdapilleta, Si, and Treves considered connectivity that depends on the distance on a 2d plane. The novelty here is that the initial connectivity is structured uniquely according to latent coordinates residing on a ring.

      The recurrent architectures in the cited works are complex and require arranging cells in a 2D manifold to calculate connectivity based on their relative 2D position. In other words, the 2D structure is imprinted in the architecture, as in our 2D condition. In this work the network is much simpler and only requires neighboring relations in 1D. Such relationships have been shown to spontaneously emerge in the hippocampal formation (Pastalkova et al, 2008; Gonzalo Cogno et al, 2024).

      (2) The paper refers to the initial connectivity within the grid cell layer as one that produces an attractor. However, it is not shown that this connectivity, on its own, indeed sustains persistent attractor states. Furthermore, it is not clear whether this is even necessary to obtain the results of the model. It seems possible that (possibly weaker) connections with ring topology, that do not produce attractor dynamics but induce correlations between neurons with similar locations on the ring would be sufficient to align the spatial response patterns during the learning of feedforward weights.

      Regarding the first part of the comment, the recurrent collaterals create one or at times multiple bumps of activity in the network so that neighboring (interconnected) cells activate together. An initial random state of activity rapidly falls into this dynamic, constrained by the attractor. To us this is not surprising given that this connectivity is the classical means of creating a continuous attractor. Perhaps there is some deeper meaning in this comment that we are not fully grasping.

      Regarding the second part of the comment, we fully agree with the reviewer. We are presenting what so far is the simplest connectivity that can align grid maps, but by no means do we claim that it is the simplest possible one. Regarding weaker connections with ring topology, we show in Figure S2 that a ring attractor with too weak or too strong connections is incapable of aligning grids, since a balance between feedforward and feedback inputs is required.

      (3) Given that all the grid cells are driven by an input from place cells that span a 2d manifold, and that the activity in the grid cell network settles on a steady state which is uniquely determined by the inputs, it is expected that the manifold of activity states in the grid cell layer, corresponding to inputs that locally span a 2d surface, would also locally span a 2d plane. The result is not surprising. My understanding is that this result is derived as a prerequisite for the topological analysis, and it is therefore quite technical.

      We understand that the reviewer is referring to the motivation behind studying local dimensionality. We agree that the topological analysis approach is quite technical, but it provides unique insights. The theorem of closed surfaces, which allows us to deduce a toroidal topology from Betti numbers (1,2,1), only applies to closed surfaces. One thus needs to show that the point cloud is a surface (local dimensionality of 2) and is closed (no borders or singularities). If borders or singularities were present, a toroidal topology could not be claimed from these Betti numbers. Thus, it is a crucial step of the analysis.
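
      The step from Betti numbers to a torus relies on the classification of closed surfaces, which can be sketched as a small lookup. This is an illustration of the standard classification, not the authors' analysis pipeline; the function name is hypothetical, and the sketch assumes a connected closed surface with Betti numbers computed over a field of characteristic other than 2 (over Z/2 coefficients, orientability is not detected by Betti numbers alone).

```python
def closed_surface_from_betti(b0, b1, b2):
    """Identify a connected closed surface from its Betti numbers.

    Assumes the space is already known to be a closed surface (locally
    2-dimensional, no borders or singularities) and that the Betti
    numbers are taken over a field of characteristic != 2.
    """
    if b0 != 1:
        raise ValueError("expected a connected surface (b0 == 1)")
    if b2 == 1 and b1 % 2 == 0:
        # Orientable surface of genus g has Betti numbers (1, 2g, 1).
        genus = b1 // 2
        if genus == 0:
            return "sphere"
        if genus == 1:
            return "torus"
        return f"orientable surface of genus {genus}"
    if b2 == 0:
        # Non-orientable surface with k crosscaps has Betti numbers (1, k - 1, 0).
        crosscaps = b1 + 1
        return f"non-orientable surface with {crosscaps} crosscap(s)"
    raise ValueError("not the Betti numbers of a closed surface")

print(closed_surface_from_betti(1, 2, 1))
```

      The lookup is only valid once the closed-surface property has been established, which is why the local dimensionality and border checks come first in the analysis.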

      (4) The modeling is all done in planar 2d environments, where the feedforward learning mechanism promotes the emergence of a hexagonal pattern in the single neuron tuning curve. Under the scenario in which grid cell responses are aligned (i.e. all neurons develop spatial patterns with the same spacing and orientation) it is already quite clear, even without any topological analysis that the emerging topology of the population activity is a torus.

      However, the toroidal topology of grid cells in reality has been observed by Gardner et al also in the wagon wheel environment, in sleep, and close to boundaries (whereas here the analysis is restricted to a sub-region of the environment, far away from the walls). There is substantial evidence based on pairwise correlations that it persists also in various other situations, in which the spatial response pattern is not a hexagonal firing pattern. It is not clear that the mechanism proposed in the present paper would generate toroidal topology of the population activity in more complex environments. In fact, it seems likely that it will not do so, and this is not explored in the manuscript.

      We agree that our work was constrained to exploration in 2D and that the situations posed by the reviewer are challenging, but we do not see them as unsurmountable. The wagon wheel shows a preservation of toroidal topology locally, where the behavior of the animal is rather 2-dimensional. Globally, hexagonal maps are lost, which is compatible with some flexibility in the way grid maps are formed. If sleep meant that all inputs are turned off, our model would predict a dynamic dictated by the architecture (1D for the ring attractor, for example), but we do not really know that this is the case. In the future, we intend to explore predictive activity along the linear attractor, which could both result in path integration and in some level of preservation of the activity when inputs are completely turned off.

      Regarding boundaries, as we have argued before, the cited work chooses to filter away what looks like more than half of the overall explained variance through PCA, and this is only before applying a non-linear dimensionality reduction algorithm. It is specifically shown that the analyzed components are the ones with global periodicity throughout the environment. Thus, it is conceivable that through this approach, local irregularities found only at the borders are disregarded in favor of a clearer global picture. While using a different methodology, our approach follows a similar spirit, albeit with far less noisy data.

      (5) Moreover, the recent work of Gardner et al. demonstrated much more than the preservation of the topology in the different environments and in sleep: the toroidal tuning curves of individual neurons remained the same in different environments. Previous works, that analyzed pairwise correlations under hippocampal inactivation and various other manipulations, also pointed towards the same conclusion. Thus, the same population activity patterns are expressed in many different conditions. In the present model, this preservation across environments is not expected. Moreover, the results of Figure 6 suggest that even across distinct rectangular environments, toroidal tuning curves will not be preserved, because there are multiple possible arrangements of the phases on the torus which emerge in different simulations.

      We agree with this observation. A symmetry in our implementation means that the system falls into the preferred solution only ~50% of the time, and the rest of the time it falls into other local minima. Whether this result is at odds with current observations can be debated on the basis of probabilities. However, we believe that the symmetry we found is purely circumstantial, and that it can be broken by elements such as head direction modulation or other ingredients used to achieve path integration. In other words, we acknowledge that symmetry is an issue of the implementation we show here (which has been kept as simple as possible to serve as a proof-of-principle) but we do not think that it is a defining feature of flexible attractors in general. We expect that future implementations that incorporate path integration capabilities will not present this kind of symmetry in the space of solutions.

      Regarding the rigid phase translation across modalities, while this effect is very clear in Gardner et al, it is less so in other datasets. The analyses shown in Hermansen et al (2024) can rather be interpreted as lying somewhere between perfect rigid translation and fully randomized phases across navigation modalities.

      (6) In real grid cells, there is a dense and fairly uniform representation of all phases (see the toroidal tuning of grid cells measured by Gardner et al). Thus, the highly clustered phases obtained in the model (Fig. S1) seem incompatible with the experimental reality. I suspect that this may be related to the difficulty in identifying the topology of a torus in persistent homology analysis based on the transpose of the matrix M.

      We partly agree with this observation and note that a pattern of ordered phases is an issue not only for the 1D attractor but also for the 2D one, which appears much more uniform than in experimental data. The low number of neurons we used for computational economy and the full connectivity could be key ingredients to generate these phase patterns. To show that this is not a defining feature of flexible attractors, apart from the fact that these patterns appear also with non-flexible 2D architectures, we included in Figure S1 simulations with ‘fragmented 1D’ architectures. In this case the architecture is a superposition of 20 random 1D stripe-like attractors. While the alignment of maps achieved with this architecture is almost at the same level as the one obtained with 1D and 2D attractors, the phases are much more similar to what has been observed experimentally, and less uniform than what is obtained with 2D attractors.

      (7) The motivations stated in the introduction came across to me as weak. As now acknowledged in the manuscript, attractor models can be fully compatible with distortions of the hexagonal spatial response patterns - they become incompatible with these spatial distortions only if one adopts a highly naive and implausible hypothesis that the attractor state is updated only by path integration. While attractor models are compatible with distortions of the spatial response pattern, it is very difficult to explain why the population activity patterns are tightly preserved across multiple conditions without a rigid two-dimensional attractor structure. This strong prediction of attractor models withstood many experimental tests - in fact, I am not aware of any data set where substantial distortions of the toroidal activity manifold were observed, despite many attempts to challenge the model. This is the main motivation for attractor models. The present model does not explain these features, yet it also does not directly offer an explanation for distortions in the spatial response pattern.

      Some interesting examples are experiments in 3D, where grid cells presumably communicate with each other through the same recurrent collaterals, but global periodicity is lost and only some local order is preserved even away from boundaries (Ginosar et al, 2021; Grieves et al, 2021). While these datasets have not been explored using topological analysis, they serve as strong motivators to understanding 2D grid cells as one equilibrium solution that arises under some set of constraints, but belongs to a wider space of possible solutions that may arise as well under more flexible constraints. Even (and especially) if one adheres to the hypothesis that grid cells are pre-wired into a 2D torus, a concept like flexible attractors might become useful to understand how their activity is rendered in 3D. Another strong motivation is our lack of understanding of how a perfectly balanced 2D structure is formed and maintained. Simpler architectures could be thought of as alternatives, but also as an intermediate step towards it.

      Regarding the rigid phase translation across modalities, while this effect is very clear in Gardner et al, it is less so in other datasets. The analyses shown in Hermansen et al (2024) can rather be interpreted as lying somewhere between perfect rigid translation and fully randomized phases.

      In a separate point, although it might not be strictly related to the comment, we do not fully share the idea that persistent activity patterns during sleep are necessary or sufficient conditions for attractor dynamics, although we do agree that attractors could be the mechanism behind them and any alternative is at least as complex as attractors. On the necessity side, attractors in the hippocampus are not constantly engaged (Wills et al, 2005). For sufficiency, one should prove that no other network is capable of reproducing the phenomenon, and to our best knowledge we are still far from that point.

      (8) There is also some weakness in the mathematical description of the dynamics. Mathematical equations are formulated in discrete time steps, without a clear interpretation in terms of biophysically relevant time scales. It appears that there are no terms in the dynamics associated with an intrinsic time scale of the neurons or the synapses (a leak time constant and/or synaptic time constants). I generally favor simple models without lots of complexity, yet within this style of modelling, the formulation adopted in this manuscript is unconventional, introducing a difficulty in interpreting synaptic weights as being weak or strong, and a difficulty in interpreting the model in the context of other studies.

      We chose to keep the model as simple as possible and in line with the previous publications that developed it. However, we see the usefulness of putting it in what has in the meantime become a canonical framework. Fortunately, this has been done by D’Albis and Kempter (2017). In our simplified version of the model there is no leak term, and adaptation on its own brings down activity in the absence of input, but we agree that such a term could be added, albeit not without modifying all other network parameters.

      In my view, the weaknesses discussed above limit the ability of the model, as it stands, to offer a compelling explanation for the toroidal topology of grid cell population activity patterns, and especially the rigidity of the manifold across environments and behavioral states. Still, the work offers an interesting way of thinking on how the toroidal topology might emerge.

      Reviewer 1:

      Reviewer #1 (Recommendations For The Authors):

      See comments above. In addition:

      (1) Abstract: '...interconnected by a two-dimensional attractor guided by path integration'. This is unclear. I think the intended meaning might be along the lines of '...their being computed by a 2D continuous attractor that performs path integration'?

      'path integration allowing for no deviations from the hexagonal pattern' This is incorrect. Local modulation of the gain of the speed input to a standard CAN would distort the grid pattern.

      'Using topological data analysis, we show that the resulting population activity is a sample of a torus' Activity in the model?

      'More generally, our results represent a proof of principle against the intuition that the architecture and the representation manifold of an attractor are topological objects of the same dimensionality, with implications to the study of attractor networks across the brain' I guess one might hold this intuition, but it strikes me as obvious that if you impose a sufficiently strong n-dimensional input on a network then its activity could have the same dimensionality. I don't really see this as being a point worth highlighting. Perhaps the more interesting point is that during learning the recurrent connectivity aligns the grid fields of neurons in the network, and this may be a specific function of the 1D attractor dynamics, although I don't think the authors have made this point convincingly.

      'The flexibility of this low dimensional attractor allows it to negotiate the geometry of the representation manifold with the feedforward inputs'. See above for comments on the use of 'negotiate'.

      'while the ensemble of maps preserves features of the network architecture'. I don't understand this. What is the 'ensemble of maps' and what are the features referred to.

      We have reviewed the abstract considering these points. Regarding the ‘strong n-dimensional input’, we want to point out that it is not the input itself that generates a torus (the no attractor condition does not lead to a torus) but rather the interplay between the input and the attractor.

      Regarding ‘Perhaps the more interesting point …’, we do not fully understand how this sentence deviates from our own conclusions. Here we show that a strong n-dimensional input is not enough to align grid cells (produce an n-torus); it is the interplay between inputs and attractor dynamics that does so, even if the attractor is not n-dimensional in terms of architecture.

      The ensemble of maps refers to the transpose of the population activity matrix, where each point in the cloud is a map, and the features refer to the persistent homology.
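
      The two point clouds can be illustrated with a minimal sketch. The toy matrix, its orientation (rows as spatial bins, columns as neurons) and all variable names are assumptions made for illustration only and are not taken from the manuscript.

```python
# Toy population activity matrix M (assumed layout): rows are spatial bins
# (pixels), columns are neurons. Values are made-up rates for illustration.
M = [
    [0.1, 0.9, 0.0],  # population vector at pixel 0
    [0.4, 0.2, 0.7],  # population vector at pixel 1
    [0.8, 0.0, 0.3],  # population vector at pixel 2
    [0.2, 0.5, 0.6],  # population vector at pixel 3
]

# Point cloud of population activity: one point per pixel, in neuron space.
population_vectors = M

# "Ensemble of maps": one point per neuron, in pixel space.
# This is simply the transpose of M.
maps = [list(column) for column in zip(*M)]
```

      In this toy case the first cloud has four points in a 3-dimensional space, while the ensemble of maps has three points in a 4-dimensional space; persistent homology would be computed separately on each cloud.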

      (2) The manuscript still fails to clarify the difference between a model that path integrates in two dimensions and a model that simply represents information with a given dimensionality. The argument that it's surprising that a network with 1D architecture represents a higher dimensional input strikes me as incorrect and an unnecessary attempt to argue for conceptual importance. At least to me this isn't surprising. It would be surprising if the 1D network could path integrate but this doesn't seem to be the case.

      In response to the reviewer’s concerns, we have made clear in the introduction and discussion that this model has no path integration capabilities, although we aim to develop a model capable of path integration using the kind of simple architecture presented here. We want to highlight here that equating attractor dynamics with path integration would be a conceptual mistake.

      (3) Other wording also seems to make unnecessary conceptual claims. E.g. The repeated use of 'negotiate' implies some degree of intelligence, or at least an exchange of information, that isn't shown to exist. I wonder if more precise language could be used? As I understand it the dimensionality is bounded by the inputs on the one hand, and the network connectivity on the other, with the actual dimensionality being a function of the recurrent and feedforward synaptic weights. There's clearly some role for the relative weights and the properties of plasticity rules, but I don't see any evidence for a negotiation.

      An interesting observation in Figure S2 is that grid maps are aligned only if the relative strength of feedforward and recurrent inputs is similar. If one of them dominates the other, grid maps do not align. This equilibrium can metaphorically be thought of as an instance of negotiation, where the negotiation is an emergent property of the system rather than something happening at an individual synapse.


      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Reviewer #1 (Recommendations For The Authors):

      Major

      (1) What is the evidence that, after training, the 1D network maintains its attractor dynamics when feedforward inputs are active? If the claim is that it does then it's important to provide evidence, e.g. responses to perturbations, or other tests. The alternative is that after training the recurrent inputs are drowned out by the feed forward spatial inputs.

      We agree with the reviewer on the importance of this point. In our model, networks are always learning, and the population activity represented by aligned grid maps in a trained network is a dynamic equilibrium that emerges from the interplay between feedforward and collateral constraints. If Hebbian learning is turned off, one gets a snapshot of the network at that moment. We now show in Fig. S3 that in a trained network without feedforward Hebbian learning the removal of recurrent collaterals results in a slight increase in gridness and spacing. The expansion is due to the fact that, as we argue in the Results section, the attractor has a contractive effect on grid maps, which could relate to observations in novel environments (Barry et al, 2007). If Hebbian learning is turned on in the same situation, the maps, no longer constrained by the attractor, drift toward the equilibrium solution of the ‘No attractor’ condition, with significantly larger spacing, no alignment and lower individual gridness. Thus, the attractor is the force that prevents this drift when feedforward Hebbian learning is on.

      These observations point to the key role played by the attractor not only in forming but also in sustaining grid activity. The dynamic equilibrium framework fits well-known properties of the system, such as its capacity to recalibrate very fast (Jayakumar et al, 2019), although this particular feature cannot be modeled with the current version of our model, which lacks path integration capabilities.

      (2) It would be useful to include additional control conditions for Figure 2 to test the hypothesis that it is simply connectivity, rather than attractor dynamics, that drives alignment.

      This could be achieved by randomly assigning strengths to the recurrent connections, e.g. drawing from exponential or Gaussian distributions.

      We agree and have included Fig. S2b-d, showing that the same distribution of collateral input weights entering each neuron, but lacking the 1D structure provided by the attractor, does not align grid maps. This is achieved by shuffling rows in the connectivity matrix, while avoiding self connections to make the comparison fair (self connections substantially alter the dynamics of the network, making it much more rigid). We observed that individual grid maps have very low gridness levels, even lower than in the no-attractor condition. In contrast, they have levels of population gridness slightly higher than in the no-attractor condition, but closer to 0 than to levels achieved with attractors. Our interpretation of these results is that irregular connectivity achieves some alignment in a few arbitrary directions and/or locations, which improves the coordination between maps at the expense of impairing rather than improving hexagonal responses of individual cells. Such observations stand in clear contrast to what is observed with continuous attractors with an orderly architecture.

      These results suggest that it is the structure of the attractor that allows grid cells to be aligned rather than the mere presence of recurrent collateral connections.
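
      The shuffling control described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for Fig. S2b-d: the ring-shaped toy connectivity, the convention that row i holds the weights received by neuron i, and the names ring_weights and shuffle_rows_no_self are all hypothetical.

```python
import random


def ring_weights(n, width):
    """Toy 1D ring connectivity (assumed shape): weight decays with ring distance."""
    W = []
    for i in range(n):
        row = []
        for j in range(n):
            d = min(abs(i - j), n - abs(i - j))  # distance along the ring
            row.append(0.0 if i == j else max(0.0, 1.0 - d / width))
        W.append(row)
    return W


def shuffle_rows_no_self(W, rng):
    """Shuffle the incoming weights of each neuron, leaving the diagonal in place.

    Row i is assumed to hold the weights received by neuron i, so each neuron
    keeps the same set of incoming weights but loses the 1D structure, and no
    weight is moved onto a self connection.
    """
    n = len(W)
    shuffled = []
    for i in range(n):
        off_diagonal = [W[i][j] for j in range(n) if j != i]
        rng.shuffle(off_diagonal)
        # Re-insert the original diagonal entry at position i.
        shuffled.append(off_diagonal[:i] + [W[i][i]] + off_diagonal[i:])
    return shuffled


rng = random.Random(0)
W = ring_weights(12, 3)
W_shuffled = shuffle_rows_no_self(W, rng)
```

      Because the shuffle acts within each row, the distribution of input weights per neuron is preserved exactly, which is the property the control relies on.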

      (3) It seems conceivable that once trained the recurrent connections would no longer be required for alignment. Can this be evaluated by considering what happens if the recurrent connections are turned off after training (or slowly turned off during training)? Does the network continue to generate aligned grid fields?

      This point has elements in common with point 1. As we argued in that response, the attractor has two main effects on grid maps: it aligns them and it contracts them. If the attractor is turned off, feedforward Hebbian learning progressively drives maps toward the solution obtained for the ‘no attractor’ condition, characterized by maps with larger spacing, poorer gridness and lack of alignment.

      (4) After training what is the relative strength of the recurrent and feedforward inputs to each neuron?

      Both recurrent and feedforward synaptic-strength matrices are normalized throughout training, so that the overall incoming synaptic strength to each neuron is invariant. Because of this, although individual feed-forward and recurrent input fields vary dynamically, their average is constant, with the exception of the very first instances of the simulation, before a stable regime is reached in grid-cell activity levels. We have included Fig. S2d, showing the dynamics of feedforward and recurrent mean fields throughout learning as well as their ratio. In addition, Fig. S2a shows that the strength of recurrent relative to feedforward inputs is an important parameter, since alignment is only obtained in an intermediate range of ratios.
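
      The effect of this normalization can be sketched as follows. This is a hedged illustration only: it assumes that the conserved quantity is the plain sum of incoming weights (the model may use a different norm), it uses a generic Hebbian increment rather than the manuscript's learning rule, and the function names are hypothetical.

```python
def hebbian_increment(W, pre, post, lr=0.1):
    """Generic Hebbian update (illustrative; not the manuscript's exact rule).

    W[i][j] is the weight from input j onto neuron i.
    """
    return [
        [W[i][j] + lr * post[i] * pre[j] for j in range(len(pre))]
        for i in range(len(post))
    ]


def normalize_incoming(W, total=1.0):
    """Rescale each row so that the incoming weights of every neuron sum to `total`."""
    out = []
    for row in W:
        s = sum(row)
        # Rows with no incoming weight are left untouched.
        out.append([total * w / s for w in row] if s > 0 else list(row))
    return out


W = normalize_incoming([[0.2, 0.3, 0.5], [1.0, 1.0, 2.0]])
pre = [1.0, 0.0, 0.5]   # toy presynaptic activity
post = [0.8, 0.1]       # toy postsynaptic activity

# One learning step followed by normalization: individual weights change,
# but the total incoming strength of each neuron stays fixed.
W_next = normalize_incoming(hebbian_increment(W, pre, post))
row_sums = [sum(row) for row in W_next]
```

      Under this scheme learning can only redistribute strength among the synapses of a neuron, so the recurrent to feedforward balance is set by the chosen totals rather than by the amount of learning.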

      (5) It would be helpful to also evaluate the low dimensional structure of the input to the network. Assuming it has a 2D structure, as it represents 2D space, can an explanation be provided for why it is surprising that the trained network also encodes activity with a 2D manifold? It strikes me that the more interesting finding might relate to alignment of the grids rather than claims about a 1D attractor encoding a 2D representation. Either way, stronger evidence and clearer discussion would be helpful.

      The reviewer is correct in assuming that the input has a 2D structure that can be represented by a sheet embedded in a high-dimensional space and thus has the Betti numbers [1,0,0]. The surprising element in our results is that we are showing for the first time that the population activity of an attractor network is constrained to a manifold that results from the negotiation between the architecture of the attractor and the inputs, and does not merely reflect the former as previously assumed. In this sense, the alignment of grid cells by a 1D attractor is an instance of the more general case that 1D attractors can encode 2D representations.

      It is certainly the case that the 2D input is a strong constraint pushing population activity toward a 2D manifold. However, the final form of the 2D manifold is strongly constrained by the attractor, as shown by the contrast with the no-attractor condition (a 2D sheet, as in the input, vs a torus when the attractor is present). The 1D attractor is able to flexibly adapt to the constraint posed by the inputs while doing its job (as demonstrated in previous points), which results in 2D grid maps aligned by a 1D attractor. Generally speaking, this work provides a proof of principle demonstrating that the topology of the attractor architecture and the manifold of the population activity space need not be identical, as previously widely assumed by the attractor community, and need not even have the same dimensionality. Instead, a single architecture can potentially be applied to many purposes. Hence, our work provides a valuable new perspective that applies to the study of attractors throughout the brain.
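
      For reference, the Betti numbers that distinguish the objects discussed here (a 2D sheet, a ring and a torus) can be derived with a short sketch based on the Künneth formula for product spaces with field coefficients. The helper names are illustrative, and the snippet describes ideal spaces, not the persistent homology pipeline applied to simulated data.

```python
def product_betti(a, b):
    """Betti numbers of a product space X x Y (Künneth formula, field coefficients).

    Equivalent to multiplying the Poincaré polynomials of X and Y.
    """
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def pad(betti, length=3):
    """Pad with zeros so that every space is reported as [b0, b1, b2]."""
    return betti + [0] * (length - len(betti))


sheet = pad([1])                          # contractible 2D sheet: [1, 0, 0]
ring = pad([1, 1])                        # circle, e.g. a 1D ring: [1, 1, 0]
torus = product_betti([1, 1], [1, 1])     # circle x circle: [1, 2, 1]
```

      The sheet has a single connected component and no cycles or cavities, whereas the torus adds two independent cycles and one cavity, which is the signature sought in the persistence diagrams.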

      (6) The introduction should be clearer about the different types of grid model and the computations they implement. E.g. The authors' previous model generates grid fields from spatial inputs, but if my understanding is correct it isn't able to path integrate. By contrast, while the many 2D models with continuous attractor dynamics also generate grid representations, they do so by path integration mechanisms that are computationally distinct from the spatial transformation implemented by feedforward models (see also general comments above).

      We agree with the reviewer and have made this point explicit in the introduction.

      (7) A prediction from continuous attractor models is that when place cells remap the low dimensional manifold of the grid activity is unaffected, except that the location of the activity bump is moved. It strikes me as important to test whether this is the case for the model presented here (my intuition is that it won't be, but it would be important to establish either way).

      We want to emphasize that our model is a continuous attractor model, so the question regarding the difference between what our model and continuous attractor network models predict is an ill-posed one. One of our main conclusions is precisely that attractors can work in a wider spectrum of ways than previously thought.

      For lack of a better definition, our multiple simulations could be thought of as training in different arenas. It is true that in our model maps take time to form, but this is also the case in novel environments (Barry et al, 2007), and continuous attractor models exclusively or strongly guided by self-motion cues struggle to replicate this phenomenon. We show that the current version of our model accepts multiple solutions (in practice four, but conceptually countably infinite), all of them resulting in a torus for the population activity (i.e. the same topology or low-dimensional manifold). It is not clear to us how easy it would be to differentiate between most of these solutions in experimental data, with only incomplete information. That said, incorporating a symmetry-breaking ingredient into the model, for example related to head-direction modulation, could perhaps lead to the prevalence of a single type of solution. We intend to explore this possibility in the future in order to add path-integration capabilities to the system, as described in the discussion.

      (8) The Discussion implies that 1D networks could perform path integration in a manner similar to 2D networks. This is a strong claim but isn't supported by evidence in the study. I suggest either providing evidence that this is the case for models of this kind or replacing it with a more careful discussion of the issue.

      The current version of our model has no path integration capabilities, as is now made explicit in the Introduction and Discussion. In addition, we have now made clear that the idea that path integration could perhaps be implemented using 1D networks is, although reasonable, purely speculative.

      Minor

      (1) Introduction. 'direct excitatory communication between them'. Suggest rewording to 'local synaptic interactions', as communication can also be purely inhibitory (e.g. Burak and Fiete, 2009) or indirect by excitation of local interneurons (e.g. Pastoll et al., Neuron, 2013).

      We agree and have adopted this phrasing.

      (2) The decision to focus the topology analysis on the 60 cm wide central square appears somewhat arbitrary. Are the irregularities referred to a property of the trained networks or would they also emerge with analysis of simulated ideal data? Can more justification be expanded and supplementary analyses be shown when the whole arena is used?

      In practical terms, a subsampling of the data to around half was needed because the persistent homology packages struggle to handle large amounts of data, especially in the calculation of H2. We decided to cut a portion of contiguous pixels in the open field at least larger than the hexagonal tile representing the whole grid population period (as represented in Figure 6). Leaving the borders aside was a logical choice since it is known that the solution at the borders is particularly influenced by the speed anisotropy of the virtual rat (see Si, Kropff & Treves, 2012), in a way that mimics how borders locally influence grid maps in actual rats (Krupic et al, 2015). The specific way in which our virtual rat handles borders is arbitrary and might not generalize. A second issue around borders is that maps are differently affected by incomplete smoothing, although this issue does not apply to our data because we did not smooth across neighboring pixels. In sum, considering the central 60 cm wide square was sufficient to contain the whole torus and a reasonable compromise that would allow us to perform all analyses in the part of the environment less influenced by boundaries.

      (3) It could help the general reader to briefly explain what a persistence diagram is.

      This is developed in the Appendix, but we have now added a reference to it and a brief description in the main text.

      (4) For the analyses in Figure 3-4, and separately for Figure 5, it might help the reader to provide visualizations of the low dimensional point cloud.

      All these calculations take place in the original high-dimensional point cloud. Doing them in a reduced space would be incorrect because there is no dimensionality reduction technique that guarantees the preservation of topology. In Figure 7 we reduce the dimensionality of data but emphasize that it is only done for visualization purposes, not to characterize topology. We also point out in this Figure that the same non-linear dimensionality reduction technique applied to objects with identical topology yields a wide variety of visualizations, some of them clear and some less clear. This observation further exemplifies why one cannot assume that a dimensionality-reduction technique preserves topology, even for a low-dimensional object embedded in a high-dimensional space.

      (5) The detailed comparison of the dynamics of each model is limited by the number of data points. Why not address this by new simulations with more neurons?

      We are not sure we understand this comment. In Figure 2, the dynamics for each model are markedly different. These are averages over 100 simulations. We are not sure what benefit would be obtained from adding more neurons. Before starting this work we searched for the minimal number of neurons that would result in convergence to an aligned solution in 2D networks, which we found to be around 100. Optimizing this parameter in advance was important to reduce computational costs throughout our work.

      (6) Could the variability in Figure 7 also be addressed by increasing the number of data points?

      As we argued in a previous point, there is no reason to expect preservation of topology after applying Isomap. We believe this lack of topology preservation to be the main driver of variability.

      (7) Page/line numbers would be useful.

      We agree. However, the text is curated by bioRxiv which, to the best of our knowledge, does not include them.

      Reviewer 2:

      Reviewer #2 (Recommendations For The Authors):

      (1) I highly suggest that the author rewrite some parts of the Results. There are lots of details which should be put into the Methods part, for example, the implementation details of the network, the analysis details of the toroidal topology, etc. It will be better to focus on the results part first in each section, and then introduce some of the key details of achieving these results, to improve the readability of the work.

      This suggestion contrasts with that of Reviewer #1. As a compromise, we decided to include in the Results section only methodological details that are key to understanding the conclusions, and describe everything else in the Methods section.

      (2) 'Progressive increase in gridness and decrease in spacing across days have been observed in animals familiarizing with a novel environment...' From Fig.2c I didn't see much decrease. The authors may need to carry out some statistical test to prove this. Moreover, even if the changes are significant, this might not be the consequence of the excitatory collateral constraint. To prove this, the authors may need to offer some direct evidence.

      We agree that the decrease is not evident in this figure due to the scale, so we are adding the correlation in the figure caption as proof. In addition, several arguments, some related to new analyses, demonstrate that the attractor contracts grid maps. First, the ‘no attractor’ condition has a markedly larger spacing compared to all other conditions (Fig. 2a). We also now show that spacing monotonically decreases with the strength of recurrent relative to feedforward weights, in a way that is rather independent of gridness (Fig. S2a). Second, as we now show in Fig. S2b-d, simulations with a shuffled 1D attractor, such that the sum of input synapses to each neuron is the same as in the 1D condition but no structure is present, lead to a spacing that is mid-way between the ‘no attractor’ condition and the conditions with attractors. Third, as we now show in Fig. S3a, turning off both recurrent connections and feedforward learning in a trained network results in a small increase in spacing. Fourth, as we now show in Fig. S3b, turning off recurrent connections while feedforward learning is kept on increases grid spacing to levels comparable to those of the ‘no attractor’ condition. All these elements support a role of the attractor in contracting grid spacing.

      (3) Some of the items need to be introduced first before going into details in the paper, for instance, the stripe-like attractor network, the Betti number, etc.

      We have added in the Results section a brief description and references to full developments in the Appendix.

      Reviewer 3 (Public Review):

      (1) It is not clear to me that the proposal here is fundamentally new. In Si, Kropff and Treves (2012) recurrent connectivity was dependent on the head direction tuning and thus had a ring structure. Urdapilleta, Si, and Treves considered connectivity that depends on the distance on a 2d plane.

      In the work of Si et al, connectivity is constructed ad hoc for conjunctive cells to represent a torus; it depends on head-directionality but also on the distance in a 2D plane. The topology of this architecture has not been assessed, but it is close to the typical 2D ‘rigid’ constraint. In the work of Urdapilleta et al, the network is a simple 2D one. The difference with our work is that we focus on the topology of the recurrent network and do not use head-direction modulation. In this context, we prove that a 1D network is enough to align grid cells and, more generally, we provide a proof of principle that the topology of the architecture and the representation space of an attractor network do not need to be identical, as previously assumed by the attractor community. These two important points were neither argued, speculated nor self-evident from the cited works.

      (2) The paper refers to the connectivity within the grid cell layer as an attractor. However, would this connectivity, on its own, indeed sustain persistent attractor states? This is not examined in the paper. Furthermore, is this even necessary to obtain the results in the model? Perhaps weak connections that do not produce an attractor would be sufficient to align the spatial response patterns during the learning of feedforward weights, and reproduce the results? In general, there is no exploration of how the strength of collateral interactions affects the outcome.

      The reviewer makes several important points. Local excitation combined with global inhibition is the archetypical architecture for continuous attractors (see for example Knierim and Zhang, Annual review of neuroscience, 2012). Thus, in the absence of feedforward input, we observe a bump of activity. As in all continuous attractors, this bump is not necessarily ‘persistent’ and instead is free to move along the attractor.

      We cannot prove that there is not a simpler architecture that has the same effect as our 1D or 1DL conditions, and we think that there are some interesting candidates to investigate in the future. What we now prove in new Fig. S2b-d is that it is not the strength of recurrent connections themselves, but instead the continuous attractor structure that aligns grid cells in our model. To demonstrate this, we shuffle incoming recurrent connections to each neuron in the 1D condition (while avoiding self-connections for fairness), and show that training does not lead to grid alignment. We also show in Fig. S1 that an architecture represented by 20 overlapping 1DL attractors, each formed by concatenating 10 random cells, aligns grid cells to levels slightly lower but similar to the 1D or 1DL attractors. This architecture can perhaps be considered as simpler to build in biological terms than all the others, but it is still constituted by continuous attractors.

      The strength of recurrent collaterals, or more precisely the recurrent to feedforward ratio, is crucial in our model to achieve a negotiated outcome from constraints imposed by the attractor and the inputs. We now show explicit measures of this ratio in Fig. S2, as well as examples showing that an imbalance in this ratio impairs grid alignment. When the ratio is too high or too low, both individual and population gridness are low. Interestingly, grid spacing behaves differently, decreasing monotonically with the relative strength of recurrent connections.

      (3) I did not understand what is learned from the local topology analysis. Given that all the grid cells are driven by an input from place cells that spans a 2d manifold, and that the activity in the grid cell network settles on a steady state that depends only on the inputs, isn't it quite obvious that the manifold of activity in the grid cell layer would have, locally, a 2d structure?

      The dimensionality of the input is important, although not the only determinant of the topology of the activity. The recurrent collaterals are the other determinant, and their architecture is a crucial feature. For example, as we now show in Figure S2b-d, shuffled recurrent synaptic weights fail to align grid cells. In the 1D condition, if feedforward inputs were absent, the dynamics of the activity would be confined to a ring. The opposite condition is our ‘no attractor’ condition, in which activity in the grid cell layer mimics the topology of inputs, a 2D sheet (and not a torus). It is in the intermediate range, when both feedforward and recurrent inputs are important, that a negotiated solution (a torus) is achieved.

      The analyses of local dimensionality and local homology of Figure 3 are crucial steps to demonstrate toroidal topology. According to the classification theorem for closed surfaces, global homology is not enough to uniquely define the topology of a point cloud, and thus this step cannot be skipped. This step aims to prove that the point cloud is indeed a closed surface.

      (4) The modeling is all done in planar 2d environments, where the feedforward learning mechanism promotes the emergence of a hexagonal pattern in the single neuron tuning curve. This, combined with the fact that all neurons develop spatial patterns with the same spacing and orientation, implies even without any topological analysis that the emerging topology of the population activity is a torus.

      We cannot agree with this intuition. In the ‘no attractor’ condition, individual maps have hexagonal symmetry with standardized spacing, but given the lack of alignment the population activity is not a closed surface and thus not a torus. It can rather be described as a 2D sheet embedded in a high dimensional space, a description that also applies to the input space.

      While it is rather evident that an ad hoc toroidal architecture folds this 2D population activity into a torus, it is less evident and rather surprising that 1D architectures have the same capability. This is the main novelty in our work.

      (5) Moreover, the recent work of Gardner et al. demonstrated much more than the preservation of the topology in the different environments and in sleep: the toroidal tuning curves of individual neurons remained the same in different environments. Previous works, that analyzed pairwise correlations under hippocampal inactivation and various other manipulations, also pointed towards the same conclusion. Thus, the same population activity patterns are expressed in many different conditions. In the present model, the results of Figure 6 suggest that even across distinct rectangular environments, toroidal tuning curves will not be preserved, because there are multiple possible arrangements of the phases on the torus which emerge in different simulations.

      We agree with the reviewer on the main point, although the recently found ring activity in the absence of sensory feedback (Gonzalo Cogno et al, 2023) suggests that what is happening in the EC is more nuanced than a pre-wired torus. Solutions in Figure 6 are different ways of folding a 1D strip into a torus, with or without the condition of periodicity in the 1D strip. Whether or not these different solutions would be discernible from one another in a practical setup is not clear to us. For example, global homology, as addressed in the Gardner paper, is the same for all these solutions. Furthermore, while our solutions of up to order 3 are highly discernible, higher order solutions, potentially achievable with other network parameters, would be impossible to discern by eye in representations similar to the ones in Figure 6. In addition, while we chose to keep our model in the simplest possible form as a clear proof of principle, new elements introduced to the model such as head directionality could break the symmetry and lead to the prevalence of one preferred solution for all simulation replicates. We plan to investigate this possibility in the future when attempting to incorporate path-integration capabilities into the model.

      (6) In real grid cells, there is a dense and fairly uniform representation of all phases (see the toroidal tuning of grid cells measured by Gardner et al). Here the distribution of phases is not shown, but Figure 7 suggests that phases are non uniformly represented, with significant clustering around a few discrete phases. This, I believe, is also the origin for the difficulty in identifying the toroidal topology based on the transpose of the matrix M: vectors representing the spatial response patterns of individual neurons are localized near the clusters, and there are only a few of them that represent other phases. Therefore, there is no dense coverage of the toroidal manifold that would exist if all phases were represented equally. This is not just a technical issue, however: there appears to be a mismatch between the results of the model and the experimental reality, in terms of the phase coverage.

      As mentioned in the Results section, Figure 7 is meant for visualization purposes only, and serves more as a cautionary tale regarding the unpredictable risks of non-linear dimensionality reduction than as proof of the organization of activity in the network. Isomap is a non-linear transformation that deforms each of our solutions in a unique way so that, while all have the topology of a torus embedded in a high-dimensional space, only a few of them exhibited one of two possible toroidal visualizations in a 3D Isomap reduction. Isomap, like all other popular dimensionality reduction techniques, provides no guarantee of topology invariance. A better argument for judging the homogeneous distribution of phases is persistent homology, which identifies relatively large holes (compared to the sampling spacing) in the original manifold embedded in a high-dimensional space. In our case, persistent homology identified only two holes significantly larger than noise (the two cycles of a torus) and one cavity in all conditions that included attractors. Regarding the specific distribution of phases in different conditions, however, see our reply below.

      (7) The manuscript makes several strong claims that incorrectly represent the relation between experimental data and attractor models, on one hand, and the present model on the other hand. For the latter, see the comments above. For the former, I provide a detailed list in the recommendations to the authors, but in short: the paper claims that attractor models induce rigidness in the neural activity which is incompatible with distortions seen in the spatial response patterns of grid cells. However, this claim seems to confuse distortions in the spatial response pattern, which are fully compatible with the attractor model, with distortions in the population activity patterns, which would be incompatible with the attractor model. The attractor model has withstood numerous tests showing that the population activity manifold is rigidly preserved across conditions - a strong prediction (which is not made, as far as I can see, by feedforward models). I am not aware of any data set where distortions of the population activity manifold have been identified, and the preservation has been demonstrated in many examples where the spatial response pattern is disrupted. This is the main point of two papers cited in the present manuscript: by Yoon et al, and Gardner et al.

      First of all, we would like to note that our model is a continuous attractor model. Different attractor models have different outcomes, and one of the main conclusions of our manuscript is that attractors can do a wider range of operations than previously thought.

      We agree with the reviewer that distortions in spatial activity (which speak against a purely path-integration guided attractor) should not be confused with distortions in the topology of the population activity (which would instead speak against the attractor dynamics itself). We have rephrased these observations in the manuscript. In fact, we believe that the capacity of grid cells to present distorted maps without a distortion of the population activity topology, as shown for example by Gardner and colleagues, could result from a tension between feedforward and recurrent inputs, the potential equilibria of which our manuscript aims to characterize.

      (8) There is also some weakness in the mathematical description of the dynamics. Mathematical equations are formulated in discrete time steps, without a clear interpretation in terms of biophysically relevant time scales. It appears that there are no terms in the dynamics associated with an intrinsic time scale of the neurons or the synapses, and this introduces a difficulty in interpreting synaptic weights as being weak or strong. As mentioned above, the nature of the recurrent dynamics within the grid cell network (whether it exhibits continuous attractor behavior) is not sufficiently clear.

      We agree with the reviewer that our model is rather simple, and we value the extent to which this simplicity allows for a deep characterization. All models are simplifications and the best model in any given setup is the one with the minimum amount of complexity necessary to describe the phenomenon under study. We believe that to understand whether or not a 1D continuous attractor architecture can result in a toroidal population activity, a biophysically detailed model, with prohibitive computational costs, would have been unnecessarily complex. This argument is not intended to discredit biophysically detailed models, which are capable of addressing a wider range of questions regarding, for example, the spiking dynamics of grid cells, which cannot be addressed by our simple model.

      Reviewer #3 (Recommendations For The Authors):

      The work points to an interesting scenario for the emergence of toroidal topology, but the interpretation of this idea should be more nuanced. I recommend reconsidering the claims about limitations of the attractor theory, and acknowledging the limitations of the present theory.

      I don't see the limitations mentioned above as a reason to reject the ideas proposed in this manuscript, for two main reasons: first, additional research might reveal a regime of parameters where some issues can be resolved (e.g. the clustering of phases). In addition, the mechanism described here might act at an early stage in development to set up initial dynamics along a toroidal manifold, while other mechanisms might be responsible for the rigidity of the toroidal manifold in an adult animal. But all this implies that the novelty in the present manuscript is weaker than implied, the ability to explain experimental observations is more limited than implied, and these limitations should be acknowledged and discussed.

      I recommend reporting on the distribution of grid cell phases and, if indeed clustered, this should be discussed. It will be helpful to explore whether this is the reason for the difficulty in identifying the toroidal topology based on the collection of spatial response patterns (using the transpose of the matrix M).

      Ideally, a more complete work would also explore in a more systematic and parametric way the influence of the recurrent connectivity's strength on the learning, and whether a toroidal manifold emerges also in non-planar, such as the wagon-wheel environment studied in Gardner et al.

      Part of these recommendations have been addressed in the previous points (public review). Regarding the reason why the transpose of M does not fully recapitulate architecture with our conservative classification criteria, we believe that there is no reason why it should in the first place. We view the fact that the transpose of M recapitulates some features of the architecture as a purely phenomenological observation, and we think it is important as a proof that M is not exactly the same for the different conditions. We imagined that if M matrices were exactly the same this could be due to poor spatial sampling by our bins. Knowing that they are intrinsically different is important even if the reason why they have these specific features is not fully clear to us.

      Although we do not think that the distribution of phases is related to the absence of a cavity in the transpose of M or to the four clusters found in Isomap projections, it remains an interesting question that we did not explore initially. We are now showing examples of the distribution of phases in Figure S1. We observed that in both 2D and 1D conditions phases are distributed following rather regular patterns. Whether or not these patterns are compatible with experimental observations of phase distribution is in our view debatable, given that so far state-of-the-art techniques have only allowed simultaneous recording of a small fraction of the neurons belonging to a given module. That said, we think that it is important to note that ordered phase patterns are an anecdotal outcome of our simulations rather than a necessary outcome of flexible attractors or attractors in general. To prove this point, we simulated a condition with a new architecture represented by the overlap of 20 short 1DL attractors, each recruiting 10 random neurons from the pool of 100 available ones.

      The rest of the parameters of the simulations were identical to those in the other conditions.

      By definition, the topology of this architecture has Betti numbers [20,0,0]. We show in Figure S1 that this architecture aligns grid cells, with individual and population gridness reaching slightly lower levels compared to the 1D condition. However, the distribution of phases of these grid cells has no discernible pattern. This result is an arbitrary example that serves as a proof-of-principle to show that flexible attractors can align grid cells without exhibiting ordered phases, not a full characterization of the outcome of this type of architecture, which we leave for future work. For the rest of our work, we stick to the simplest versions of 1D architectures, which allow for a more in-depth characterization.
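
      The construction described above (20 short line attractors, each recruiting 10 random neurons out of 100) can be illustrated with a minimal sketch. The function name, the nearest-neighbour unit weights, and the random ordering of neurons along each line are assumptions made purely for illustration; the connectivity kernel actually used in the simulations is not specified in this response.

```python
import random


def overlapping_line_attractors(n_neurons=100, n_attractors=20, length=10, seed=0):
    """Hypothetical sketch: sum of short open-ended chains over random neuron subsets.

    Each chain links consecutive members with a symmetric unit weight
    (an assumption; the manuscript's actual kernel may differ).
    """
    rng = random.Random(seed)
    weights = [[0.0] * n_neurons for _ in range(n_neurons)]
    members = []
    for _ in range(n_attractors):
        # Draw distinct neurons; their sampled order defines positions on the line.
        chain = rng.sample(range(n_neurons), length)
        members.append(chain)
        for a, b in zip(chain, chain[1:]):
            weights[a][b] += 1.0
            weights[b][a] += 1.0
    return weights, members


weights, members = overlapping_line_attractors()
shared = sum(1 for i in range(100) if sum(i in chain for chain in members) > 1)
print("neurons recruited by more than one attractor:", shared)
```

      Because 200 slots are drawn from 100 neurons, many neurons are necessarily shared between lines in this sketch, which is what "overlap" means here.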

      The wagon-wheel is an interesting case in which maps lose hexagonal symmetry although the population activity lies in a torus, perhaps evidencing the tension between feedforward and recurrent inputs and suggesting that grid cell response does not obey the single master of path integration. If we modeled it with a 1D attractor, we believe the outcome would strongly depend on the virtual rat's trajectory. If the trajectory was strictly linear, the population activity would be locally one-dimensional and potentially represented by a ring. Instead, if the trajectory allowed for turns, i.e. a 2D trajectory within a corridor-like maze, the population activity would be toroidal as in our open field simulations, while maps would not have perfect hexagonal symmetry, mimicking experimental results.

      More minor comments:

      Recurrent dynamics are modeled as if there is no intrinsic synaptic or membrane time constant. This may be acceptable for addressing the goals of this paper, but it is a bit unusual and it will be helpful to explain and justify this choice.

      As mentioned above, we believe that the best model in a given setup is the one with the lowest number of complexities that can still address the phenomenon under study. One does not use general relativity to build a bridge, although it provides a ‘more accurate’ description of the physics involved. All models are simplifications, and the more complex a model, the more it has to be taken as a black box.

      The Introduction mentions that in most models interaction between co-modular neurons occurs through direct excitatory communication, but in quite a few models the interaction is inhibitory. The crucial feature is that the interaction is strongly inhibitory between neurons that differ in their tuning, and either less inhibitory or excitatory between neurons with similar phases.

      We agree that directed inhibition has been shown to be as efficient as directed excitation, and we have modified the introduction to reflect this.

      The Discussion claims that the present work is the first one in which the topology of the recurrent architecture differs from the topology of the emergent state space. However, early works on attractor models of grid cells showed how neural connectivity which is arranged on a 2d plane, without any periodic boundary conditions, leads to a state space that exhibits the toroidal topology. Therefore, this claim should be revised.

      We agree, although the 2D sheet in this case acts as a piece of the torus, and locally the input space and architecture are identical objects. It could be argued that architectures that represent a 2D local slice of the torus, the whole torus, or several cycles around the torus form a continuous family parametrized by the extension of recurrent connections, and as a consequence it is not surprising that these works have not made claims about the incongruence between architecture and representation topologies. The 2D sheet connectivity is still constructed ad hoc to organize activity in a 2D bump, and there is no negotiation between disparate constraints because locally the constraints imposed by input and architecture are the same. We believe this situation is conceptually different from our flexible 1D attractors. We have adapted our claim to include this technical nuance.

      Why are neural responses in the perimeter of the environment excluded from the topological analysis? The whole point of the toroidal manifold analysis on real experimental data is that the toroidal manifold is preserved regardless of the animal's location and behavioral condition.

      We agree, although experimental data needs to go through extensive pre-processing such as dimensionality reduction before showing a toroidal topology. Such manipulations might smooth away the specific effects of boundaries on maps, together with other sources of noise. In our case, the original reason to downsample the dataset is related to the explosion in computational time that we experience with the ripser package when using more than ~1000 data points. For a proof-of-principle characterization we were much more interested in what happened in the center of the arena, where a 1D attractor could fold itself to confine population activity into a torus. The area we chose was sufficiently large to contain the whole torus. Borders do affect the way the attractor folds (they also affect grid maps in real rats). We feel that these imperfections could be interesting to study in relation to the parameters controlling how our virtual rat behaves at the borders, but not at this proof-of-principle stage.

      The periodic activity observed in Ref. 29 could in principle provide the basis for the ring arrangement of neurons. However, it is not yet clear whether grid cells participate in this periodic activity.

      We agree. So far it seems that entorhinal cells in general participate in the ring, which would imply that all kinds of cells are involved. However, it could well be that only some functional types participate in the ring and grid cells specifically do not, as future experiments will tell.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable work explores death coding data to understand the impact of COVID-19 on cancer mortality. The work provides solid evidence that deaths with cancer as a contributing cause were not above what would be expected during pandemic waves, suggesting that cancer did not strongly increase the risk of dying of COVID-19. These results are an interesting exploration into the coding of causes of death that can be used to make sense of how deaths are coded during a pandemic in the presence of other underlying diseases, such as cancer.

      We thank the editor and reviewers for the time they took to review our manuscript and for the thoughtful suggestions they provided. We have completed several revisions based on their feedback and we feel our paper is stronger as a result. However, none of these revisions change the overall conclusions of our study.

      Reviewer #1 (Public Review):

      Summary:

      In the paper "Disentangling the relationship between cancer mortality and COVID-19", the authors study whether the number of deaths in cancer patients in the USA went up or down during the first year (2020) of the COVID-19 pandemic. They found that the number of deaths with cancer mentioned on the death certificate went up, but only moderately. In fact, the excess with-cancer mortality was smaller than expected if cancer had no influence on the COVID mortality rate and all cancer patients got COVID with the same frequency as in the general population. The authors conclude that the data show no evidence of cancer being a risk factor for COVID and that the cancer patients were likely actively shielding themselves from COVID infections.

      Strengths:

      The paper studies an important topic and uses sound statistical and modeling methodology. It analyzes both deaths with cancer listed as the primary cause of death and deaths with cancer listed as one of the contributing causes. The authors argue, correctly, that the latter is a more important and reliable indicator to study relationships between cancer and COVID. The authors supplement their US-wide analysis by analysing three states separately.

      Weaknesses:

      The main findings of the paper can be summarized as six numbers. Nationally, in 2022, multiple-cause cancer deaths went up by 2%, Alzheimer's deaths by 31%, and diabetes deaths by 39%. At the same time, assuming no relationship between these diseases and either Covid infection risk or Covid mortality risk, the deaths should have gone up by 7%, 46%, and 28%. The authors focus on cancer deaths and as 2% < 7%, conclude that cancer is not a risk factor for COVID and that cancer patients must have "shielded" themselves against Covid infections.

      However, I did not find any discussion of the other two diseases. For diabetes, the observed excess was 39% instead of "predicted by the null model" 28%. I assume this should be interpreted as diabetes being a risk factor for Covid deaths. I think this should be spelled out, and also compared to existing estimates of increased Covid IFR associated with diabetes.

      And what about Alzheimer's? Why was the observed excess 31% vs the predicted 46%? Is this also a shielding effect? Does the spring wave in NY provide some evidence here? Why/how would Alzheimer's patients be shielded? In any case, this needs to be discussed and currently, it is not.

      We thank the reviewer for their positive feedback on the paper and for these suggestions. It is true that we have emphasized the impact on cancer deaths, as this was the primary aim of the paper. In the revised version, we have expanded the results and discussion sections to more fully describe the other chronic conditions we used as comparators (lines 267-284;346 – 386).

      Note that we are somewhat reluctant to designate any of these conditions as risk factors based solely on comparing the time series model with the demographic model of our expectations. As we mention in the discussion, there is considerable uncertainty around estimates from the demographic model in terms of the size of the population-at-risk, the mean age of the population-at-risk, and the COVID-19 infection rates and infection fatality ratios. Our demographic model is primarily used to demonstrate the effects of competing risks across types of cancers and chronic conditions, since these findings are robust to model assumptions. In contrast, the demographic model should be used with caution if the goal is to titrate the level of these risk factors (as the level of imputed risk is dependent on model assumptions). In the updated version of the manuscript, we have included uncertainty intervals in Table 3, using the upper and lower bounds of the estimated infection rates and IFRs, to better represent this uncertainty. We have also discussed this uncertainty more explicitly in the text and ran sensitivity analyses with different infection rate assumptions in the discussion (lines 354-362; 367 -370).

      We would like to note that rather than interpreting the absolute results, we used this demographic model as a tool to understand the relative differences between these conditions. From the demographic model we determined that we would expect to see much higher mortality in diabetes and Alzheimer’s deaths compared to cancer deaths due to three factors (1. Size of population-at-risk, 2. Mean age of the population-at-risk, 3. Baseline risk of mortality from the condition), that are separate from the COVID-19 associated IFR. And in general, this is what we observed.
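
      The competing-risks logic behind these relative differences can be illustrated with a minimal sketch. The function name, the simplified formula, and every number below are hypothetical assumptions chosen for illustration only; this is not the demographic model of the manuscript. The sketch assumes that, under the null hypothesis, COVID-19 deaths add to the excess only among people who would not have died with the condition anyway.

```python
def expected_relative_excess(attack_rate, ifr, baseline_death_risk):
    """Expected excess deaths as a fraction of baseline deaths (null hypothesis).

    attack_rate          -- assumed share of the population-at-risk infected
    ifr                  -- assumed infection fatality ratio for that population
    baseline_death_risk  -- assumed yearly risk of dying with the condition
    """
    covid_death_risk = attack_rate * ifr
    # People who would have died with the condition anyway cannot add excess deaths.
    excess_risk = covid_death_risk * (1.0 - baseline_death_risk)
    return excess_risk / baseline_death_risk


# Hypothetical inputs: identical attack rate and IFR, different baseline risks.
high_baseline = expected_relative_excess(attack_rate=0.10, ifr=0.05, baseline_death_risk=0.20)
low_baseline = expected_relative_excess(attack_rate=0.10, ifr=0.05, baseline_death_risk=0.02)
print(f"high baseline risk: {high_baseline:.1%}, low baseline risk: {low_baseline:.1%}")
```

      In this simplified form the size of the population-at-risk cancels out of the relative excess and only scales the absolute number of deaths, while age would enter through the assumed IFR. With the same attack rate and IFR, the condition with the higher baseline risk of death shows the smaller relative excess.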

      In comparing the results from the demographic model to the observed excess, diabetes does stand out as an outlier from cancer and Alzheimer’s disease in that the observed excess is consistently above the null hypothesis, which lends support to the conclusion that diabetes is in fact a risk factor for COVID-19, a conclusion that is also supported by many other studies. Our findings for hematological cancers are also similar, in that we find consistent support for this condition being a risk factor. We have commented on this in the discussion and added a few references (lines 346-354; 395-403).

      Our hypothesis regarding non-hematological cancer deaths (lower than anticipated mortality due to shielding) could also apply to Alzheimer’s deaths. Furthermore, we used the COVID-19 attack rate for individuals >65 years (based on the data that is available), but we estimate that the mean age of Alzheimer’s patients is actually 80-81 years, so this attack rate may in fact be a bit too high, which would increase our expected excess. We have commented on this in the discussion (lines 363-377).

      Reviewer #2 (Public Review):

      The article is very well written, and the approach is quite novel. I have two major methodological comments, that if addressed will add to the robustness of the results.

      (1) Model for estimating expected mortality. There is a large literature using a different model to predict expected mortality during the pandemic. Different models come with different caveats, see the example of the WHO estimates in Germany and the performance of splines (Msemburi et al Nature 2023 and Ferenci BMC Medical Research Methodology 2023). In addition, it is a common practice to include covariates to help the predictions (e.g., temperature and national holidays, see Kontis et al Nature Medicine 2020). Last, fitting the model independently for each region neglects potential correlation patterns in the neighbouring regions, see Blangiardo et al 2020 PlosONE.

      Thank you for these comments and suggestions. We agree there are a range of methods that can be used for this type of analysis, and they all come with their strengths, weaknesses, and caveats. Broadly, the approach we chose was to fit the data before the pandemic (2014-2019), and project forward into 2020. To our knowledge it is not a best practice to use an interpolating spline function to extrapolate to future years. This is demonstrated by the WHO estimates in Germany in the paper you mention. This was our motivation for using polynomial and harmonic terms.

      Based on the above:

      a. I believe that the authors need to run a cross-validation to justify model performance. I would suggest training the data leaving out the last year for which they have mortality and assessing how the model predicts forward. Important metrics for the prediction performance include mean square error and coverage probability, see Konstantinoudis et al Nature Communications 2023. The authors need to provide metrics for all regions and health outcomes.

      Thank you for this suggestion. We agree that our paper could be strengthened by including cross validation metrics to justify model performance. Based on this suggestion, and your observations regarding Alzheimer’s disease, we have done two things. First, for the full pre-pandemic period (2014-2019) for each chronic condition and location we tested three different models with different degree polynomials (1. linear only, 2. linear + second degree polynomial, 3. linear + second degree polynomial + third degree polynomial) and used AIC to select the best model for each condition and location. Next, also in response to your suggestion, we estimated coverage statistics. Using the best fit model from the previous step, we then fit the model to data from 2014-2018 only and used the model to predict the 2019 data. We calculated the coverage probability as the proportion of weekly observed data points that fell within the 95% prediction interval. For all causes of death and locations the coverage probability was 100% (with the exception of multiple cause kidney disease in California, which is only shown in the appendix). The methods and results have been updated to reflect this change and we have added a figure to the appendix showing the selected model and coverage probability for each cause of death and location (lines 504 – 519; 847-859; Appendix 1- Figure 11).
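
      The two-step procedure described above (AIC-based choice of polynomial degree, then a leave-last-year-out coverage check) can be sketched as follows. This is a minimal illustration under loud assumptions: it uses ordinary least squares on synthetic data rather than the negative binomial regression applied to the mortality data, a single annual harmonic with an assumed period of 52.18 weeks, and an approximate 95% prediction interval of plus or minus 1.96 residual standard deviations that ignores parameter uncertainty. All function and variable names are hypothetical.

```python
import math
import random

WEEKS_PER_YEAR = 52.18  # assumed period of the seasonal term


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def design_row(week, degree, scale):
    """Intercept, polynomial trend up to `degree`, and one annual harmonic."""
    t = week / scale  # rescale time so powers of t stay well conditioned
    angle = 2.0 * math.pi * week / WEEKS_PER_YEAR
    return [1.0] + [t ** p for p in range(1, degree + 1)] + [math.sin(angle), math.cos(angle)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(weeks, y, degree, scale):
    """Least squares via the normal equations; returns (coefficients, rss)."""
    rows = [design_row(w, degree, scale) for w in weeks]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - dot(r, beta)) ** 2 for r, yi in zip(rows, y))
    return beta, rss


def aic(rss, n, n_coef):
    """Gaussian AIC up to an additive constant (+1 for the error variance)."""
    return n * math.log(rss / n) + 2 * (n_coef + 1)


# Synthetic weekly series standing in for six pre-pandemic years (not real data).
random.seed(0)
weeks = list(range(6 * 52))
y = [1000 + 0.5 * w + 80 * math.cos(2 * math.pi * w / WEEKS_PER_YEAR) + random.gauss(0, 20)
     for w in weeks]
scale = float(len(weeks))

# Step 1: choose the polynomial degree by AIC on the full pre-pandemic period.
aics = {}
for degree in (1, 2, 3):
    beta, rss = fit(weeks, y, degree, scale)
    aics[degree] = aic(rss, len(y), len(beta))
best_degree = min(aics, key=aics.get)

# Step 2: refit on the first five years and check coverage on the held-out year.
n_train = 5 * 52
beta, rss = fit(weeks[:n_train], y[:n_train], best_degree, scale)
sigma = math.sqrt(rss / (n_train - len(beta)))
inside = sum(
    1 for w, obs in zip(weeks[n_train:], y[n_train:])
    if abs(obs - dot(design_row(w, best_degree, scale), beta)) <= 1.96 * sigma
)
coverage = inside / (len(weeks) - n_train)
print("selected degree:", best_degree, "coverage:", round(coverage, 2))
```

      The coverage value is the proportion of held-out weekly observations that fall inside the approximate interval, mirroring the definition given in the response.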

      b. In the context of validating the estimates, I think the authors need to carefully address the Alzheimer case, see Figure 2. It seems that the long-term trends pick an inverse U-shape relationship which could be an overfit. In general, polynomials tend to overfit (in this case the authors use a polynomial of second degree). It would be interesting to see how the results change if they also include a cubic term in a sensitivity analysis.

      Thank you for this observation. Based on the changes described above, the model for Alzheimer’s disease now includes a cubic term in the national data and in Texas and California. The model with the second-degree polynomial remained the best fit for New York (Appendix 1 – Figure 11).

      c. The authors can help with the predictions using temperature and national holidays, but if they show in the cross-validation that the model performs adequately, this would be fine.

      At the scale of the US, adding temperature or environmental covariates is difficult and few US-wide models do so (see Goldstein 2012 and Quandelacy 2014 for examples from influenza). Furthermore, because we are looking at chronic disease outcomes, it is unclear that viral covariates or national holidays would drive these outcomes in the same way as they would if we were looking at mortality outcomes more directly related to transmissible diseases (such as respiratory mortality). Our cross validation also indicates that our models fit well without these additional covariates.

      d. It would be nice to see a model across the US, accounting for geography and spatial correlation. If the authors don't want to fit conditional autoregressive models in the Bayesian framework, they could just use a random intercept per region.

      We think the reviewer is mistaken here about the scale of our national analysis. Our national analysis did not fit independent models for each state or region. Rather, we fit a single model to the weekly-level national mortality data where counts for the whole of the US have been aggregated. We have clarified this in the text (lines 156, 464). As such, we do not feel a model accounting for spatial correlation would be appropriate, nor would we be able to include a random intercept for each region. We did fit three states independently (NY, TX, CA), but these states are very geographically distant from each other and unlikely to be correlated. These states were chosen in part because of their large population sizes, yet even in these states, confidence intervals were very wide for certain causes of death. Fitting models to each of the 50 US states, most of which are smaller than those chosen here, would exacerbate this issue.

      (2) I think the demographic model needs further elaboration. It would be nice to show more details, the mathematical formula of this model in the supplement, and explain the assumptions.

      Thank you for this comment. We have added additional details on the demographic model to the methods. We have also extended this analysis to each state to further strengthen our conclusions (lines 548-590).

      Reviewing Editor Recommendations:

      I think that perhaps something that is missing is that the authors never make their underlying assumption explicit: they are assuming that if cancer increases the risk of dying of COVID-19, this would be reflected in the data on multiple causes of death where cancer would be listed as one of the multiple causes rather than as the underlying cause, and that their conclusions are predicated on this assumption. I would suggest explicitly stating this assumption, as opposed to other reasons why cancer mortality would increase (ex. if cancer care worsened during pandemic waves leading to poorer cancer survival).

      Response: Thank you for this suggestion. We have added a few sentences to the introduction to make this assumption clear (lines 106-112).

      Reviewer #1 (Recommendations For The Authors):

      - It could make sense to add "in the United States" into the title, as the paper only analyses US data.

      - It may make sense to reformulate the title from "disentangling the relationship..." into something that conveys the actual findings, e.g. "Lack of excess cancer mortality during Covid-19 pandemic" or something similar. Currently, the title tells nothing about the findings.

      Thank you for these suggestions. We have added “in the US” to the title. However, we feel that our findings are a bit more subtle than the suggested reformulation would imply, and we prefer to leave it in its current form.

      - Abstract, lines 42--45: This is the main finding of the paper, but I feel it is simplified too strongly in the abstract. Your simulations do *not* "largely explain" excess mortality with cancer; they give higher numbers! Which you interpret as "shielding" etc., but this is completely absent from the abstract. This sentence makes the impression that you got a good fit between simulated excess and real excess, which I would say is not the case.

      Thank you for this comment. We have rephrased the sentence in the abstract to better reflect our intentions for using the demographic model (lines 46-49). As stated above, the purpose of the demographic model was not to give a good fit with the observed excess mortality. Rather, we used the demographic model as a tool to understand the relative differences between these conditions in terms of expected excess mortality given the size, age-distribution, and underlying risk of death from the condition itself, assuming similar IFR and attack rates. And based on this, we conclude that it is not necessarily surprising that we see higher excess mortality for diabetes and Alzheimer’s compared to cancer.

      - Results line 237: you write that it's "more consistent with the null hypothesis", however clearly it is *not* consistent with the null hypothesis either (because 2% < 7%). You discuss in the Discussion that it may be due to shielding, but it would be good to have at least one sentence about it already here in the Results, and refer to the Discussion.

      We have mentioned this in the results and refer to the discussion (lines 277-278).

      - Results line 239: why was it closer to the assumption of relative risk 2? If I understand correctly, your model prediction for risk=1 was 7% and for risk=2 it was 13%. In NY you observed 8% (line 187). How is this closer to risk=2?

      Thank you for this observation. We have updated the demographic model with new data, extended the model to state-level data, and included confidence intervals on these estimates. We have also added additional discussion around the differences between our observations and expectations (lines 249-284).

      - Discussion line 275: "we did not expect to see large increases" -- why exactly? Please spell it out here. Was it due to the age distribution of the cancer patients? Was it due to the high cancer death risk?

      We demonstrate that it is the higher baseline risk of death for cancer that seems to be driving our low expectations for cancer excess mortality (lines 304-320). We have added this to the sentence to clarify our conclusions on this point and have added a figure to better illustrate this concept of competing risks (Figure 6).

      - Methods, line 405: perhaps it makes sense to cite some other notable papers on Covid excess mortality such as Msemburi et al Nature 2023, Karlinsky & Kobak eLife 2021, Islam et al BMJ 2021, etc.

      Thank you for mentioning this oversight. We certainly should have cited these papers and have included them in the updated version.

      - Methods line 410: why did you use a 5-week moving average? Why not fit raw weekly death counts? NB regression should be able to deal with it.

      Smoothing time series data with a moving average prior to running regression models is a very common practice. We did a sensitivity analysis using the raw data. This produced excess estimates with slightly larger confidence intervals, but does not change the overall conclusions of the paper.
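
      The smoothing step can be sketched as follows. This is a minimal illustration that assumes a centered window which shrinks at the edges of the series; whether the window used in the analysis was centered or trailing is not stated in this response, and the function name and example counts are hypothetical.

```python
def centered_moving_average(values, window=5):
    """Smooth a series with a centered moving average of odd length `window`.

    Near the start and end of the series the window is truncated to the
    available observations (an assumption made for this sketch).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Hypothetical weekly death counts, not real data.
weekly_deaths = [100, 120, 90, 130, 110, 95, 105]
smoothed = centered_moving_average(weekly_deaths, window=5)
print(smoothed)
```

      Fitting the regression to `weekly_deaths` instead of `smoothed` corresponds to the raw-data sensitivity analysis mentioned above.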

      - Methods line 416: please indicate the software/library/package you used for fitting NB regression.

      We fit the NB regression using the MASS package in R version 4.3. We have added this to the methods (line 519).

      - Line 489: ORCHID -> ORCID

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      Codol et al. present a toolbox that allows simulating biomechanically realistic effectors and training Artificial Neural Networks (ANNs) to control them. The paper provides a detailed explanation of how the toolbox is structured and several examples that demonstrate its usefulness.

      Main comments:

      (1) The paper is well written and easy to follow. The schematics help in understanding how the toolbox works and the examples provide an idea of the results that the user can obtain.

      We thank the reviewer for this comment.

      (2) As I understand it, the main purpose of the paper should be to facilitate the usage of the toolbox. For this reason, I have missed a more explicit link to the actual code. As I see it, researchers will read this paper to figure out whether they can use MotorNet to simulate their experiments, and how they should proceed if they decide to use it. I'd say the paper provides an answer to the first question and assures that the toolbox is very easy to install and use. Maybe the authors could support this claim by adding "snippets" of code that show the key steps in building an actual example.

      This is an important point, which we also considered when writing this paper. We instead decided to focus on the first approach, because it is easier to illustrate the scientific use of the toolbox using code or interactive (Jupyter) notebooks than in a publication format. We find that the “how to proceed” aspect of the toolbox can be covered more easily and comprehensively using online, interactive tutorials. Additionally, this allows us to update these tutorials as the toolbox evolves over different versions, whereas it is more difficult to update a scientific article. Consequently, we explicitly avoided code snippets in the article itself. However, we appreciate that the paper would gain in clarity if this were stated more explicitly early on. We have modified the paper to include a pointer to where to find tutorials online. We added this to the last paragraph of the introduction section:

      The interested reader may consult the full API documentation, including interactive tutorials on the toolbox website at https://motornet.org.

      (3) The results provided in Figures 1, 4, 5 and 6 are useful, because they provide examples of the type of things one can do with the toolbox. I have a few comments that might help improving them:

      a. The examples in Figures 1 and 5 seem a bit redundant (same effector, similar task). Maybe the authors could show an example with a different effector or task? (see point 4).

      The effectors from figures 1 and 5 are indeed very similar. However, the tasks in figures 1 and 5 present some important differences. The training procedure in figure 1 never includes any perturbations, while the one from figure 5 includes a wide range of perturbations of different magnitudes, timings and directions. The evaluation procedure of figure 1 includes center-out reaches with permanent viscous (proportional to velocity) external dynamics, while that of figure 5 uses fixed, transient, square-shaped perturbations orthogonal to the reach direction. Finally, the networks in figure 1 undergo a second training procedure after evaluation while the networks of figure 5 do not.

      While we agree that some variation of effectors would be beneficial, we do show examples of a point-mass effector in figure 6. Overall, figure 5 shows a task that is quite different from that of figure 1 with a similar effector, while the opposite is true for figure 6. We have modified the text to clarify this for the reader, by adding the following.

      End of 1st paragraph, section 2.4.

      Therefore, the training protocol used for this task largely differed from section 2.1 in that the networks are exposed to a wide range of mechanical perturbations with varying characteristics.

      1st paragraph of section 2.5

      […] this asymmetrical representation of PMDs during reaching movements did not occur when RNNs were trained to control an effector that lacked the geometrical properties of an arm such as illustrated in Figure 4c-e and section 2.1.

      b. I missed a discussion on the relevance of the results shown in Figure 4. The moment arms are barely mentioned outside section 2.3. Are these results new? How can they help with motor control research?

      We thank the reviewer for this comment. This relates to a point from reviewer 2 indicating that the purpose of each section was sometimes difficult to grasp as one reads. Section 2.3 explains the biomechanical properties that the toolbox implements to improve realism of the effector. They are not new results in the sense that other toolboxes implement these features (though not in differentiable formats) and these properties of biological muscles are empirically well-established. However, they are important to understand what the toolbox provides, and consequently what constraints networks must accommodate to learn efficient control policies. An example of this is the results in figure 6, where a simple effector versus a more biomechanically complex effector will yield different neural representations.

      Regarding the manuscript itself, we agree that more clarity on the goal of every paragraph may improve the reader’s experience. Consequently, we made sure to specify such goals at the start of each section. Particularly, we clarify the purpose of section 2.3 by adding several sentences on this at the end of the first paragraph in that section. We also now clearly state the purpose of section 2.3 with the results of figure 6 and reference figure 4 in that section.

      c. The results in Figure 6 are important, since one key asset of ANNs is that they provide access to the activity of the whole population of units that produces a given behavior. For this reason, I think it would be interesting to show the actual "empirical observations" that the results shown in Fig. 6 are replicating, hence allowing a direct comparison between the results obtained for biological and simulated neurons.

      These empirical observations are available from previous electrophysiological and modelling work. Particularly, polar histograms across reaching directions like panel C are displayed in figures 2 and 3 of Scott, Gribble, Graham, Cabel (2001, Nature). Colormaps of modelled unit activity across time and reaching directions like panel F are also displayed in figure 2 of Lillicrap, Scott (2013, Neuron). Electrophysiological recordings of M1 neurons during a similar task in non-human primates can also be seen on “Preserved neural population dynamics across animals performing similar behaviour” figure 2 B (https://doi.org/10.1101/2022.09.26.509498) and “Nonlinear manifolds underlie neural population activity during behaviour” figure 2 B as well (https://doi.org/10.1101/2023.07.18.549575). Note that these two pre-prints use the same dataset.

      We have added these citations to the text and made it explicit that they contain visualizations of similar modelling and empirical data for comparison:

      This heterogeneous set of responses matches empirical observations in non-human primate primary motor cortex recordings (Churchland & Shenoy, 2007; Michaels et al., 2016) and replicates similar visualizations from previously published work (Fortunato et al., 2023; Lillicrap & Scott, 2013; Safaie et al., 2023).

      (4) All examples in the paper use the arm26 plant as effector. Although the authors say that "users can easily declare their own custom-made effector and task objects if desired by subclassing the base Plant and Task class, respectively", this does not sound straightforward. Table 1 does not really clarify how to do it. Maybe an example that shows the actual code (see point 2) that creates a new plant (e.g. the 3-joint arm in Figure 7) would be useful.

      Subclassing is a Python process more than a MotorNet process, as Python is an object-oriented language. Therefore, there are many Python tutorials on subclassing in the general sense that would be beneficial for that purpose. We have amended the main text to ensure that this is clearer to the reader.

      Subclassing a MotorNet object, in a more specific sense, requires overwriting some methods from the base MotorNet classes (e.g., Effector or Environment classes, which correspond to the original Plant and Task object, respectively). Since we made the decision (mentioned above) to not include code in the main text, we added tutorials to the online documentation, which include dedicated tutorials for MotorNet class subclassing. For instance, this tutorial showcases how to subclass Environment classes:

      https://colab.research.google.com/github/OlivierCodol/MotorNet/blob/master/examples/3-environments.ipynb
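
      For readers unfamiliar with the general Python mechanism referred to here, the following minimal sketch shows what overriding a method in a subclass looks like. The classes `BaseEffector` and `ViscousEffector`, their attributes, and their methods are hypothetical and defined entirely within the example; they are not the MotorNet API, for which the linked tutorials remain the reference.

```python
class BaseEffector:
    """Hypothetical base class for illustration; NOT the MotorNet API."""

    def __init__(self, mass=1.0, dt=0.01):
        self.mass = mass
        self.dt = dt
        self.position = 0.0
        self.velocity = 0.0

    def compute_force(self, action):
        # Default behaviour: the action is applied directly as a force.
        return action

    def step(self, action):
        # Semi-implicit Euler integration of a 1D point mass.
        force = self.compute_force(action)
        self.velocity += (force / self.mass) * self.dt
        self.position += self.velocity * self.dt
        return self.position


class ViscousEffector(BaseEffector):
    """Subclass that overrides one method to add velocity-dependent drag."""

    def __init__(self, viscosity=0.5, **kwargs):
        super().__init__(**kwargs)
        self.viscosity = viscosity

    def compute_force(self, action):
        # Reuse the parent computation, then subtract a viscous term.
        return super().compute_force(action) - self.viscosity * self.velocity


base = BaseEffector()
viscous = ViscousEffector(viscosity=0.5)
for _ in range(100):
    base.step(1.0)
    viscous.step(1.0)
```

      Only `compute_force` is overridden; the integration logic in `step` is inherited unchanged, which is the general pattern of overwriting selected methods of a base class.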

      (5) One potential limitation of the toolbox is that it is based on Tensorflow, when the field of Computational Neuroscience seems to be, or at least that's my impression, transitioning to pyTorch. How easy would it be to translate MotorNet to pyTorch? Maybe the authors could comment on this in the discussion.

      We have received a significant amount of feedback asking for a PyTorch implementation of the toolbox. Consequently, we decided to enact this, and the next version of the toolbox will be exclusively in PyTorch. We will maintain the Application Programming Interface (API) and tutorial documentation for the TensorFlow version of the toolbox on the online website. However, going forward we will focus exclusively on bug-fixing and expanding from the latest version of MotorNet, which will be in PyTorch. We now believe that the greater popularity of PyTorch in the academic community makes that choice more sustainable while helping a greater proportion of research projects.

      These changes led to a significant alteration of the MotorNet structure, which is reflected by changes made throughout the manuscript, notably in Figure 3 and Table 1.

      (6) Supervised learning (SL) is widely used in Systems Neuroscience, especially because it is faster than reinforcement learning (RL). Thus providing the possibility of training the ANNs with SL is an important asset of the toolbox. However, SL is not always ideal, especially when the optimal strategy is not known or when there are different alternative strategies and we want to know which is the one preferred by the subject. For instance, would it be possible to implement a setup in which the ANN has to choose between 2 different paths to reach a target? (e.g. Kaufman et al. 2015 eLife). In such a scenario, RL seems to be a more natural option. Would it be easy to extend MotorNet so it allows training with RL? Maybe the authors could comment on this in the discussion.

      The new implementation of MotorNet that relies on PyTorch is already standardized to use an API that is compatible with Gymnasium. Gymnasium is a standard and popular interfacing toolbox used to link RL agents to environments. It is very well-documented and widely used, which will ensure that users who wish to employ RL to control MotorNet environments will be able to do so relatively effortlessly. We have added this point to accurately reflect the updated implementation, so users are aware that it is now a feature of the toolbox (new section 3.2.4.).
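
      To give a sense of the interaction pattern that a Gymnasium-compatible API implies, the sketch below mimics the Gymnasium calling convention, in which `reset` returns an observation and an info dictionary and `step` returns an observation, a reward, `terminated` and `truncated` flags, and an info dictionary. The environment `PointMassReachEnv` and the policy `proportional_policy` are hypothetical, written in plain Python without importing Gymnasium or MotorNet, and stand in for an actual environment and RL agent.

```python
class PointMassReachEnv:
    """Hypothetical 1D reaching task that mimics the Gymnasium call pattern.

    It does not import gymnasium and is not a MotorNet environment.
    """

    def __init__(self, target=1.0, dt=0.1, max_steps=50):
        self.target = target
        self.dt = dt
        self.max_steps = max_steps
        self.position = 0.0
        self.steps = 0

    def reset(self, seed=None, options=None):
        # Gymnasium convention: return (observation, info).
        self.position = 0.0
        self.steps = 0
        return (self.position, self.target), {}

    def step(self, action):
        # Gymnasium convention: return
        # (observation, reward, terminated, truncated, info).
        self.position += action * self.dt
        self.steps += 1
        distance = abs(self.target - self.position)
        reward = -distance
        terminated = distance < 0.05            # target reached
        truncated = self.steps >= self.max_steps  # time limit hit
        return (self.position, self.target), reward, terminated, truncated, {}


def proportional_policy(observation, gain=2.0):
    """Stand-in for an RL agent: move proportionally to the remaining error."""
    position, target = observation
    return gain * (target - position)


env = PointMassReachEnv()
observation, info = env.reset()
total_reward = 0.0
done = False
while not done:
    action = proportional_policy(observation)
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
```

      Because the loop depends only on `reset` and `step`, the hand-written policy could in principle be replaced by any agent that follows the same convention.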

      Impact:

      MotorNet aims at simplifying the process of simulating complex experimental setups to rapidly test hypotheses about how the brain produces a specific movement. By providing an end-to-end pipeline to train ANNs on the simulated setup, it can greatly help guide experimenters to decide where to focus their experimental efforts.

      Additional context:

      Since the main result is a toolbox, the paper is complemented by a GitHub repository and a documentation webpage. Both the repository and the webpage are well organized and easy to navigate. The webpage walks the user through the installation of the toolbox and the building of the effectors and the ANNs.

      Reviewer #2 (Public Review):

      MotorNet aims to provide a unified interface where the trained RNN controller exists within the same TensorFlow environment as the end effectors being controlled. This architecture provides a much simpler interface for the researcher to develop and iterate through computational hypotheses. In addition, the authors have built a set of biomechanically realistic end effectors (e.g., a 2-joint arm model with realistic muscles) within TensorFlow that are fully differentiable.

      MotorNet will prove a highly useful starting point for researchers interested in exploring the challenges of controlling movement with realistic muscle and joint dynamics. The architecture features a conveniently modular design and the inclusion of simpler arm models provides an approachable learning curve. Other state-of-the-art simulation engines offer realistic models of muscles and multi-joint arms and afford more complex object manipulation and contact dynamics than MotorNet. However, MotorNet's approach allows for direct optimization of the controller network via gradient descent rather than reinforcement learning, which is a compromise currently required when using other simulation engines (as these engines' code cannot be differentiated through).

      The paper could be reorganized to provide clearer signposts as to what role each section plays (e.g., that the explanation of the moment arms of different joint models serves to illustrate the complexity of realistic biomechanics, rather than a novel discovery/exposition of this manuscript). Also, if possible, it would be valuable if the authors could provide more insight into whether gradient descent finds qualitatively different solutions to RL or other non gradient-based methods. This would strengthen the argument that a fully differentiable plant is useful beyond improving training time / computational power required (although this is a sufficiently important rationale per se).

      We thank the reviewer for these comments. We agree that more clarity on the section goals may improve the reader’s experience and ensured this is the case throughout the manuscript. Particularly, we added the following on the first paragraph of section 2.3, for which an explicit goal was most missing:

      In this section we illustrate some of these biomechanical properties displayed by MotorNet effectors using specific examples. These properties are well-characterised in the biology and are often implemented in realistic biomechanical simulation software.

      Regarding the potential difference in solutions obtained from reinforcement or supervised learning, establishing this conclusively would represent a non-trivial amount of work and so may not be within the scope of the current article. We do appreciate, however, that in some situations RL may be a more fitting approach to a given task design. In relation to this point, we now specify in the discussion that the new API can accommodate interfacing with reinforcement learning toolboxes for those who may want to pursue this type of policy training approach when appropriate (new section 3.2.4.).

      Reviewer #3 (Public Review):

      Artificial neural networks have developed into a new research tool across various disciplines of neuroscience. However, specifically for studying neural control of movement it was extremely difficult to train those models, as they require not only simulating the neural network, but also the body parts one is interested in studying. The authors provide a solution to this problem which is built upon one of the main software packages used for deep learning (Tensorflow). This allows them to make use of state-of-the-art tools for training neural networks.

      They show that their toolbox is able to (re-)produce several commonly studied experiments e.g., planar reaching with and without loads. The toolbox is described in sufficient detail to get an overview of the functionality and the current state of what can be done with it. Although the authors state that only a few lines of code can reproduce such an experiment, they unfortunately don't provide any source code to reproduce their results (nor is it given in the respective repository).

      The possibility of adding code snippets to the article is something we originally considered, and which aligns with comment two from reviewer one (see above). Hopefully this provides a good overview of the motivation behind our choice not to add code to the article.

      The modularity of the presented toolbox makes it easy to exchange or modify single parts of an experiment e.g., the task or the neural network used as a controller. Together with the open-source nature of the toolbox, this will facilitate sharing and reproducibility across research labs.

      I can see how this paper can enable a whole set of new studies on neural control of movement and accelerate the turnover time for new ideas or hypotheses, as stated in the first paragraph of the Discussion section. Having such a low effort to run computational experiments will be definitely beneficial for the field of neural control of movement.

      We thank the reviewer for these comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors sought to test whether anterior insular cortex neurons that increase or decrease firing during fear behavior and freezing bi-directionally control fear via separate, anatomically defined outputs. Using a fairly simple behavior where mice were exposed to tone-shock pairings, they found roughly equal populations that do indeed either increase or decrease firing during freezing. Next, they sought to test whether these distinct populations may also have distinct outputs. Using retrograde tracers they found that the anterior insular cortex contains non-overlapping neurons which project to the mediodorsal thalamus or amygdala. Mediodorsal thalamus-projecting neurons tended to cluster in deep cortical layers while amygdala-projecting neurons were primarily in more superficial layers. Stimulation of insula-thalamus projection decreased freezing behavior, and stimulation of insula-amygdala projections increased fear behavior. Given that the neurons that increased firing were located in deep layers, that thalamus projections occurred in deep layers, and that stimulation of insula-thalamus neurons decreased freezing, the authors concluded that the increased firing neurons may be thalamus projections. Similarly, given that decreased-firing neurons tended to occur in more superficial layers, that insula-amygdala projections were primarily superficial, and that insula-amygdala stimulation increased freezing behavior, authors concluded that the decreased firing cells may be amygdala projections. The study has several strengths though also some caveats.

      Strengths:

      The potential link between physiological activity, anatomy, and behavior is well laid out and is an interesting question. The activity contrast between the units that increase/decrease firing during freezing is clear.

      It is nice to see the recording of extracellular spiking activity, which provides a clear measure of neural output, whereas similar studies often use bulk calcium imaging, a signal that rarely matches real neural activity even when anatomy suggests it might (see London et al 2018 J Neuro - there are increased/decreased spiking striatal populations, but both D1 and D2 striatal neurons increase bulk calcium).

      Weaknesses:

      The link between spiking, anatomy, and behavior requires assumptions/inferences: the anatomically/genetically defined neurons which had distinct outputs and opposite behavioral effects can only be assumed to be the increased/decreased spiking neurons, based on the rough area of the cortical layer in which they were recorded.

      Yes, we are aware that we could not provide a direct link between spiking, anatomy and behavior. We have specifically noted this in the discussion section and added a possible experiment that could be carried out to provide a more direct link in a future study.

      [Lines 371-375] We would like to provide more direct evidence linking the neuronal response types and projection patterns in future studies by electrophysiologically identifying freezing-excited and freezing-inhibited aIC neurons and testing whether those neurons respond to optogenetic activation of amygdala- or medial thalamus-projecting aIC neurons.

      The behavior would require more control to fully support claims about the associative nature of the fear response (see Trott et al 2022 eLife) - freezing, in this case, could just as well be nonassociative. In a similar vein, fixed intertrial intervals, though common practice in the fear literature, pose a problem for neurophysiological studies. The first is that animals learn the timing of events, and the second is that neural activity is dynamic and changes over time. Thus it is very difficult to determine whether changes in neural activity are due to learning about the tone-shock contingency, timing of the task, simply occur because of time and independently of external events, or some combination of the above.

      Trott et al. (2022) stated that "...freezing was the purest reflection of associative learning." The nonassociative processes mentioned in the study were related to running and darting behaviors, which the authors argue are suppressed by associative learning. Moreover, considerable evidence from immediate postshock freezing and immediate postshock context shift studies all indicate that the freezing response is an associative (and not nonassociative) response (Fanselow, 1980 and 1986; and Landeira-Fernandez et al., 2006). Thus, our animals' freezing response to the tone CS presentation in a novel context, following three tone CS-footshock US pairings, most likely reflects associative learning. 

      Concerning the issue of fixed inter-trial intervals (ITIs), which are standard in fear conditioning studies, particularly those with few CS-US paired trials, we acknowledge the challenge in interpreting the neural correlates of behavior. However, the ITIs in our extinction study were variable, and we still found neural activity that was significantly correlated with freezing. The results of our extinction study, carried out with variable ITIs, suggest that the aIC neural activity changes measured in this study are likely due to freezing behavior associated with fear learning, not due to learning the contingencies of fixed ITIs.

      Reviewer #2 (Public Review):

      In this study, the authors aim to understand how neurons in the anterior insular cortex (insula) modulate fear behaviors. They report that the activity of a subpopulation of insula neurons is positively correlated with freezing behaviors, while the activity of another subpopulation of neurons is negatively correlated to the same freezing episodes. They then used optogenetics and showed that activation of anterior insula excitatory neurons during tones predicting a footshock increases the amount of freezing outside the tone presentation, while optogenetic inhibition had no effect. Finally, they found that two neuronal projections of the anterior insula, one to the amygdala and another to the medial thalamus, are increasing and decreasing freezing behaviors respectively. While the study contains interesting and timely findings for our understanding of the mechanisms underlying fear, some points remain to be addressed.

      We are thankful for the detailed and constructive comments by the reviewer and have addressed the points raised. Specifically, we included possible limitations of using only male mice in the study, included two more studies about the insula as references, specified the L-ratio and isolation distance used in our study, added the ratio of putative-excitatory and putative-inhibitory neurons obtained from our study, changed the terms used to describe neuronal activity changes (freezing-excited and freezing-inhibited cells), added new analysis (Figure 2H), rearranged Figure 2 for clarity, added new histology images, and added atlas maps with viral expressions (three figure supplements).

      Reviewer #1 (Recommendations For The Authors):

      - I would suggest keeping the same y-axis for all figures that display the same data type - Figure 5D, for example.

      Thank you for the detailed suggestion. We now use the same y-axis for all figures that display the same data type.

      - In the methods, it says 30s bins were used for neural analysis (line 435). I cannot imagine doing this, and looking at the other figures, it does not look like this is the case so could you please clarify what bins, averages, etc were used for neural and behavioral analysis?

      Bin size for neural analysis varied; 30s, 5s, 1s bins were used depending on the analysis. We corrected this and specified what time bin was used for which figure in the methods.

      Bin size for neural and freezing behavior was 30s and we also added this to the methods.

      - I would not make any claims about the fear response here being associative/conditional. This would require a control group that received an equal number of tone and shock exposures, whether explicitly unpaired or random.

      The unpaired fear conditioning paradigm (unpaired tone and shock) suggested by the reviewer is well characterized and is known not to induce CS-evoked fear behavior (Moita et al., 2003 and Kochli et al., 2015). In addition, considerable evidence from immediate post-shock freezing and immediate post-shock context shift studies all indicate that the freezing response is an associative (and not nonassociative) response (Fanselow, 1980 and 1986; and Landeira-Fernandez et al., 2006). Thus, our animals' freezing response to the tone CS presentation in a novel context, following three tone CS-footshock US pairings, most likely reflects associative learning.

      - I appreciate the discussion about requiring some inference to conclude that anatomically defined neurons are the physiologically defined ones. This is a caveat that is fully disclosed, however, I might suggest adding to the discussion that future experiments could address this by tagging insula-thalamus or insula-amygdala neurons with antidromic (opto or even plain old electric!) stimulation. These experiments are tricky to perform, of course, but this would be required to fully close all the links between behavior, physiology, and anatomy.

      As suggested, we have included that, in a future study, we would like to elucidate a more direct link between physiology, anatomy and behaviors by optogenetically tagging the insula-thalamus/insula-amygdala neurons and identifying whether it may be a positive or a negative cell (now named the freezing-excited and freezing-inhibited cells, respectively) in the discussion.

      [Lines 371-375] We would like to provide more direct evidence linking the neuronal response types and projection patterns in future studies by electrophysiologically identifying freezing-excited and freezing-inhibited aIC neurons and testing whether those neurons respond to optogenetic activation of amygdala- or medial thalamus-projecting aIC neurons.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) As all experiments have been performed only in male mice, the authors need to clearly state this limit in the introduction, abstract, and title of the manuscript.

      With an increasing number of readers becoming interested in the biological sex used in preclinical studies, we also feel that it should be mentioned at the beginning of the manuscript. As suggested, we explicitly wrote that we only used male mice in the title, abstract, and introduction. In addition, we discussed possible limitations of only using male mice in the discussion section as follows:

      [Lines 381-386] Another factor to consider is that we have only used male mice in this study. Although many studies report that there is no biological sex difference in cued fear conditioning (42), the main experimental paradigm used in this study, it does not mean that the underlying brain circuit mechanism would also be similar. The bidirectional fear modulation by aIC→medial thalamus or the aIC→amygdala projections may be different in female mice, as some studies report reduced cued fear extinction in females (42).

      (2) The authors are missing important publications reporting findings on the insular cortex in fear and anxiety. For example, the authors should cite studies showing that anterior insula VIP+ interneurons inhibition reduces fear memory retrieval (Ramos-Prats et al., 2022) and that posterior insula neurons are a state-dependent regulator of fear (Klein et al., 2021). Also, regarding the anterior insula to basolateral amygdala projection (aIC-BLA), the author should include recent work showing that this population encodes both negative valence and anxiogenic spaces (Nicolas et al., 2023). 

      We appreciate the detailed suggestions and have added the appropriate publications to the discussion section. The anterior insula VIP+ interneuron study (Ramos-Prats et al., 2022) is interesting, but based on the evidence provided in the paper, we felt that the role of aIC VIP+ interneurons in fear conditioning is limited. VIP+ interneurons in the aIC seem to be important in coding sensory stimuli; however, their relevance to conditioned stimuli seems to be low: overall VIP intracellular calcium activity to CS was low and did not differ between acquisition and retrieval. Also, inhibition of VIP did not influence fear acquisition. VIP inhibition during fear acquisition did reduce fear retrieval (CS only, no light stimulation), but this does not necessarily mean that VIP activity will be involved in fear memory storage or retrieval, especially because intracellular calcium activity of VIP+ neurons was low during fear conditioning and retrieval.

      Studies by Klein et al. (2021) and Nicolas et al. (2023) are integrated in the discussion section as follows.

      [Lines 297-301] Group activity of neurons in the pIC measured with fiber photometry, interestingly, exhibited fear state-dependent activity changes, with decreased activity during high fear behavior and increased activity during lower fear behavior (29), suggesting that group activity of the pIC may be involved in maintaining an appropriate level of fear behavior.

      [Lines 316-319] Another distinction between the aIC and pIC may be related to anxiety, as a recent study showed that group activity of aIC neurons, but not that of the pIC, increased when mice explored anxiogenic space (open arms in an elevated plus maze, center of an open field box) (32).

      (3) The authors should specify how many neurons they excluded after controlling the L-ratio and isolation distance. It is also important to specify the percentage of putative excitatory and inhibitory interneurons recorded among the 11 mice based on their classification (the number of putative inhibitory interneurons in Figure 1D seems too low to be accurate).

      We use manual cluster cutting and only cut clusters that are visually well isolated, so hardly any neurons are excluded after controlling for L-ratio and isolation distance. The criteria we used were L-ratio<0.3 and isolation distance>15, and we specified this in the methods as follows.

      [Lines 454-458] We only used well-isolated units (L-ratio<0.3, isolation distance>15) that were confirmed to be recorded in the aIC (conditioned group: n = 116 neurons, 11 mice; control group: n = 14 neurons, 3 mice) for the analysis (46). The mean values for the units used in our analysis are as follows: L-ratio = 0.09 ± 0.012, isolation distance = 44.97 ± 5.26 (expressed as mean ± standard deviation).
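
      The inclusion criteria quoted above can be expressed as a simple filter. The sketch below is illustrative only: the thresholds (L-ratio<0.3, isolation distance>15) come from the text, whereas the data layout, the function name `is_well_isolated`, and the example units and their values are hypothetical.

```python
L_RATIO_MAX = 0.3              # threshold stated in the methods
ISOLATION_DISTANCE_MIN = 15.0  # threshold stated in the methods


def is_well_isolated(unit):
    """Return True when a unit satisfies both cluster-quality criteria."""
    return (
        unit["l_ratio"] < L_RATIO_MAX
        and unit["isolation_distance"] > ISOLATION_DISTANCE_MIN
    )


# Hypothetical units (made-up values, not recorded data).
units = [
    {"id": "u1", "l_ratio": 0.05, "isolation_distance": 40.0},
    {"id": "u2", "l_ratio": 0.45, "isolation_distance": 30.0},  # fails L-ratio
    {"id": "u3", "l_ratio": 0.10, "isolation_distance": 12.0},  # fails distance
    {"id": "u4", "l_ratio": 0.29, "isolation_distance": 15.5},
]
kept_ids = [unit["id"] for unit in units if is_well_isolated(unit)]
```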

      As suggested, we also specified the percentage of putative excitatory neurons and putative inhibitory interneurons recorded in our study in the results and methods sections. The relative percentages of putative excitatory neurons and putative inhibitory interneurons were similar for both the conditioned and the control groups (conditioned putative-excitatory: 93.1%, putative-inhibitory: 6.9%; control putative-excitatory: 92.9%, putative-inhibitory: 7.1%). Although the number of putative interneurons isolated from our recordings is low, that is what we obtained. Putative inhibitory neurons, probably because of their relatively smaller size, tend to be underrepresented relative to putative excitatory cells.

      [Lines 83-87] Of the recorded neurons, we analyzed the activity of 108 putative pyramidal neurons (93% of total isolated neurons) from 11 mice, which were distinguished from putative interneurons (n = 8 cells, 7% of total isolated neurons) based on the characteristics of their recorded action potentials (Figure 1D; see methods for details).

      [Lines 464-467] The percentages of putative excitatory neurons and putative inhibitory interneurons obtained from both groups were similar (conditioned putative-excitatory: 93.1%, putative-inhibitory: 6.9%; control putative-excitatory: 92.9%, putative-inhibitory: 7.1%).
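
      As a quick arithmetic check, the reported percentages can be reproduced from the cell counts. The conditioned-group counts (108 and 8 of 116) are stated in the text; the control-group split of 13 and 1 is an assumption inferred from the reported 92.9% and 7.1% of 14 neurons, not a figure given by the authors.

```python
def percent(part: int, total: int) -> float:
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Conditioned group: counts stated in the response (108 + 8 = 116 neurons).
conditioned = {"putative_excitatory": 108, "putative_inhibitory": 8}
# Control group: ASSUMED split of the 14 neurons, inferred from the percentages.
control = {"putative_excitatory": 13, "putative_inhibitory": 1}

for name, counts in (("conditioned", conditioned), ("control", control)):
    total = sum(counts.values())
    for cell_type, n in counts.items():
        print(f"{name} {cell_type}: {n}/{total} = {percent(n, total)}%")
```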

      (4) While the use of correlation of single-unit firing frequency with freezing is interesting, classically, studies analyze the firing in comparison to the auditory cues. If the authors want to keep the correlation analysis with freezing, rather than correlations to the cues, they should rename the cells as "freezing excited" and "freezing inhibited" cells instead of positive and negative cells.

      As suggested, we used the terms “freezing-excited” and “freezing-inhibited” cells instead of positive and negative cells.
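
      The labelling described above can be sketched as a correlation between a unit's firing rate and the freezing level, followed by a sign-based classification. This is an illustration under stated assumptions rather than the authors' procedure: the response does not say how p values were obtained, so a permutation test is swapped in here, and the function names, the "non-responsive" label, and the example data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p value for the correlation (assumed method)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def classify_cell(firing, freezing, alpha=0.05):
    """Label a unit by the sign of a significant firing-freezing correlation."""
    r = pearson_r(firing, freezing)
    p = permutation_p(firing, freezing)
    if p >= alpha:
        return "non-responsive", r, p
    label = "freezing-excited" if r > 0 else "freezing-inhibited"
    return label, r, p


# Made-up example: freezing (%) per time bin and two hypothetical units (Hz).
freezing = [5, 10, 20, 35, 50, 60, 70, 80, 85, 90]
unit_up = [1.0, 1.2, 1.5, 2.1, 2.6, 3.0, 3.3, 3.9, 4.0, 4.4]
unit_down = list(reversed(unit_up))

label_up, r_up, p_up = classify_cell(unit_up, freezing)
label_down, r_down, p_down = classify_cell(unit_down, freezing)
print(label_up, label_down)
```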

      (5) To improve clarity, Figure 2 should be reorganized to start with the representative examples before including the average of population data. Thus Panel D should be the first one. The authors should also consider including the trace of the firing rate of these representative units over time, on top of the freezing trace, as well as Pearson's r and p values for both of them. Then, the next panels should be ordered as follows: F, G, H, C, A, B, I, and finally E.

      We have rearranged Figure 2 based on the suggestions.

      (6) It is unclear why the freezing response in Figure 2 is different in current panels F, G, and H. Please clarify this point.

      This was because the freezing behaviors of slightly different populations of animals were averaged. Some animals lacked positive cells, negative cells, or both, and only the behavior of animals with the specified cell type was used for calculating the mean freezing response. With the rearrangement of Figure 2, we no longer have plots that juxtapose mean neuronal response types and behavior.

      (7) Even though the peak of tone-induced firing rate change between negative and positive cells is 10s later for positive cells, the conclusion that this 'difference suggests differential circuits may regulate the activities of different neuron types in response to fear' is overstating the observation. This statement should be rephrased. Indeed, it could be the same circuits that are regulated by different inputs (glutamatergic, GABA, or neuromodulatory inputs).

      We agree and have deleted the statement from the manuscript.

      (8) The authors mention they did not find tone onset nor tone offset-induced responses of anterior insula neurons. It would be helpful to represent this finding in a Figure, especially, which were the criteria for a cell to be tone onset or tone offset responding.

      We added how tone-onset and tone-offset were analyzed in the methods section and added a plot of the analysis in Figure 2H.

      (9) Based on the spread of the viral expression shown in Figure 3B, it appears that the authors are activating/inhibiting insula neurons in the GI layer, whereas single-unit recordings report the electrodes were located in DI, AID, and AIV layers. The authors should provide histology maps of the viral spread for ChR2, NpHR3, and eYFP expression.

      Thank you for the excellent suggestion. Now the histological sample in Figure 3B is a sample with expression in the GI/DI/AID layers and it also has an image taken at higher resolution (x40) to show that viral vectors are expressed inside neurons. We also added histological maps with overlay of viral expression patterns of the ChR2, eYFP, and NpHR3 groups in Figure 3—figure supplement 1.

      (10) In Figure 5B, the distribution of terminals expressing ChR2 appears much denser in CM than in MD. This should be quantified across mice and if consistent with the representative image, the authors should refer to aIC-CM rather than aIC-MD terminals.

      Overall, we referred to the connection as aIC-medial thalamus, which collectively includes both the CM and the MD. Our microscopes cannot resolve whether terminals end in the CM or the MD, but the aIC projections seem to pass through the CM to reach the MD. The Allen Brain Institute’s Mouse Brain Connectivity map (https://connectivity.brain-map.org/projection/experiment/272737914) of a B6 mouse, the strain we used in our study, with tracers injected at a location similar to ours, also supports our speculation and shows that aIC neuronal projections terminate more in the MD than in the CM. In addition, the power of light delivered for optogenetic manipulation falls off steeply with distance; therefore, the MD-projecting terminals, which are closer to the optic fiber, are more likely to be activated than the CM-projecting terminals. However, since we could not determine whether the aIC projections terminate in the CM or the MD, we collectively referred to the connection as the aIC-medial thalamus throughout the manuscript.

      Author response image 1.

      (11) Histological verifications for each in vivo electrophysiology, optogenetic, and tracing experiments need to include a representative image of the implantation/injection site, as well as a 40x zoom-in image focusing on the cell bodies or terminals right below the optic fiber (for optogenetic experiments). Moreover, an atlas map including all injection locations with the spread of the virus and fiber placement should be added in the Supplement Figures for each experiment (see Figure S1 Klein et al., 2021). Similarly, the authors need to add a representation of the spread of the retrograde tracers for each mouse used for this tracing experiment.

      As suggested, we added a histology sample showing electrode recording location for in-vivo electrophysiology in Figure 1 and added atlas maps for the optogenetic and tracing experiments in supplementary figures. We also provide a 40x zoom-in image of the expression pattern for the optogenetic experiments (Figure 3B).

      (12) To target anterior insula neurons, authors mention coordinates that do not reach the insula on the Paxinos atlas (AP: +1.2 mm, ML: -3.4 mm, DV: -1.8 mm). If the DV was taken from the brain surface, this has to be specified, and if the other coordinates are from Bregma, this also needs to be specified. Finally, the authors cite a review from Maren & Fanselow (1996), for the anterior insula coordinates, but it remains unclear why.

      AP and ML coordinates were measured relative to bregma, and DV was measured from the brain surface. We have specified this in the Methods. We did not cite the review by Maren & Fanselow for the aIC coordinates.

      Minor comments:

      (1) A schematic of the microdrive and tetrodes, including the distance of each tetrode would also be helpful.

      We used handcrafted microdrives with four tetrodes. Because they were handcrafted, the relative orientation of the tetrodes varies, and tetrode recording locations have to be verified histologically. We did, however, ensure that the tetrodes were more than 200 μm apart so that distinct single units would be obtained from different tetrodes. We added this to the Methods as follows.

      [Lines 430-431] The distance between the tetrodes was greater than 200 μm to ensure that distinct single units would be obtained from different tetrodes.

      (2) Figure 2E: representation of the baseline firing (3-min period before the tone presentation) is missing.

      Figure 2E shows the 3-min period before tone presentation.

      (3) Figure 2: Averages Pearson's correlation r and p values should be stated on panels F, G, and H (positive cell r = 0.81, P < 0.05; negative cell r = -0.68, P < 0.05).

      These values were all stated in the original figures. With the reorganization of Figure 2, we now have a plot of the Pearson’s correlation with r and p values in Figure 2F.

      (4) Figure 2I: Representation of the absolute value of the normalized firing is highly confusing. Indeed, as the 'negative cells' are inhibited to freezing, firing should be represented as normalized, and negative for the inhibited cells.

      To avoid confusion, we no longer take the absolute value of the normalized firing of the “negative cells”, which are now called the “freezing-inhibited cells”.

      (5) Figure 4E (retrograde tracing): representation of individual values is missing.

      Figure 4E now has individual values.

      References:

      London, T. D., Licholai, J. A., Szczot, I., Ali, M. A., LeBlanc, K. H., Fobbs, W. C., & Kravitz, A. V. (2018). Coordinated ramping of dorsal striatal pathways preceding food approach and consumption. Journal of Neuroscience, 38(14), 3547-3558.

      Trott, J. M., Hoffman, A. N., Zhuravka, I., & Fanselow, M. S. (2022). Conditional and unconditional components of aversively motivated freezing, flight and darting in mice. eLife, 11, e75663.

      Fanselow, M. S. (1980). Conditional and unconditional components of post-shock freezing. The Pavlovian journal of biological science: Official Journal of the Pavlovian, 15(4), 177-182.

      Fanselow, M. S. (1986). Associative vs topographical accounts of the immediate shock-freezing deficit in rats: implications for the response selection rules governing species-specific defensive reactions. Learning and Motivation, 17(1), 16-39.

      Landeira-Fernandez, J., DeCola, J. P., Kim, J. J., & Fanselow, M. S. (2006). Immediate shock deficit in fear conditioning: effects of shock manipulations. Behavioral neuroscience, 120(4), 873.

      Moita, M. A., Rosis, S., Zhou, Y., LeDoux, J. E., & Blair, H. T. (2003). Hippocampal place cells acquire location-specific responses to the conditioned stimulus during auditory fear conditioning. Neuron, 37(3), 485-497.

      Kochli, D. E., Thompson, E. C., Fricke, E. A., Postle, A. F., & Quinn, J. J. (2015). The amygdala is critical for trace, delay, and contextual fear conditioning. Learning & memory, 22(2), 92-100.

      Ramos-Prats, A., Paradiso, E., Castaldi, F., Sadeghi, M., Mir, M. Y., Hörtnagl, H., ... & Ferraguti, F. (2022). VIP-expressing interneurons in the anterior insular cortex contribute to sensory processing to regulate adaptive behavior. Cell Reports, 39(9).

      Klein, A. S., Dolensek, N., Weiand, C., & Gogolla, N. (2021). Fear balance is maintained by bodily feedback to the insular cortex in mice. Science, 374(6570), 1010-1015.

      Nicolas, C., Ju, A., Wu, Y., Eldirdiri, H., Delcasso, S., Couderc, Y., ... & Beyeler, A. (2023). Linking emotional valence and anxiety in a mouse insula-amygdala circuit. Nature Communications, 14(1), 5073.

      Maren, S., & Fanselow, M. S. (1996). The amygdala and fear conditioning: Has the nut been cracked? Neuron, 16(2), 237-240. https://doi.org/10.1016/s0896-6273(00)80041-0

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The main goal of the authors was to study the testis-specific role of the protein FBXO24 in the formation and function of the ribonucleoprotein granules (membraneless electron-dense structures rich in RNAs and proteins).

      We appreciate the summary comment of reviewer #1.

      Strengths:

      The wide variety of methods used to support their conclusions (including transgenic models)

      We appreciate the positive comment of reviewer #1.

      Weaknesses:

      The lack of specific antibodies against FBXO24. Some of the experiments showing a specific phenotype are descriptive and lack a logical explanation of the possible mechanism (i.e., AR or the tail structure).

      Because we could not obtain specific antibodies against FBXO24, we generated Fbxo24-FLAG transgenic mice, which can be used to show the interaction between FBXO24 and IPO5. For the mechanism of the impaired acrosome reaction, we added results and discussion as described in our response to question (1) of reviewer #1 (public review). For the mechanism of the abnormal flagellar structure, we added new results and revised the manuscript as described in our response to the major comments of reviewer #3 (recommendations for the authors).

      Questions:

      The paper is excellent and employs a wide variety of methods to substantiate the conclusions. I have very few questions to ask:

      (1) KO mice cannot undergo acrosome reaction (AR) even spontaneously. How do you account for this, given that no visible defects were observed in the acrosome?

      One possibility is that Fbxo24 KO spermatozoa cannot undergo capacitation; however, it is difficult to analyze capacitation status, for example by tyrosine phosphorylation, because most Fbxo24 KO spermatozoa are not alive (Figure S3A). Another possibility is that AR-related proteins are affected in Fbxo24 KO spermatozoa. Therefore, we analyzed the amounts of AR-related proteins with mass spectrometry (Figure S3C). Although previous studies indicate that the assembly of the SNARE complex is a key event prior to AR [Hutt et al., 2005 (PMID: 15774481); Katafuchi et al., 2000 (PMID: 11066067); Schulz et al., 1997 (PMID: 9356173); Tomes et al., 2002 (PMID: 11884041)], no clear differences were detected for SNARE proteins (Figure S3C and D). PLCD4, which is important for AR [Fukami et al., 2001 (PMID: 11340203)], was also detected in Fbxo24 KO spermatozoa (Figure S3C). Although we could not find differences in the amounts of AR-related proteins, it is still possible that FER1L5, another AR-related protein [Morohoshi et al., 2023 (PMID: 36696506)] that was not detected in the mass spectrometry analyses, or AR-related proteins not yet identified are affected in Fbxo24 KO spermatozoa. We added these results and discussion (line 160-166 and 305-312).

      (2) KO sperm are unable to migrate in the female tract, and, more intriguingly, they do not pass through the utero-tubal junction (UTJ). The levels of ADAM3 are normal, suggesting that the phenotype is influenced by other factors. The authors should investigate the levels of Ly6K since mice also exhibit the same phenotype but with normal levels of ADAM3.

      We detected LY6K in Fbxo24 KO spermatozoa with immunoblotting, but no difference was found. We added the results (Figure S3E and line 172–175).

      (3) In Figure 4A, the authors assert that "RBGS Tg mice revealed that mitochondria were abnormally segmented in Fbxo24 KO spermatozoa." I am unable to discern this from the picture shown in that panel. Could you please provide a more detailed explanation or display the information more explicitly?

      We apologize for the ambiguous description of the morphology of the sperm mitochondrial sheath. Fbxo24 KO cauda epididymal spermatozoa show a disorganized mitochondrial sheath rather than a “segmented” one. We revised the sentence (line 190-192) and added white arrowheads that indicate the disorganized regions (Figure 4A).

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kaneda et al "FBXO24 ensures male fertility by preventing abnormal accumulation of membraneless granules in sperm flagella" is a significant paper on the role of FBXO24 in murine male germ cell development and sperm ultrastructure and function. The body of experimental evidence that the authors present is extraordinarily strong in both breadth and depth. The authors investigate the protein's functions in male germ cells and sperm using a wide variety of approaches but focusing predominantly on their novel mouse model featuring deletion of the Fbxo24 gene and its product. Using this mouse, and a cross of it with another model that expresses reporters in the head and midpiece, they logically build from one experiment to the next. Together, their data show that this protein is involved in the regulation of membraneless electron-dense structures; loss of FBXO24 led to an accumulation of these materials and defects in the sperm flagellum and fertilizing ability. Interestingly, the authors found that several of the best-known components of electron-dense ribonucleoprotein granules that are found in the intermitochondrial cement and chromatoid body were not disrupted in the Fbxo24 knockout, suggesting that the electron-dense material and these structures are not all the same, and the biology is more complicated than some might have thought. They found evidence for the most changes in IPO5 and KPNB1, and biochemical evidence that FBXO24 and IPO5 could interact.

      We appreciate the summary comment of reviewer #2.

      Strengths:

      The authors are to be commended for the thoroughness of their experimental approaches and the extent to which they investigated impacts on sperm function and potential biochemical mechanisms. Very briefly, they start by showing that the Fbxo24 message is present in spermatids and that the protein can interact with SKP1, in a way that is dependent on its F-box domain. This points toward a potential function in protein degradation. To test this, they next made the knockout mouse, validated it, and found the males to be sterile, although capable of plugging a female. Looking at the sperm, they identified a number of ultrastructural and morphological abnormalities, which they looked at in high resolution using TEM. They also cross their model with RBGS mice so that they have reporters in both the acrosome and mitochondria. The authors test a variety of sperm functions, including motility parameters, ability to fertilize by IVF, cumulus-free IVF, zona-free-IVF, and ICSI. They found that ICSI could rescue the knockout but not other assisted reproductive technologies. Defects in male fertility likely resulted from motility disruption and failure to get through the utero-tubal junction but defects in acrosome exocytosis also were noted. The authors performed thorough investigations including both targeted and unbiased approaches such as mass spectrometry. These enabled them to show that although the loss of the FBXO24 protein led to more RNA and elevated levels of some proteins, it did not change others that were previously identified in the electron-dense RNP material.

      The manuscript will be highly significant in the field because the exact functions of the electron-dense RNP materials have remained somewhat elusive for decades. Much progress has been made in the past 15 years but this work shows that the situation is more complex than previously recognized. The results show critical impacts of protein degradation in the differentiation process that enables sperm to change from non-descript round cells into highly polarized and compartmentalized mature sperm, with an equally highly compartmentalized flagellum. This manuscript also sets a high bar for the field in terms of how thorough it is, which reveals wide-ranging impacts on processes such as mitochondrial compaction and arrangement in the midpiece, the correct building of the major cytoskeletal elements in the flagellum, etc.

      We appreciate the positive comment of reviewer #2.

      Weaknesses:

      There are no real weaknesses in the manuscript that result from anything in the control of the authors. They attempted to rescue the knockout by expressing a FLAG-tagged Fbxo24 transgene, but that did not rescue the phenotype, either because of inappropriate levels/timing/location of expression, or because of interference by the tag. They also could not make anti-FBXO24 that worked for coimmunoprecipitation experiments, so relied on the FLAG epitope, an approach that successfully showed co-IP with IPO5 and SKP1.

      We could not rescue the phenotype with the Fbxo24-FLAG transgene, but different Fbxo24 mutant mice show the same phenotypes (Figure S6G). Further, another group showed that Fbxo24 KO mice exhibited abnormal mitochondrial coiling [Li et al., 2024 (PMID: 38470475)], confirming that FBXO24 is involved in mitochondrial sheath formation.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors found that FBXO24, a testis-enriched F-box protein, is indispensable for male fertility. Fbxo24 KO mice exhibited malformed sperm flagella and compromised sperm motility.

      We appreciate the summary comment of reviewer #3.

      Strengths:

      The phenotype of Fbxo24 KO spermatozoa was well analyzed.

      We appreciate the positive comment of reviewer #3.

      Weaknesses:

      The authors observed numerous membraneless electron-dense granules in the Fbxo24 KO spermatozoa. They also showed abnormal accumulation of two importins, IPO5 and KPNB1, in the Fbxo24 KO spermatozoa. However, the data presented in the manuscript do not support the conclusion that FBXO24 ensures male fertility by preventing the abnormal accumulation of membraneless granules in sperm flagella, as indicated in the manuscript title.

      Fbxo24 KO mice showed abnormal accumulation of membraneless granules in sperm flagella and male infertility, suggesting that FBXO24 is involved in these processes; however, as reviewer #3 noted, no results show a direct relationship. Therefore, we revised the title.

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors):

      On page 4, lines 152-154, the authors introduce the RBGS mouse model and use it in their experiments.

      However, they left out an obvious but helpful sentence that tells the reader that they crossed the Fbxo24-null mouse with the RBGS. As one continues reading it is clear, but best to avoid even slight confusion.

      We revised the explanation in the result section (line 150-153).

      Reviewer #3 (Recommendations For The Authors):

      In this manuscript, the authors found that FBXO24, a testis-enriched F-box protein, is indispensable for male fertility. Fbxo24 KO mice exhibited malformed sperm flagella and compromised sperm motility. The phenotype of Fbxo24 KO spermatozoa was well analyzed.

      The authors observed numerous membraneless electron-dense granules in the Fbxo24 KO spermatozoa. They also showed abnormal accumulation of two importins, IPO5 and KPNB1, in the Fbxo24 KO spermatozoa. However, the data presented in the manuscript do not support the conclusion that FBXO24 ensures male fertility by preventing the abnormal accumulation of membraneless granules in sperm flagella, as indicated in the manuscript title.

      Fbxo24 KO mice showed abnormal accumulation of membraneless granules in sperm flagella and male infertility, suggesting that FBXO24 is involved in these processes; however, as reviewer #3 noted, no results show a direct relationship. Therefore, we revised the title.

      Major comments:

      In the title, abstract, introduction, and some sections such as lines 275-276, the authors conclude that FBXO24 prevents the accumulation of importins and RNP granules during spermiogenesis. However, the provided data do not substantiate this claim. To provide conclusive evidence to support the current title, the authors need to present evidence supporting: 1) direct degradation of IPO5 and KPNB1 by FBXO24; 2) the direct requirement of IPO5 for the formation of the membraneless granules, and 3) infertility resulting from the presence of membraneless granules, rather than other issues such as abnormal ODF and AX.

      (1) direct degradation of IPO5 and KPNB1 by FBXO24.

      To examine whether IPO5 can be degraded by FBXO24, we performed a ubiquitination assay using HEK293T cells. Ubiquitination of IPO5 was upregulated in the presence of WT FBXO24 but not with the mutant ΔF-box FBXO24, suggesting that IPO5 can be ubiquitinated by FBXO24. We did not examine the ubiquitination of KPNB1 because we failed to construct a plasmid vector expressing mouse KPNB1. We think that KPNB1 is not the substrate because we did not detect the interaction between FBXO24 and KPNB1 (Figure 5E). We added the results of the ubiquitination assay (Figure 5F and line 261-265) and mentioned it in the abstract (line 35).

      (2) the direct requirement of IPO5 for the formation of the membraneless granules.

      (3) infertility resulting from the presence of membraneless granules, rather than other issues such as abnormal ODF and AX.

      We showed that IPO5 aggregates under stress conditions in COS7 cells (Figure 6C and D); however, we did not examine whether IPO5 is required for the formation of the membraneless granules. We consider that using a protein degradation system such as PROTAC or Trim-Away to knock down IPO5 at the protein level in Fbxo24 KO mice could be a good way to test whether the membraneless granules are diminished and male fertility is rescued. However, it takes time to apply such degradation systems in vivo. Therefore, we would like to leave this rescue experiment for future studies. We revised the title and abstract (line 37-38), and removed the last sentence of the introduction.

      In addition, another group reported analyses of Fbxo24 KO mice [Li et al., 2024 (PMID: 38470475)] right after we submitted our manuscript to eLife. They reported not only disorganized flagellar structures but also abnormal head morphology, which may lead to male infertility. The differences from our study may be due to different mouse genetic backgrounds. We mentioned this in the discussion section (line 348-353).

      Minor comments:

      (1) The authors claimed a significant increase in the total amount of RNAs in Fbxo24 KO spermatozoa (lines 259-261), suggesting that the ...contain RNAs. More direct evidence supporting this claim should be provided.

      We show that the amounts of IPO5 and KPNB1 increased in Fbxo24 KO spermatozoa (Figure 5A and B), both of which could be incorporated into RNP granules in COS7 cells (Figure 6C and D), supporting the idea that the membraneless electron-dense structures may be RNP granules. However, because we did not show direct evidence that the electron-dense structures contain RNAs, we removed the sentences (line 259-261 of the 1st submission manuscript).

      (2) The author should provide an explanation for the absence of a FLAG band in the input Tg in Figure 5D and the larger size of the IPO5 band in the FLAG-IP group compared to the input. Similar observations are also noted in Figure 5E.

      The FLAG band is weak because the protein amount is low. When we increase the contrast, we can see the FLAG band. We added an image with high contrast (Figure 5D). Sometimes, proteins run differently with SDS-PAGE after immunoprecipitation, likely due to varying protein composition in the sample. We explained it in the figure legend (line 868-869).

      (3) In Line 526, clarify the procedure for sperm purification, and determine the potential for contamination from somatic cells.

      We did not perform sperm purification, but when we observed spermatozoa obtained from the cauda epididymis, we rarely observed either somatic cells or immature spermatogenic cells. We added pictures in Figure S7. Further, we added a detailed explanation of how spermatozoa were collected from the epididymis (line 549-550).

      (4) Define the Y-axis in Figure 2E, F, and G.

      We have revised the figures.

    1. Author response:

      Reviewer #1 (Public Review):

      Using the UK Biobank, this study assessed the value of nuclear magnetic resonance measured metabolites as predictors of progression to diabetes. The authors identified a panel of 9 circulating metabolites that improved the ability in risk prediction of progression from prediabetes to diabetes. In general, this is a well-performed study, and the findings may provide a new approach to identifying those at high risk of developing diabetes. I have some comments that may improve the importance of this study.

      We deeply appreciate the reviewer's invaluable time dedicated to the review of this manuscript and the insightful comments to enhance its overall quality.

      (1) It is unclear why the authors only considered the top 20 variables in the metabolite selection and why they did not set a wider threshold.

      Thank you for the comment. We restricted the metabolite selection to the top 20 variables to balance the performance of the final diabetes risk prediction model against clinical applicability, given the measurement costs. We have added this explanation in the “Methods” section.

      “We chose the intersection set of the top 20 most important variables selected by the three machine learning models, after balancing the performance of the final diabetes risk prediction model and the clinical applicability associated with measurement costs of metabolites.”

      (2) The methods section would benefit from a more detailed exposition of how parameter tuning was conducted and the range of parameters explored during the training of the RSF model.

      Following the reviewer’s suggestion, we have added a more detailed description of parameter tuning and the range of parameters explored during the training of the RSF model in the “Method S2” section in the Supplementary material.

      “The RSF model was fitted using the “randomForestSRC” package and the grid search method was used for hyperparameter tuning. Specifically, the grid search method was used to tune the hyperparameters of the RSF model by minimizing out-of-sample or out-of-bag error1. Each tree in the RSF is constructed from a random sample of the data, typically a bootstrap sample or 63.2% of the sample size (as in the present study). Consequently, not all observations are used to construct each tree. The observations that are not used in the construction of a tree are referred to as out-of-bag observations. In an RSF model, each tree is built from a different sample of the original data, so each observation is “out-of-bag” for some of the trees. The prediction for an observation can then be obtained using only those trees for which the observation was not used for the construction. A classification for each observation is obtained in this way and the error rate can be estimated from these predictions. The resulting error rate is referred to as the out-of-bag error. By calculating the out-of-bag error in each iteration, the best hyperparameters were finally determined.

      The hyperparameters to be tuned and range of grid search in the present study were below: number of trees (50-1000, by 50), number of variables to possibly split at each node (3-6, by 1), and minimum size of terminal node (1-20, by 1)2.”
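
      The grid described above can be enumerated and searched as in the following sketch. Only the three ranges come from the text. The study itself used the R package randomForestSRC, so this Python outline is not the authors' code: the dictionary keys are illustrative labels, and `toy_oob_error` is a hypothetical stand-in for fitting a forest and reading off its out-of-bag error.

```python
from itertools import product

# Ranges quoted in the supplementary methods.
N_TREES = range(50, 1001, 50)   # number of trees: 50-1000, by 50
MTRY = range(3, 7)              # variables tried at each split: 3-6, by 1
NODE_SIZE = range(1, 21)        # minimum terminal node size: 1-20, by 1


def hyperparameter_grid():
    """Yield every combination of the three tuned hyperparameters."""
    for ntree, mtry, nodesize in product(N_TREES, MTRY, NODE_SIZE):
        yield {"ntree": ntree, "mtry": mtry, "nodesize": nodesize}


def grid_search(oob_error):
    """Return (best_params, best_error) for a callable scoring one setting."""
    best_params, best_error = None, float("inf")
    for params in hyperparameter_grid():
        error = oob_error(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def toy_oob_error(params):
    """Hypothetical error surface; a real run would fit a forest here."""
    return (abs(params["ntree"] - 500) / 1000
            + abs(params["mtry"] - 4) / 10
            + abs(params["nodesize"] - 15) / 100)


grid_size = sum(1 for _ in hyperparameter_grid())
best_params, best_error = grid_search(toy_oob_error)
print(grid_size, best_params)
```

      With these ranges the grid contains 20 × 4 × 20 = 1,600 settings, each of which would require one model fit.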

      (3) It is hard to understand the meaning of the decision curve analysis and the clinical implications behind the net benefit, which are required to clarify the application values of models.

      Thank you for the comment. We have added more description and discussion about the decision curve analysis in the “Methods” and “Discussion” sections.

      “Furthermore, we used decision curve analysis (DCA) to assess the clinical usefulness of prediction model-based guidance for prediabetes management, which calculates a clinical “net benefit” for one or more prediction models in comparison to default strategies of treating all or no patients3.”

      “Most importantly, a model with good discrimination does not necessarily have high clinical value. Hence, DCA was used to compare the clinical utility of the model before and after adding the metabolites, and this showed a higher net benefit for the latter than the basic model, suggesting the addition of the metabolites increased the clinical value of prediction, i.e., the potential benefit of guiding management in individuals with prediabetes3,4. These results provided novel evidence supporting the value of metabolic biomarkers in risk prediction and stratification for the progression from prediabetes to diabetes.”
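
      To make "net benefit" concrete, the standard decision-curve definition for a binary outcome is the true-positive rate per person minus the false-positive rate per person weighted by the odds of the risk threshold. The sketch below assumes that definition and a binary outcome without censoring; a time-to-event analysis such as the one in the study would need a survival-adapted version, and the example data are made up.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the default strategy of treating every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up outcomes (1 = progressed to diabetes) and predicted risks.
outcomes = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
predicted = [0.8, 0.6, 0.3, 0.1, 0.2, 0.4, 0.05, 0.5, 0.1, 0.2]

nb_model = net_benefit(outcomes, predicted, threshold=0.25)
nb_all = net_benefit_treat_all(outcomes, threshold=0.25)
nb_none = 0.0  # treating nobody yields no benefit and no harm
print(round(nb_model, 4), round(nb_all, 4), nb_none)
```

      A model is clinically useful at a given threshold when its net benefit exceeds both default strategies, which is the comparison a decision curve plots across thresholds.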

      (4) Notably, the NMR platform utilized within the UK Biobank primarily focused on lipid species. This limitation should be discussed in the manuscript to provide context for interpreting the results and acknowledge the potential bias from the measuring platform.

      Thank you for the comment. We acknowledge that the NMR platform used within the UK Biobank primarily focuses on lipid species, which may introduce bias from the measuring platform, and we have added this limitation to the “Discussion” section.

      “Third, the Nightingale metabolomics platform primarily focused on lipids and lipoprotein sub-fractions, and thus the predictive value of other metabolites in the progression from prediabetes to diabetes warranted further research using an untargeted metabolomics approach.”

      (5) The manuscript should explain the potential influence of non-fasting status on the findings, particularly concerning lipoprotein particles and composition. There should be a detailed discussion of how non-fasting status may impact the measurement and the findings.

      According to the reviewer’s suggestion, we have added more details to explain the potential influence of non-fasting status on our findings in the “Discussion” section.

      “Additionally, the use of non-fasting blood samples might increase inter-individual variation in metabolic biomarker concentrations; however, fasting duration has been reported to account for only a small proportion of variation in plasma metabolic biomarker concentrations (ref. 5). Therefore, we believe the impact of non-fasting samples on our findings would be minor.”

      (6) Cross-platform standardization is an issue in metabolism, and further descriptions of quality control are recommended.

      Thank you for the comment. We have added more description of quality control in the “Method S1” section in the Supplementary material.

      “Metabolic biomarker profiling by Nightingale Health’s NMR platform provides consistent results over time and across spectrometers. Furthermore, the sample preparation is minimal in the Nightingale Health’s metabolic biomarker platform, circumventing all extraction steps. These aspects result in highly repeatable biomarker measurements. Pre-specified quality metrics were agreed between UK Biobank and Nightingale Health to ensure consistent results across the samples, and pilot measurements were conducted. Nightingale Health performed real-time monitoring of the measurement consistency within and between spectrometers throughout the UK Biobank samples. Two control samples provided by Nightingale Health were included in each 96-well plate for tracking the consistency across multiple spectrometers. Furthermore, two blind duplicate samples provided by the UK Biobank were included in each well plate, with the position information unlocked only after results delivery. Coefficient of variation (CV) targets across the metabolic biomarker profile were pre-specified for both Nightingale Health’s internal control samples and UK Biobank’s blind duplicates. The targets were met for each consecutively measured batch of ~25,000 samples. For the majority of the metabolic biomarkers, the CVs were below 5% (https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=3000). Further, the distributions of measured biomarkers from 5 sample batches indicated absence of batch effects (https://biobank.ctsu.ox.ac.uk/ukb/ukb/docs/nmrm_app1).”
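
      As an illustration of the CV criterion mentioned in the quoted passage, the sketch below computes a coefficient of variation for repeated measurements of a control sample and compares it with a 5% target. It assumes the CV is the sample standard deviation divided by the mean, expressed in percent; the source does not state the exact calculation used, and the measurement values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample SD divided by the mean, times 100."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100.0


if __name__ == "__main__":
    # Hypothetical repeated measurements of one biomarker in a control sample.
    replicates = [9.8, 10.2, 10.0, 9.9, 10.1]
    cv = coefficient_of_variation(replicates)
    print(f"CV = {cv:.2f}%", "(meets 5% target)" if cv < 5.0 else "(exceeds 5% target)")
```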

      Reviewer #2 (Public Review):

      Deciphering the metabolic alterations characterizing the prediabetes-diabetes spectrum could provide early time windows for targeted preventive measures to extend precision medicine while avoiding disproportionate healthcare costs. The authors identified a panel of 9 circulating metabolites combined with basic clinical variables that significantly improved the prediction from prediabetes to diabetes. These findings provided insights into the integration of these metabolites into clinical and public health practice. However, the interpretation of these findings should take account of the following limitations.

      We appreciate the reviewer’s positive comments and encouragement.

      (1) First, the causal relationship between identified metabolites and diabetes or prediabetes deserves to be further examined particularly when the prediabetic status was partially defined. Some metabolites might be the results of prediabetes rather than the causal factors for progression to diabetes.

      Thank you for your insightful comments. We agree with you that the panel of metabolites in this study might not be the causal factor for progression from prediabetes to diabetes, which needs further validation in experimental studies. We have added this limitation in the “Discussion” section.

      “Fifth, we could not draw any conclusion about the causality between the identified metabolites and the risk for progression to diabetes due to the observational nature, which remained to be validated in further experimental studies.”

      (2) The blood samples were taken at random (not all in a non-fasting state) and so the findings were subjected to greater variability. This should be discussed in the limitations.

      According to the reviewer’s suggestion, we have added more details to explain the potential influence of non-fasting status on our findings in the “Discussion” section.

      “Additionally, the use of non-fasting blood samples might increase inter-individual variation in metabolic biomarker concentrations; however, fasting duration has been reported to account for only a small proportion of variation in plasma metabolic biomarker concentrations (ref. 5). Therefore, we believe the impact of non-fasting samples on our findings would be minor.”

      (3) The strength of NMR in metabolic profiling compared to other techniques (i.e., mass spectrometry [MS], another commonly used metabolic profiling method) could be added in the Discussion section.

      According to the reviewer’s suggestion, we have added the strength of NMR in metabolic profiling compared to other techniques in the “Discussion” section.

      “Circulating metabolites were quantified via NMR-based metabolome profiling within the UK Biobank, which offers metabolite quantification with relatively lower costs and better reproducibility (ref. 6).”

      (4) Fourth, the applied platform focuses mostly on lipid species which may be a limitation as well.

      Thank you for the comment. We acknowledge that the NMR platform used within the UK Biobank primarily focuses on lipid species, which may introduce bias from the measuring platform, and we have added this limitation to the “Discussion” section.

      “Third, the Nightingale metabolomics platform primarily focused on lipids and lipoprotein sub-fractions, and thus the predictive value of other metabolites in the progression from prediabetes to diabetes warranted further research using an untargeted metabolomics approach.”

      (5) It is a very large group with prediabetes, but the results only apply to prediabetes and not to the general population. This should be clear, although the authors have also validated the predictive value of these metabolites in the general population.

      Thank you for the comment. We agree with you that the results only apply to prediabetes and not to the general population, though they also showed potential predictive value among participants with normoglycemia. We have accordingly modified the relevant expressions in the “Conclusion” section to restrict these findings to participants with prediabetes.

      “In this large prospective study among individuals with prediabetes, we detected a panel of circulating metabolites that were associated with an increased risk of progressing to diabetes.”

      References

      (1) Janitza S, Hornung R. On the overestimation of random forest's out-of-bag error. PLoS One. 2018;13(8):e0201904.

      (2) Tian D, Yan HJ, Huang H, et al. Machine Learning-Based Prognostic Model for Patients After Lung Transplantation. JAMA Netw Open. 2023;6(5):e2312022.

      (3) Vickers AJ, van Calster B, Steyerberg EW. A simple, step-by-step guide to interpreting decision curve analysis. Diagn Progn Res. 2019;3:18.

      (4) Li J, Xi F, Yu W, Sun C, Wang X. Real-Time Prediction of Sepsis in Critical Trauma Patients: Machine Learning-Based Modeling Study. JMIR Form Res. 2023;7:e42452.

      (5) Li-Gao R, Hughes DA, le Cessie S, et al. Assessment of reproducibility and biological variability of fasting and postprandial plasma metabolite concentrations using 1H NMR spectroscopy. PLoS One. 2019;14(6):e0218549.

      (6) Geng T-T, Chen J-X, Lu Q, et al. Nuclear Magnetic Resonance–Based Metabolomics and Risk of CKD. American Journal of Kidney Diseases. 2023.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The current manuscript by Hajra et al deals with the role of the prominent Sirtuins SIRT1 and -3 during infection of macrophages with Salmonella Typhimurium (ST). Apparently, ST infection induces upregulation of host cell SRTs to aid its own metabolism during the intracellular lifestyle and to help reprogramming macrophage polarization. The manuscript has two parts, namely one part that deals with Salmonella infection in cells, where RAW 264.7 murine macrophage-like cells, sharing some features with primary macrophages, were employed. Infected RAW cells displayed a tendency to polarize towards wound-healing M2 and not inflammatory M1 macrophages, which was dependent on SRT. Consequently, the inflammatory response in RAW was more robust in the absence of SRT. Moreover, loss of SRTs leads to impaired bacterial proliferation in these cells, which was attributed to defects in metabolic adaption of the bacteria in the absence of SRT-activity and to the increased M1 inflammatory response.

      Unfortunately, the line of argumentation remains incomplete because corresponding assays in mice showed the opposite result as compared to the experiments using RAW 264.7 cells. i.e. loss of SRTs leads to increased bacterial load in animals (versus impaired proliferation in RAW 264.7 cells). The authors cannot explain this discrepancy.

      Strengths:

      Extensive analysis of Salmonella infection in RAW macrophage-like cells and mice in the context of SRT1/3 function.

      Weaknesses:

      Lack of connection between the cell-based and organismic data, which are not supportive of each other.

      We are highly grateful for your valuable and insightful comments. Thank you for appreciating the merit of our manuscript. We agree that the findings from the RAW264.7 cell line (Fig. 2A) and primary peritoneal macrophages (ex vivo) (Fig. 2B) are opposite to those from the in vivo mouse model (Fig. 8). Both RAW264.7 macrophage and peritoneal macrophage infection show attenuated intracellular bacterial proliferation owing to the heightened proinflammatory burst. This is in sharp contrast to our in vivo mouse model of infection, which shows increased organ burden and bacterial dissemination. The higher bacterial load in the organs, including the spleen (Fig. 8B), is attributed to the increased pro-inflammatory cytokine burst and ROS production (Fig. 8F-H, Fig. S9) triggering bacterial dissemination. The pro-inflammatory mediators such as IL-6, IL-1β and ROS that limit bacterial proliferation within the macrophages (F4/80+ macrophages within the spleen, RAW264.7 macrophages, or primary peritoneal macrophages) facilitate bacterial dissemination in blood and to the other organs (Fig. 8I-L, Fig. S3F-G). This is in line with the following previous findings:

      Klebsiella pneumoniae infection triggers an inflammatory response via secretion of IL-6 upon HIF-1α activation that induces bacterial dissemination (Holden VI, Breen P, Houle S, Dozois CM, Bachman MA. Klebsiella pneumoniae Siderophores Induce Inflammation, Bacterial Dissemination, and HIF-1α Stabilization during Pneumonia. mBio. 2016 Sep 13;7(5):e01397-16. doi: 10.1128/mBio.01397-16. PMID: 27624128; PMCID: PMC5021805.).

      Correlation analysis of immune responses to Salmonella infection revealed that increased innate immune “cassette” opposes the adaptive immune arm leading to increased bacterial load in mice (Hotson AN, Gopinath S, Nicolau M, Khasanova A, Finck R, Monack D, et al. Coordinate actions of innate immune responses oppose those of the adaptive immune system during Salmonella infection of mice. Science signaling. 2016;9(410):ra4). 

      In our revised manuscript, we have assessed additional splenic populations including CD45+, Ly6C+, and CD11c+ populations. Our results show that the CD45+ splenic population shows increased bacterial loads, like the total splenic population, within the SIRT1/3 inhibited cohorts. However, CD45+ monocytes and the Ly6C-positive splenic population exhibit compromised burden within the SIRT1/3 inhibited cohorts. Moreover, the CD11c+ population, CD45+ granulocytes, and lymphocytes show organ loads comparable to those of the vehicle control or SIRT1 activator-treated mice group (Fig. 8M-S, Fig. S8). Overall, our data suggest heterogeneous bacterial burden in diverse splenic populations.

      Reviewer #2 (Public Review):

      Dipasree Hajra et al demonstrated that Salmonella was able to modulate the expression of Sirtuins (Sirt1 and Sirt3) and regulate the metabolic switch in both host and Salmonella, promoting its pathogenesis. The authors found Salmonella infection induced high levels of Sirt1 and Sirt3 in macrophages, which were skewed toward the M2 phenotype allowing Salmonella to hyper-proliferate. Mechanistically, Sirt1 and Sirt3 regulated the acetylation of HIF-1alpha and PDHA1, therefore mediating Salmonella-induced host metabolic shift in the infected macrophages. Interestingly, Sirt1 and Sirt3-driven host metabolic switch also had an effect on the metabolic profile of Salmonella. Counterintuitively, inhibition of Sirt1/3 led to increased pathogen burdens in an in vivo mouse model. Overall, this is a well-designed study. There are a few comments below that would further strengthen the current study.

      Major comments:

      In the in vivo study (lines 436-446) - the authors noticed increased pathogen burden in the EX-527 or the 3TYP-treated mice cohorts but decreased pathogen burden within the F4/80+ macrophage population. What are the other cell types that have increased pathogen burden in splenocytes from EX-527 or the 3TYP treated? Can this be further explored and explained?

      While the authors indicated that IL-6 cytokine storm and elevated ROS production could result in bacterial dissemination in vivo, one could also argue that Sirt1/3 inhibitors might have an impact on gut function and/or gut microbiota (PMID: 22115311). Did Sirt1/3 inhibitors also lead to increased pathogen burdens in the gut? If so, the potential effect of these in vivo treatments on gut microbiota/colonization resistance should be discussed.

      Minor comment:

      Sirt1 has been shown to be degraded during Salmonella infection (PMID: 28192515), which is different from the current study. An explanation should be provided for this.

      We thank you for your encouraging and gracious comments. We deeply appreciate your time and efforts in providing constructive feedback for the betterment of our work. As per your suggestions, we have assessed additional splenic populations including CD45+, Ly6C+, and CD11c+ populations apart from F4/80+ macrophage populations. Our analysis suggests that the CD45+ splenic population shows increased bacterial loads similar to the total splenic population within the SIRT1/3 inhibited cohorts. However, CD45+ monocytes and the Ly6C-positive splenic population exhibit compromised burden within the SIRT1/3 inhibited cohorts. Moreover, the CD11c+ population, CD45+ granulocytes, and lymphocytes show organ loads comparable to those of the vehicle control or SIRT1 activator-treated mice group (Fig. 8M-S). Overall, our data suggest heterogeneous bacterial burden in diverse splenic populations.

      We immensely appreciate the reviewer for this insightful question about the effect of SIRT1/3 on the gut per se. To answer your question, we observed increased pathogen loads within the mesenteric lymph nodes of the gut in the SIRT1/3 inhibitor-treated mice groups (Fig. 8B). In our revised manuscript, we evaluated gut inflammation via IL-1β estimation in the mice's ileal tissues and observed heightened IL-1β production in the inhibitor-treated mice cohorts in comparison to the vehicle control (Fig. S3G). We have also examined gut epithelial pathology via Haematoxylin-Eosin (H&E) staining of the ileal sections to address the effect of in vivo treatment on gut microbiota and colonization resistance; this is appended here. However, the gut microbiota crosstalk and its effect on colonization resistance are part of another ongoing study, where they are being examined in detail. Therefore, the appended H&E data have not been incorporated in the revised manuscript.

      Author response image 1.

      In line with the reference PMID: 28192515, where Sirt1 has been shown to be degraded during Salmonella infection at later time points of infection, our study also has shown that both SIRT1 mRNA (Fig. 1A) and protein levels (Fig. S1A) show an elevated expression at 2h and 6h post-infection and show a downregulation at 16h in comparison to the 6h time point.  However, SIRT3 expression levels remain elevated even at later time points of infection. Therefore, we speculate that there is a shared role between SIRT1 and SIRT3 that facilitates the phenotypes reported in our study.

      Reviewer #3 (Public Review):

      Summary:

      In this paper, Hajra et al have attempted to identify the role of Sirt1 and Sirt3 in regulating metabolic reprogramming and macrophage host defense. They have performed gene knockdown experiments in RAW macrophage cell lines to show that depletion of Sirt1 or Sirt3 enhances the ability of macrophages to eliminate Salmonella Typhimurium. However, in mice, inhibition of Sirt1 resulted in dissemination of the bacteria but the bacterial burden was still reduced in macrophages. They suggest that the effect they have observed is due to increased inflammation and ROS production by macrophages. They also try to establish a weak link with metabolism. They present data to show that the switch in metabolism from glycolysis to fatty acid oxidation is regulated by acetylation of Hif1a, and PDHA1.

      Strengths:

      The strength of the manuscript is that the role of Sirtuins in host-pathogen interactions has not been previously explored in-depth making the study interesting. It is also interesting to see that depletion of either Sirt1 or Sirt3 results in a similar outcome.

      Weaknesses:

      The major weakness of the paper is the low quality of data, making it harder to substantiate the claims. Also, there are too many pathways and mechanisms being investigated. It would have been better if the authors had focussed on either Sirt1 or Sirt3 and elucidated how it reprograms metabolism to eventually modulate host response against Salmonella Typhimurium. Experimental evidence is also lacking to prove the proposed mechanisms. For instance, they show correlative data that the knockdown of Sirt1-mediated shift in metabolism is due to HIF1a acetylation but this needs to be proven with further experiments.

      We appreciate the reviewer’s critical analysis of our work. In the revised manuscript, we aimed to eliminate the low-quality data sets and have tried to substantiate them with better and conclusive ones, as directed in the recommendations for the author section. We agree with the reviewer that the inclusion of both Sirtuins 1 and 3 has resulted in too many pathways and mechanisms and focusing on one SIRT and its mechanism of metabolic reprogramming and immune modulation would have been a less complicated alternative approach. However, as rightly pointed out, our work demonstrated the shared and few overlapping roles of the two sirtuins, SIRT1 and SIRT3, together mediating the immune-metabolic switch upon Salmonella infection. As per the reviewer’s suggestion, we have performed additional experiments with HIF-1α inhibitor treatment in our revised manuscript to substantiate our correlative findings on SIRT1-mediated regulation of host glycolysis (Fig.7G).

      Reviewer #1 (Recommendations For The Authors):

      The authors state "SIRT1 and SIRT3 inhibition resulted in increased pathogen loads in organs and triggered enhanced bacterial dissemination, together leading to increased susceptibility of the mice to S. Typhimurium infection owing to increased ROS and IL-6 production." How can this be reconciled? To the reviewer, this is not a convincing explanation. The reviewer is not a mouse pathologist, so maybe did not understand the argument in full.

      However, in order to clarify whether these phenomena can be brought into context and explained by for instance cell-autonomous (in (RAW) macrophages) versus non-autonomous (in mice) mechanisms, it would be required to bring in context the organismic phenotype with a cellular phenotype, using more physiologic primary macrophages.

      (1) The authors show in Figure 8 that in general SRT inhibition leads to increased infection whereas SRT activation results in decreased infection. This is even true for the spleen (e.g. Figure 8B), which should be full of macrophages upon infection.

      (2) Only Figure 8L implies that endogenous primary, splenic macrophages show a higher infection rate upon pharmacologic SRT activation, which would potentially mirror the RAW results. This is however not supportive of their own explanation: Who would now produce more ROS and IL6 if these macrophages are more supportive of intracellular ST? Is there a difference in the roles or SRTs between different types of macrophages and/or neutrophils? And between macrophages and somatic cells concerning ST infection? The reviewer tends to believe that RAW cells display a defective killing response (such as ROS production) as they are highly transformed cells. Therefore, the authors should use cultured peritoneal macrophages or BMDMs in addition to RAW264.7 cells.

      The literature cited by the authors also implies that the inflammatory response in mice is higher in the absence of SRTs. This is in line with a role for SRTs in (negatively) regulating M1 inflammatory polarization but probably not with increased bacterial burden in mice. If it was, then increased dissemination could be explained by increased tissue damage. However, the flow cytometry experiments from infected organs then do not confirm that, as the infection of individual cells is higher upon SRT inhibition. Thus there seems a broad gap between the role of SRTs in ST infection in RAW264.7 cells versus non-transformed cells.

      I would not discard the RAW results, as I am convinced that they contain valuable data. However, it needs to be clarified what aspect of the host response RAW 264.7 cells represent. Primary macrophages might likely be more aggressive towards the bacteria. Finally, the question arises: what is the role of the metabolic switch in the in vivo setting?

      The reviewer recommends repeating some key experiments by in-vitro-infecting BMDMs or isolated peritoneal macrophages (after some days of culturing) to bridge between the present RAW-derived data and the mouse data. How is the bacterial load with and without SRT inhibitor/activator in primary macrophages, when infected outside of the body? Can ex-vivo infection also affect polarization of e.g. peritoneal macrophages or the metabolic switch? If it is possible to find a conclusive explanation for their data, then this story might really add to our understanding of another aspect of how ST manipulates the host to survive.

      In case the reviewer understands the mouse experiments correctly, all assays on peritoneal cells were performed after in-vivo-infection and/or treatment.

      Together, RAW 264.7 murine macrophage-like cells might not be the right model to understand the phenotypes in full. As far as the reviewer knows, these cells are not capable of killing bacteria as effectively as activated primary macrophages or neutrophils.

      A few of the key findings in RAW264.7 macrophages have been replicated in primary peritoneal macrophages (Fig. 2B, S3E-F, S6B, S7B-D). We wanted to clarify that the peritoneal macrophage experiments were performed ex vivo: peritoneal macrophages isolated from mice were subjected to SIRT1/3 inhibitor treatments and Salmonella infection, and not analyzed after in vivo treatment or infection. In the ex vivo setting, we have examined the effect of SIRTs on the metabolic switch during Salmonella infection (Fig. S7B-D), which resembled our RAW264.7 macrophage data. Additionally, in the in vivo setting, we have analyzed the transcript-level expression of host metabolic genes and corresponding bacterial metabolic genes in infected mice liver and spleen tissue under SIRT1/3 inhibitor treatment (Fig. S7E-F, Fig. 6C-D). Our primary peritoneal macrophage data mirror the RAW264.7 macrophage findings, showing attenuated intracellular bacterial proliferation owing to the heightened proinflammatory burst upon SIRT1/3 knockdown or inhibition (Fig. 2A-B). This is opposite to our in vivo mouse model of infection, which shows increased organ burden and bacterial dissemination (Fig. 8A-H). The pro-inflammatory mediators that limit bacterial proliferation within the macrophages (F4/80+ macrophages within the spleen, RAW264.7 macrophages, or primary peritoneal macrophages) facilitate bacterial dissemination in blood and to the other organs owing to tissue damage (Fig. 8E-L). This is in line with the following previous findings:

      Klebsiella pneumoniae infection triggers an inflammatory response via secretion of IL-6 upon HIF-1α activation that induces bacterial dissemination (Holden VI, Breen P, Houle S, Dozois CM, Bachman MA. Klebsiella pneumoniae Siderophores Induce Inflammation, Bacterial Dissemination, and HIF-1α Stabilization during Pneumonia. mBio. 2016 Sep 13;7(5):e01397-16. doi: 10.1128/mBio.01397-16. PMID: 27624128; PMCID: PMC5021805.).

      Correlation analysis of immune responses to Salmonella infection revealed that increased innate immune “cassette” opposes the adaptive immune arm leading to increased bacterial load in mice (Hotson AN, Gopinath S, Nicolau M, Khasanova A, Finck R, Monack D, et al. Coordinate actions of innate immune responses oppose those of the adaptive immune system during Salmonella infection of mice. Science Signaling. 2016;9(410):ra4). 

      As per the reviewer’s suggestions, we have analyzed other populations apart from F4/80+ macrophages and have observed that the CD45+ splenic population shows increased bacterial loads, like the total splenic population, within the SIRT1/3 inhibited cohorts. However, CD45+ monocytes and the Ly6C-positive splenic population exhibit compromised burden within the SIRT1/3 inhibited cohorts. Moreover, the CD11c+ population, CD45+ granulocytes, and lymphocytes show organ loads comparable to those of the vehicle control or SIRT1 activator-treated mice group (Fig. 8M-S, Fig. S8). Overall, our data suggest heterogeneous bacterial burden in diverse splenic populations.

      Reviewer #3 (Recommendations For The Authors):

      Abstract

      The authors state that perturbing Sirt1 and Sirt3 results in a shift in Salmonella's metabolism. On the contrary, the data reflects the metabolism in the host cell and not the bacteria. This statement is wrong. They only show increased expression of some of the glycolytic genes in Salmonella, which is not sufficient to make the claim that the switch to fatty acid oxidation in macrophages is due to utilisation of glucose by the bacteria.

      We value the reviewer’s response and have accordingly reframed our sentence in the abstract (Line 24-25).

      Fig 1: Expression of Sirt1 - The data needs to be supported with a western blot for Sirt1 and Sirt3 but the Western blots shown in the supplementary figure are of very poor quality and do not support the authors' claim.

      We have repeated the western blot and have supplemented the previous blot with an alternate blot in Fig. S1A as per your precious input.

      Why haven't the authors shown any representative blots for Sirt1 and Sirt3 upon infection with Salmonella mutants? They need to italicize the genes when they describe mRNA expression.

      Previously we had only performed transcript-level expression of Sirt1 and Sirt3 upon infection with Salmonella mutants and therefore representative blot image was absent. The gene names have been duly italicized while describing mRNA expression (Line 126-154). We regret the inconvenience caused. We have performed the western blotting to assess the protein expression profile upon infection with Salmonella mutants as per the reviewer’s suggestion and the representative blot image has been duly appended in the revised manuscript (Fig. S1B).

      What is the rationale for examining Sirt1 and Sirt3 mRNA in M1 and M2 macrophages? Salmonella infection on its own will polarise the macrophages towards M1. How long were these macrophages infected? The time points are missing.

      The rationale for examining Sirt1 and Sirt3 mRNA in M1- and M2-polarized macrophages was to ascertain whether M1-polarized macrophages exhibit decreased expression of Sirt1 or Sirt3, and whether polarization of macrophages toward the M2 state leads to upregulation of Sirt1 and Sirt3 upon Salmonella infection. After confirming these findings in this preliminary experiment, we hypothesized that Salmonella infection on its own polarises the macrophages toward an immunosuppressive M2 state at a later time course of infection, as infection drives the induction of SIRT expression, and asked whether this is mediated by Sirt1 and Sirt3 (Fig. 3). We apologize for not mentioning the 16h time point in the figure; the missing time point has been documented in the revised manuscript (Line 155).

      Fig S2 knockdown of Sirt1 and Sirt3 are not convincing.

      We are extremely sorry for the inconclusive knockdown blot. An alternative blot has been provided in the revised manuscript (Fig. S2C-D).

      Fig 2A and 2B the time point post infection has not been mentioned. Although it is stated that 2h and 16h post-infection samples were analysed. Only one time point has been shown.

      We are sorry for the confusion. We wanted to clarify that Fig. 2A and Fig. 2B show the fold proliferation, calculated as the CFU at 16h divided by the CFU at 2h, as described in the Materials and Methods section under the heading “Intracellular proliferation or gentamicin protection assay”.

      Fold Proliferation= [CFU at 16h]/[CFU at 2h]
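
      The ratio above can be written as a one-line calculation. The sketch below is illustrative only; the function name and the CFU counts are hypothetical and are not data from the study.

```python
def fold_proliferation(cfu_2h, cfu_16h):
    """Fold proliferation = CFU recovered at 16h divided by CFU recovered at 2h."""
    if cfu_2h <= 0:
        raise ValueError("CFU at 2h must be positive")
    return cfu_16h / cfu_2h


if __name__ == "__main__":
    # Hypothetical counts from a gentamicin protection assay.
    print(fold_proliferation(cfu_2h=2.0e4, cfu_16h=1.6e5))
```

      Because each sample is normalized to its own 2h count, a single value per condition summarizes both time points, which is why the figure panels show one bar per condition.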

      The cytokines data are intriguing in that the increase in IL-6 relative to control is seen only at 2h and 20h but not at 6h. Il-6 at 20h in untransfected cells is comparable to uninfected cells. Did the authors investigate cell death? Salmonella induces various forms of cell death which could account for the decreased cytokine production at later time points.

      We have investigated the cell death upon Salmonella infection via MTT assay. At later time points of infection, we indeed observed around 16 percent decrease in cell survival compared to the initial time point of 2h. The results have been appended here and it supports our eminent reviewer’s reasoning for the decreased cytokine production at later time points.

      Author response image 2.

      Additional cytokines such as IL-1b would be helpful. Also, not sure how uninfected macrophages produce nearly 200pg of IL-10.

      As per the reviewer’s critical suggestion, we have assessed IL-1b production at 16h post-infection in RAW264.7 macrophages and peritoneal macrophages, and in mouse serum samples at day 5 post-infection (Fig. S3C, S3E-F). Our results indicate increased production of IL-1b in the infected SIRT1/3 knockdown RAW264.7 macrophages, in SIRT1/3 inhibitor-treated peritoneal macrophages, and in mouse serum samples under SIRT1/3 inhibitor treatment in comparison to the vehicle control. Additionally, we have quantified IL-1b in mouse ileal tissues under SIRT1/3 inhibitor treatment (Fig. S3G) and observed heightened intestinal IL-1b production in the inhibitor-treated cohorts. We thank the reviewer for raising the concern about 200pg of IL-10 in the uninfected macrophages. We have repeated the experiment and provided an alternative representative graph in which the IL-10 levels in the uninfected cohorts range between 20 and 40pg/ml (Fig. S3B).

      It is surprising that the authors have found increased Sirt1 binding to NFkB, however there is no change in acetylated NFkB upon infection (Fig 4B). Acetylated p65 is equally high in uninfected Scrambled siRNA, UI shSirt1, STM Scr, and STM shSirt1. Furthermore, increased binding of Sirt1 with NFkb would mean decreased acetylation hence decreased inflammation. However, Salmonella induces profound inflammation.

      We thank the reviewers for their insightful and critical questioning. We acknowledge that, owing to oversaturation, there was no apparent change in acetylated p65 among the different sample sets. Therefore, in the revised manuscript we have provided an image at lower exposure in which the changes in acetylation of the p65 subunit are apparent. Like other pathogens, Salmonella induces an acute inflammatory response upon challenge. This heightened acute inflammation at the initial phase of infection subsides at a later phase. Here, we assessed the Sirt1 interaction with NFκB at 16h post-infection, when increased binding of Sirt1 to NFκB facilitates the resolution of the Salmonella-induced acute inflammation. This is in line with previous reports suggesting that SIRT1 suppresses acute inflammation through the promotion of p65 deacetylation and inhibition of NFκB activity (Yang H, Zhang W, Pan H, et al. SIRT1 activators suppress inflammatory responses through promotion of p65 deacetylation and inhibition of NF-κB activity. PLoS One. 2012;7(9):e46364. doi:10.1371/journal.pone.0046364; Liu TF, Yoza BK, El Gazzar M, Vachharajani VT, McCall CE. NAD+-dependent SIRT1 deacetylase participates in epigenetic reprogramming during endotoxin tolerance. J Biol Chem. 2011;286(11):9856–64; Liu TF, Vachharajani V, Millet P, Bharadwaj MS, Molina AJ, McCall CE. Sequential actions of SIRT1-RELB-SIRT3 coordinate nuclear-mitochondrial communication during immunometabolic adaptation to acute inflammation and sepsis. J Biol Chem. 2015;290(1):396–408).

      Please explain how the acetylated p65 was analysed.

      Total endogenous p65 subunit was immunoprecipitated using Anti-NFκB p65 antibody and the immunoprecipitated fraction was probed with Anti-Acetylated Lysine antibody to assess acetylated p65.

      An increase in ROS production is seen in a relatively small percentage of cells- not more than 4% of cells. How does this contribute to such a significant difference in intracellular bacterial burden? Also, it is not clear how the authors calculated the fold change in proliferation. It is better to show the actual bacterial burden logarithmically.

      We strongly agree with the reviewer’s concerns, and we have reanalyzed the flow cytometric data set. The revised data are presented in Fig. S5, which shows a considerable increase in the DCFDA-positive population. For instance, the infected scrambled control shows around 2.44% ROS-producing cells, whereas knockdown of SIRT1 and SIRT3 increases the ROS-producing cells to 27.34% and 28.64%, respectively.

      Fold proliferation was calculated as CFU at 16h divided by CFU at 2h, as described in the Materials and Methods section under the heading Intracellular proliferation or gentamicin protection assay. Fold proliferation was calculated instead of absolute CFU values to cancel out differences among samples in the phagocytosis of bacteria by the macrophages.

      Fold Proliferation= [CFU at 16h]/[CFU at 2h]
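
      As a minimal sketch of the calculation above, the following Python snippet computes fold proliferation from paired CFU counts and also prints the log10 CFU at 16h that the reviewer asked to see. The function name, sample labels and CFU values are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math


def fold_proliferation(cfu_2h: float, cfu_16h: float) -> float:
    """Fold proliferation = CFU at 16 h / CFU at 2 h."""
    if cfu_2h <= 0:
        raise ValueError("CFU at 2 h must be positive")
    return cfu_16h / cfu_2h


# Hypothetical CFU counts (illustrative only, not data from the study):
# each entry is (CFU at 2 h, CFU at 16 h).
samples = {
    "sample_A": (2.0e4, 1.0e5),
    "sample_B": (5.0e3, 4.0e4),
}

for name, (cfu_2h, cfu_16h) in samples.items():
    fold = fold_proliferation(cfu_2h, cfu_16h)
    log_burden = math.log10(cfu_16h)
    print(f"{name}: fold proliferation = {fold:.1f}, log10 CFU at 16 h = {log_burden:.2f}")
```

      In this made-up example, sample_B took up fewer bacteria at 2h and carries a lower absolute burden at 16h, yet shows the higher fold proliferation. This is the situation in which dividing by the 2h count separates intracellular growth from differences in uptake.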

      An increase in metabolic genes is not sufficient to show that the macrophages are metabolically reprogrammed.

      We thank the reviewer for the valuable comment. We agree that an increase in the metabolic gene profile is not sufficient to claim metabolic reprogramming. Therefore, in addition to the metabolic gene profile, we have estimated lactate production (the end-product of glycolysis) as an indicator of glycolysis (Fig. 5C-E) and have measured fatty acid β-oxidation activity (Fig. 5G-H) to support our claims.

      Figure 5F the band intensities do not visually match the bands shown for PFK. For instance, shSIRT1 STM (1.00) and shSIRT3 STM (0.81).

      We are extremely sorry for the erroneous band intensity for shSIRT3. Upon reanalysis of the band intensities, we have corrected the band intensity for shSIRT3 to 2.28 (Fig.5F).

      It is surprising that HADHA is not expressed in uninfected samples.

      We apologize for the inappropriate representative blot. We believe the discrepancy might have arisen from the use of old antibodies. We have provided an alternate blot for HADHA, probed with fresh antibody staining solution, which shows expression even in the uninfected samples (Fig. 5F).

      Figure 6A - What is the significance of PFA fixed samples (PI) compared to SI samples? This has not been discussed.

      PFA-fixed samples are paraformaldehyde-treated bacterial samples that harbor the immune signals or Pathogen-Associated Molecular Patterns (PAMPs). The rationale for using PI in addition to SI samples was to show whether the phenomenon is driven by live, metabolically active pathogens or is mediated by PAMPs.

      I understand that the hypothesis is that during the later phase of infection, there is an increase in fatty acid oxidation which correlates with a decrease in inflammation. However, at 6h there is no increase in genes regulating fatty acid oxidation. Why did the authors choose 6h when the previous experiments have been done at 16h?

      We agree with the reviewer’s understanding of our hypothesis that fatty acid oxidation increases as infection progresses, which correlates with a decrease in inflammation. Salmonella intracellular replication has been reported to commence at 6h post-internalization, when SPI-2 effector expression is fully established (Helaine S, Thompson JA, Watson KG, Liu M, Boyle C, Holden DW. Dynamics of intracellular bacterial replication at the single cell level. Proc Natl Acad Sci U S A. 2010;107(8):3746-3751. doi:10.1073/pnas.1000041107). Therefore, we assessed the 6h time point post-infection in addition to the initial and later time points of 2h and 16h, respectively. Additionally, the nanostring gene profiling data of both host and bacterial genes indicate the onset of modulation of both metabolic (Fig. 5A, 6A) and immune (Fig. 3A) genes at 6h post-infection. We have validated these results via qPCR and observed an upregulation in the transcript levels of fatty acid oxidation genes in RAW264.7 macrophages, as depicted in Fig. S7A.

      Line 355 it is mentioned that Sirt1 and Sirt3 abrogate metabolic shift by reducing glycolytic flux. This is incorrect as experiments such as carbon chase assays have not been performed to investigate glycolytic flux.

      As per the reviewer’s valuable suggestion, we have removed the word ‘flux’ from the above-mentioned statement (Line 351, Line 353).

      Lines 392-393: "We immunoprecipitated PDHA1 and checked for its interaction with SIRT3 or SIRT1 under knockdown condition of SIRT3 or upon SIRT3 inhibitor treatment (Fig.7 G-H)"

      What is the rationale for checking PDHA1 interaction with Sirt under Sirt knockdown conditions?

      We thank the reviewer for the critical comments. The rationale for checking the PDHA1 interaction with Sirt was to confirm that Sirt indeed interacts with PDHA1 under S. Typhimurium infection, and that abrogating either its expression (knockdown) or its enzymatic activity (inhibitor treatment) diminishes the interaction.

      Moreover, the blots are very confusing and do not represent the authors' claims.

      (1) In the input blot I do not see Sirt3 depletion in shSirt3 knockdown sample.

      The knockdown has been quantified in the input blot as per your suggestion. A knockdown of 40% has been obtained in the uninfected dataset whereas a knockdown of 47.1% has been obtained in the infected data set at 16h post-infection (Fig.7H).

      (2) Why does Sirt1 interact with PDHA1 similar to Sirt3. Do both the proteins bind to PDHA1 at the same time/ competitively? If so do they both deacetylate?

      In the literature, Sirt3 has been shown to interact with and deacetylate PDHA1. However, the interaction of Sirt1 with PDHA1 has not been reported previously, and we are therefore unable to comment on the exact dynamics of the interaction. Future studies are needed to explore these phenomena in depth. However, the SIRT1 agonist SRT1720 has been shown to affect PDH phosphorylation and activity (Han Y, Sun W, Ren D, Zhang J, He Z, Fedorova J, Sun X, Han F, Li J. SIRT1 agonism modulates cardiac NLRP3 inflammasome through pyruvate dehydrogenase during ischemia and reperfusion. Redox Biol. 2020 Jul;34:101538).

      (3) Figure 7I in the IP: IgG samples Sirt3 seem to bind to IgG non-specifically, which questions the specificity of Sirt3 binding to PDHA1.

      We thank the reviewer for pointing out this concern. The immunoprecipitation experiment has been repeated and the new blot is included in the revised manuscript; we observe no non-specific binding of the Sirt3 antibody to IgG.

      (4) In Figure 7I all the bands Ac PDHA1, PDHA1, and Sirt3 look similar with double bands, which has not been seen in other blots. How is this possible?

      This cannot explain the increase in beta-oxidation observed.

      We thank the reviewer for raising this concern. We have repeated the experiment and provided the alternative blot as per the reviewer’s suggestion.

      The rationale for performing this experiment was to show that SIRT plays an important role in the activation of downstream TCA cycle pathways via PDHA1 deacetylation during Salmonella infection. The deacetylation of PDHA1 has been previously reported to cause transcriptional activation of the downstream TCA cycle and oxidative phosphorylation (Zhang Y, Wen P, Luo J, et al., Cell Death Dis.,2021). Additionally, PDHA1 hyperacetylation has been reported to cause lactate overproduction (An, S., Yao, Y., Hu, H. et al. PDHA1 hyperacetylation-mediated lactate overproduction promotes sepsis-induced acute kidney injury via Fis1 lactylation. Cell Death Dis 14, 457 (2023)). In our study, increased lactate production and PDHA1 hyperacetylation have been observed during SIRT3 inhibition conditions upon Salmonella infection.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors investigate the impact of fecal microbiota transfer (FMT) on intestinal recovery from enterotoxigenic E. coli infection following antibiotic treatment. Using a piglet model of intestinal infection, the authors demonstrate that FMT reduces weight loss and diarrhea and enhances the expression of tight junction proteins. Sequencing analysis of the intestinal microbiota following FMT showed significant increases in Akkermansia muciniphila and Bacteroides fragilis. Using additional mouse and organoid models, the authors examine the impact of these microbes on intestinal recovery and modulation of the Wnt signaling pathway. Overall, the data support the notion that FMT following ETEC infection is beneficial, however, additional investigation is required to fully elucidate the mechanisms involved.

      Strengths:

      Initial experiments used a piglet model of infection to test the value of FMT on recovery from E. coli. The FMT treatment was beneficial and the authors provide solid evidence that the treatment increased the diversity of the microbiota and enhanced the recovery of the intestinal epithelium. Sequencing data highlighted an increase in Akkermansia muciniphila and Bacteroides fragilis after FMT.

      The mouse data are consistent with the observations in pigs, and reveal that daily gavage with A. muciniphila or B. fragilis enhances intestinal recovery based on histological analysis, expression of tight junction proteins, and analysis of intestinal barrier function.

      The authors demonstrate the benefit of probiotic treatment following infection using a range of model systems.

      Weaknesses:

      Without sequencing the pre-infection pig microbiota or the FMT input material itself, it's challenging to firmly say that the observed bloom in Akkermansia muciniphila and Bacteroides fragilis stemmed from the FMT.

      Response: We have determined the relative abundance of each bacterium in the fecal bacterial suspension, referring to Hu et al. (2018). The absolute abundances of Akkermansia muciniphila and Bacteroides fragilis in the FMT were 1.3 × 10^3 ± 2.6 × 10^3 and 4.5 × 10^3 ± 6.1 × 10^3, respectively.

      Reference:

      Hu LS, Geng SJ, Li Y, et al. Exogenous Fecal Microbiota Transplantation from Local Adult Pigs to Crossbred Newborn Piglets. Front. Microbiol. 2018, 8.

      The lack of details for the murine infection model, such as weight loss and quantification of bacterial loads over time, make it challenging for a reader to fully appreciate how treatment with Akkermansia muciniphila and Bacteroides fragilis is altering the course of infection. Bacterial loads of E. coli were only quantified at one time point, and the mice that received A. muciniphila and B. fragilis had very low levels of E. coli. Therefore, it is not clear if all mice were subjected to the same level of infection in the first place. The reduced translocation of E. coli to the organs and enhanced barrier function may just reflect the low level of infection in these mice. Further, the authors' conclusion that the effect is specific to A. muciniphila or B. fragilis would be more convincing if the experiments included an inert control bacterium, to demonstrate that gavage with any commensal microbe would not elicit a similar effect.

      The weight loss was added in Figure S2A. All mice were subjected to the same level of infection in the first place.

      Many of the conclusions in the study are drawn from the microscopy results. However, the methods describing both light microscopy and electron microscopy lack sufficient detail. For example, it is not clear how many sections and fields of view were imaged or how the SEM samples were prepared and dehydrated. The mucus layer does not appear to be well preserved, which would make it challenging to accurately measure the thickness of the mucus layer.

      For light microscopy, 3-4 fields were selected from each mouse to count about 30 crypts. The electron microscopy method has been supplemented on lines 263-270. We have removed the data on the mucus layer.

      Gene expression data appears to vary across the different models, for example, Wnt3 expression in mice versus organoids. Additional experiments may be required to clarify the mechanisms involved. Considering that both of the bacteria tested elicited similar changes in Wnt signaling, this pathway might be broadly modulated by the microbiota.

      The difference in the Wnt3 expression pattern between mice and porcine intestinal organoids may be caused by the different ETEC infection periods in vivo and in vitro. Furthermore, in vivo, the niche of intestinal stem cells is not only regulated by intestinal epithelial cells but also affected by mesenchymal cells in connective tissues (Luo et al., 2022). In in vitro models, however, the stem cell niche is regulated only by epithelial secretory factors, which may also account for the differences between the in vitro and in vivo results.

      It has been reported that B. fragilis pretreatment significantly increased the relative abundance of A. muciniphila in the intestine of CDI mice, and the growth and maintenance of A. muciniphila were involved in the restoration of intestinal barrier integrity after CDI infection, indicating that there might exist a bacterial metabolic symbiosis between A. muciniphila and B. fragilis (Deng et al., 2018).

      References:

      Luo HM, Li MX, Wang F, et al. The role of intestinal stem cell within gut homeostasis: Focusing on its interplay with gut microbiota and the regulating pathways. Int. J. Biol. Sci. 2022, 18(13): 5185-5206.

      Deng HM, Yang SQ, Zhang YC, et al. Bacteroides fragilis Prevents Clostridium difficile Infection in a Mouse Model by Restoring Gut Barrier and Microbiome Regulation. Front. Microbiol. 2018, 9.

      The unconventional choice to not include references in the results section makes it challenging for the reader to put the results in context with what is known in the field. Similarly, there is a lack of discussion acknowledging that B. fragilis is a potential pathogen, associated with intestinal inflammation and cancer (Haghi et al. BMC Cancer 19, 879 (2019) ), and how this would impact its utility as a potential probiotic.

      Bacteroides fragilis is one of the symbiotic anaerobes in the mammalian gut and is also an opportunistic pathogen that is often isolated from clinical specimens. It was first isolated from a site of infection and was therefore considered a pathogenic bacterium. However, as research has progressed, it has gradually been recognized that, over long-term evolution, Bacteroides fragilis colonizing the gut has established a friendly relationship with the host and is an essential component in maintaining host health, especially with respect to obesity, diabetes and immune deficiency diseases. We have supplemented the discussion on lines 598-603.

      Reviewer #2 (Public Review):

      Ma X. et al proposed that A. muciniphila was a key strain that promotes the proliferation and differentiation of intestinal stem cells by acting on the Wnt/β-catenin signaling pathway. They used various models, such as the piglet model, mouse model, and intestinal organoids to address how A. muciniphila and B. fragilis offer protection against ETEC infection. They showed that FMT with fecal samples, A. muciniphila or B. fragilis protected piglets and/or mice from ETEC infection, and this protection is manifested as reduced intestinal inflammation/bacterial colonization, increased tight junction/Muc2 proteins, as well as proper Treg/Th17 cells. Additionally, they demonstrated that A. muciniphila protected basal-out and/or apical-out intestinal organoids against ETEC infection via Wnt signaling. While a large body of work has been performed in this study, there are quite a few questions to be addressed.

      Major comments:

      - The similar protective effect of FMT with fecal samples, A. muciniphila or B. fragilis is perhaps not that surprising, considering that FMT likely restores microbiota-mediated colonization resistance against ETEC infection. While FMT with fecal samples increases SCFAs, it is unclear whether/how FMT with A. muciniphila or B. fragilis alter the microbiota composition/abundance as well as metabolites in the current models in a way that offers protection.

      We examined changes in the gut microbiota of mice treated with A. muciniphila and B. fragilis through 16S rRNA sequencing, and the results showed that both A. muciniphila and B. fragilis improved the alpha and beta diversities of the microbiota, although these results were not included in this manuscript.

      - Does ETEC infection in piglets/mice cause histological damage in the intestines? These data should be shown.

      The results of scanning electron microscopy (Figure 3A) showed the intestinal damage of piglets after ETEC infection. H&E staining and transmission electron microscopy (Figure 5A and 5B) showed the intestinal damage of mice after ETEC infection.

      - Line 447, "ETEC adheres to intestinal epithelial cells". However, there is no data showing the adherence (or invasion) of ETEC to intestinal epithelial cells, irrespective of piglets/mouse/organoids.

      The scanning electron microscopy images (Figure 3A, bottom) showed obvious rod-shaped bacteria adhering to the surface of microvilli in ETEC K88-infected piglets. Figure 2C shows the colonization of ETEC K88 in the jejunum and colon of piglets. Figure S2A shows E. coli colonization in the intestines and other tissues of mice.

      - In both basal-out and apical-out intestinal organoid models, A. muciniphila protects organoids against ETEC infection. Did ETEC enter into intestinal epithelial cells at all after only one hour of infection? Is the protection through certain A. muciniphila metabolites?

      It has been reported that the duration of the co-culture for studying the host-microbiota cross-talk by apical-out organoids model is 1 hour (Poletti et al., 2021). In addition, Co et al. (2019) used apical-out organoids model to study host-pathogen interactions, with Salmonella enterica serovar Typhimurium or Listeria monocytogenes invading organoids for an hour.

      References:

      Poletti M, Arnauts K, Ferrante M, et al. Organoid-based Models to Study the Role of Host-microbiota Interactions in IBD. J. Crohns Colitis. 2021, 15(7): 1222-1235.

      Co JY, Margalef-Catala M, Li XN, et al. Controlling Epithelial Polarity: A Human Enteroid Model for Host-Pathogen Interactions. Cell Reports. 2019, 26(9): 2509-2520.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Ma et al. describes a multi-model (pig, mouse, organoid) investigation into how fecal transplants protect against E. coli infection. The authors identify A. muciniphila and B. fragilis as two important strains and characterize how these organisms impact the epithelium by modulating host signaling pathways, namely the Wnt pathway in lgr5 intestinal stem cells.

      Strengths:

      The strengths of this manuscript include the use of multiple model systems and follow-up mechanistic investigations to understand how A. muciniphila and B. fragilis interacted with the host to impact epithelial physiology.

      Weaknesses:

      The major weakness is that, as presented, the manuscript is quite difficult to follow, even for someone familiar with the field. The lack of detail in figure legends, organization of the text, and frequent use of non-intuitive abbreviated group names without a clear key (ex. EP/EF, or C E A B) make comprehension challenging. The results section is perhaps too succinct and does not provide sufficient information to understand experimental design and interpretation without reading the methods section first or skipping to the discussion (as an example: WNT-c59 treatment). Extensive revisions could be encouraged to aid in communicating the potentially exciting findings.

      The abbreviations of the experimental groups are first defined in the Materials and Methods, and we have supplemented the experimental design in the results section on lines 397-399, 439-442 and 516-520.

      The bioinformatics section of the methods requires revision and may indicate issues in the pipeline. Merging the forward and reverse reads may represent a problem for denoising. Also since these were sequenced on a NovaSeq, the error learning would have to be modified or the diversity estimates would be inappropriately multiplied. "Alpha diversity and beta diversity were calculated by normalized to the same sequence randomly." Not sure what this means, does this mean subsampled? "Blast was used for sequence alignment", does this mean the taxonomic alignment? This would need to be elaborated on and database versions should be included. The methods, including if any form of multiple testing was included, for LEFSE was also not included.

      Denoising was conducted using UNOISE3 to correct for sequencing errors. Subsequent analyses of alpha diversity and beta diversity were all performed on the normalized output data. Multiple sequence alignment was performed using MUSCLE (v3.8.31) to obtain the phylogenetic relationships of all OTU sequences. We have supplemented the method of multiple testing on lines 323-328.
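
      The response above does not state how the data were normalized. As an illustration only, and assuming the reviewer's reading that "normalized" means random subsampling (rarefaction) to a common depth, the Python sketch below rarefies each sample to the smallest library size and then computes a Shannon index. The function names, OTU labels and counts are hypothetical; this is not the authors' pipeline.

```python
import math
import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Randomly subsample `depth` reads without replacement from one sample."""
    # Expand the count table into one entry per read, then draw `depth` reads.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    return dict(Counter(rng.sample(pool, depth)))


def shannon(counts: dict[str, int]) -> float:
    """Shannon diversity index (natural log) of an OTU count table."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


# Hypothetical OTU tables with unequal sequencing depth (1000 vs 110 reads).
samples = {
    "sample_1": {"OTU_1": 500, "OTU_2": 300, "OTU_3": 200},
    "sample_2": {"OTU_1": 50, "OTU_2": 30, "OTU_3": 20, "OTU_4": 10},
}

# Rarefy every sample to the smallest library size before comparing diversity.
depth = min(sum(c.values()) for c in samples.values())
for name, counts in samples.items():
    sub = rarefy(counts, depth)
    print(name, sum(sub.values()), round(shannon(sub), 3))
```

      Fixing the seed makes the subsampling reproducible. Reporting the depth, the seed and the software version would address the reviewer's request for a clear and reproducible description.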

      Reviewer #1 (Recommendations For The Authors):

      At some points, the rationale for using both porcine and murine models was unclear, and it would be helpful for the reader to elaborate on the benefits of these models and why they were used in the introduction. Similarly, it would be helpful to describe the benefits of basal-in organoids versus injecting standard organoids with bacteria.

      The main subject of this study was piglets, supplemented by a mouse model for validation. Interpretation of measurements from organoid microinjection experiments must account for multiple confounding variables such as heterogeneous exposure concentrations and durations, as well as impacts of disrupting the organoid wall. We have added the description in the introduction on line 88-90.

      Line 165 -- The number of piglets used seems high, is it correct approximately 100 pigs were used?

      Nine litters were selected for processing, while only 18 piglets were finally slaughtered.

      There is very little discussion of the preliminary experiment that the authors used to determine how much bacteria to use. I recommend either discussing the data and how the doses were chosen or omitting it. It was not clear if the authors used pasteurized or live bacteria in the experiments. It would also be interesting to include a discussion of the observation that relatively low levels of Akkermansia (10^6 CFU) appeared more beneficial than the higher doses, typically used in these types of experiments.

      We removed these results. The experiments used live bacteria.

      Microscopy methods for both light microscopy and EM would be stronger with added details including how many sections and fields of view were imaged and how the numbers of goblet cells normalized across samples. Without having a clear cross-section of a crypt, it is not clear to me how the images can be used to accurately quantify the number of cells per crypt. Additional details in the methods on how many total crypts were counted should also be included.

      For light microscopy, 3-4 fields were selected from each mouse to count about 30 crypts. We have removed the data of the mucus layer and goblet cells.

      Line 236 -- missing which gene was used.

      The GenBank accession number was added on lines 232-233.

      Line 310 -- OTU nomenclature.

      We have supplemented the OTU nomenclature on line 314.

      Line 413 -- This line seems inconsistent with the data analysis described in the methods section. The authors may need to expand their description of the 16S data analysis to be clear and reproducible.

      We have redescribed the 16S data analysis on line 312-328.

      Line 413 -- it is not surprising that 16s analysis did not capture species, it will have limited resolution beyond the genus level.

      We deleted this sentence.

      Methods are missing some details on the data analysis, eg. methods/programs and statistical analysis of PCoA and NMDS, LefSe.

      The methods and statistical analysis of PCoA, NMDS and LEfSe were supplemented on line 323-328.

      Fig 4C -- The images do not clearly capture the mucus layer or how it was analyzed. The sections appear to be cut at a slight angle, with multiple partial sections of crypts. I think this might make it challenging to count goblet cells, especially if the counts are normalized over the number of crypts or villi. The mucus layer does not appear well preserved. For example, I would expect to see an intact mucus layer lining the colon in the PBS control group. Re-cutting sections with a clean cross-section through the tissue will make data analysis easier.

      We have removed data of the mucus layer.

      Fig 4D -- The images appear to be of the mouse proximal colon, whereas the mucus layer and most muc2 will be in the distal colon. If the authors have tissue sections of the distal colon, this may give a clearer image of the mucus layer and might be more consistent with the TEM images in Fig. 4B.

      We apologize for the absence of the distal colon sections.

      To fully preserve the mucus layer, in addition to fixing in Carnoy's solution, the embedding process must be run without the standard washes in 70% ethanol (see: Johansson and Hansson. Methods Mol Biol. (2012) 229; doi: 10.1007/978-1-61779-513-8_13). The mucus will wash away during standard paraffin embedding if the tissue is washed with 70% ethanol, and I wonder if that has occurred in these samples.

      The tissue wasn’t washed with 70% ethanol.

      Fig 6A and 6B -- Although the legend indicates that the data is representative of two independent experiments, it is not clear how many fields of view or cells were imaged. In the bar graphs, it is not clear how many crypts were analyzed and from how many fields of view.

      3-4 fields were selected from each mouse to count about 30 crypts.

      For all of the bar graphs, this could be addressed by displaying all of the data points, rather than just the mean, to give the reader a sense of how many cells were counted (as was done in Fig 7B).

      We have revised the bar graphs to display the individual data points.

      498-501 -- The text says that the gene expression patterns in the organoids are consistent with the in vivo data, but the data patterns of gene expression appear to be different. For example, patterns for Wnt3 and B-catenin expression in mice, appear to be the opposite of what was observed in the organoid?

      Lines 509-512 state that the expression patterns in mouse organoids and in vivo are consistent. Figure 7C was incorrectly cited as Figure 8C; we have corrected it.

      Since Akkermansia does not grow under aerobic conditions, it should be made clear that the organoid co-culture treatment does not involve actively growing bacterial cultures.

      Reunanen et al. found that Akkermansia can tolerate oxygen: more than 90% of Akkermansia cells can survive for 1 h under oxic, 5% CO2 conditions.

      Reference:

      Reunanen J, Kainulainen V, Huuskonen L, et al. Akkermansia muciniphila Adheres to Enterocytes and Strengthens the Integrity of the Epithelial Cell Layer. Appl. Environ. Microbiol. 2015, 81(11): 3655-3662.

      Minor points

      Line 50 -"evidence".

      We have changed to “evidence” on line 49.

      Line 64, 422 - italicize, check italics throughout.

      We have checked italics throughout the manuscript.

      Line 64 - may need to be reworded.

      We have changed to “Clostridioides difficile” on line 66.

      Line 77 - pathogen.

      We have changed to “pathogen” on line 77.

      Line 161 - the.

      We have removed “the” on line 161.

      Line 178 - mouse.

      We have changed to “mouse” on line 179.

      Line 313 -- wording is confusing.

      We have changed the description on line 319-320.

      Line 318 -- Silva version #.

      The version is Silva 132. We have added it on line 316.

      Line 334 - Manufacturer for Live/Dead cell stain?

      The Live/Dead cell stain used was BD Biosciences FVS510. We have added it on line 345.

      Line 433 -- FD4 not defined until here.

      We have defined FD4 on lines 218-219.

      Line 512 -- but did not promote.

      We have changed to “but did not promote” on line 526.

      Line 517 -- Looks like this should be "basal-in organoids" instead of basal-out?

      We have changed the "basal-out" to "apical-to" on line 531.

      Line 546 -- induced neonatal should be protected?

      They are in separate pens.

      Jumps from Fig 7B to Fig 8C in the text.

      We apologize for the error and have corrected it.

      Reviewer #2 (Recommendations for The Authors):

      The title itself is a bit misleading. Please consider changing it. The authors meant that A. muciniphila prevents pathogen invasion, but does not function in pathogen invasion.

      We have changed the title.

      Major comments:

      - Figures 4A, 4D, and 6B should include presentation of cross-section pictures.

      We provided cross-section pictures to the journal.

      - Figures 7, 8, and 9 should indicate clearly whether mouse or piglet organoids are used. For instance, in the main text, line 490, it indicates piglet organoids, but in Figure 7A legend, it indicates mouse tissue.

      We apologize for the error and have changed it to “mice” on lines 501-502.

      - In Figure 7A, the 3rd row, 2nd panel, crypts formed into spherical organoids; whereas in Figure 8, ETEC infection of basal-out organoids formed budding organoids. This needs to be better explained.

      Mouse intestinal organoids were cultured ex vivo from crypts isolated from mice infected with ETEC, while porcine intestinal organoids were co-cultured with ETEC in vitro.

      Minor comments:

      - In the result section, the numbering of Figures or supplementary Figures is problematic, i.e it should start with Figure 1..., Figure S1, but not directly go to Figure S2A etc.

      Figure 1 appears in the Materials and Methods.

      - Line 458, please add the gating strategy used in the flow cytometry study.

      The gating strategy was added on lines 351-356.

      - The effect of A. muciniphila on the proliferation of intestinal epithelium through the Wnt/β-catenin signaling pathway is well known (such as PMID: 32138776). The authors should discuss this in detail.

      We have expanded the discussion on lines 637-639.

      Reviewer #3 (Recommendations For The Authors):

      It is somewhat unusual that the results from the piglets are in the supplement as this is a major strength of the manuscript (Fig S2).

      We have put these results into Figure 2 of the manuscript.

      "Collectively, our results may provide theoretical basis that FMT is a promising mitigation method for pathogenic bacteria infection and a new strategy for precise application of FMT in clinical and livestock production"- This is somewhat of an odd statement as the introduction of the manuscript completely skips over most of what is known about FMTs in the context of C. difficile. Also if anything, does the authors' own data not point mostly at using A. muciniphila on its own? Clinical trials are well underway in humans.

      We have changed the sentence to “Collectively, our results may provide theoretical basis that A. muciniphila is a promising method to repair intestinal barrier damage and a new strategy for the precise application of A. muciniphila in livestock production.” on lines 98-100.

      Line 26: I am not sure probiotic is the right word here given its strict scientific definition. Perhaps beneficial or protective would be more appropriate.

      We have changed “probiotic” to “beneficial” on line 25.

      Line 27: I believe AIMD is antibiotic-induced microbiome-depletion in most usages which may be more accurate and informative than dysregulated.

      The type, dose, and duration of the antibiotic treatment we used were chosen to induce microbiota disorder.

      It would appear that there are issues in the reference formatting where a number of journal names are missing.

      We have re-edited the reference formatting.

      Line 64- I believe eLife requires the standard practice of italicizing genus and species names. Also Clostridium difficile should now be referred to as Clostridioides difficile.

      We have changed this to “Clostridioides difficile” and italicized it on lines 66 and 569. The italicization of genus and species names was checked throughout the manuscript.

      Figure S2C: is it not clear why the melt curve was included here, but the legend should make it more clear what is being shown. I assume this is to provide evidence of specificity?

      The melting curve was used to demonstrate that only ETEC K88 was amplified by the primers we used. We have added an explanation to the figure legend.

      Figure 2D: there should be a quantitative analysis done on the staining of Muc2.

      We have quantified the staining of MUC2 in Figure 3D.

      Figure 3: The legends are not sufficient. For example: it is not clear what Figure 3A actually shows as the y-axis is not labelled and it is not clear what the relationship is between this and the anosim which is a function for permanova.

      Anosim analysis was performed in R using the anosim function, based on the rank order of Bray-Curtis distance values, to test the significance of differences between groups. The y-axis shows the rank of the distances between samples.
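
      As a rough illustration of the statistic described in this reply (not the authors' R code), the ANOSIM R value can be sketched in Python as the difference between the mean rank of between-group and within-group Bray-Curtis distances, scaled by n(n-1)/4. The function name `anosim_r` and the toy abundance table are hypothetical, and the permutation test normally used to obtain a p-value is omitted.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import rankdata


def anosim_r(samples, groups):
    """Return the ANOSIM R statistic for an abundance table and group labels."""
    samples = np.asarray(samples, dtype=float)
    groups = np.asarray(groups)
    n = len(groups)

    # Condensed Bray-Curtis distances, ordered (0,1), (0,2), ..., (n-2,n-1).
    distances = pdist(samples, metric="braycurtis")
    ranks = rankdata(distances)  # ties receive their average rank

    # Same pair ordering as pdist, so each rank can be labelled
    # as a within-group or between-group comparison.
    i, j = np.triu_indices(n, k=1)
    within = groups[i] == groups[j]

    mean_rank_within = ranks[within].mean()
    mean_rank_between = ranks[~within].mean()
    return (mean_rank_between - mean_rank_within) / (n * (n - 1) / 4.0)


# Hypothetical toy data: two groups with clearly different composition.
toy_samples = [[10, 0], [9, 1], [10, 1], [0, 10], [1, 9], [1, 10]]
toy_groups = ["A", "A", "A", "B", "B", "B"]
r_value = anosim_r(toy_samples, toy_groups)
```

      An R value close to 1 indicates that distances between groups rank consistently higher than distances within groups; a value near 0 indicates no separation.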

      Line 416- OTU not OUT.

      We have changed to “OTU” on line 428.

      Figure 4- the naming key needs to be included in the figure legend. C, E, A, and B are immediately obvious.

      The naming key was included in the figure legend.

      Methods: additional information on the flow cytometry gating strategy/controls should be included.

      The gating strategy was added on line 351-356.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      Recent studies have used optical or electrophysiological techniques to chronically measure receptive field properties of sensory cortical neurons over long time periods, i.e. days to weeks, to ask whether sensory receptive fields are stable properties. Akritas et al expand on prior studies by investigating whether nonlinear contextual sensitivity, a property not previously investigated in the context of so-called 'representational drift,' remains stable over days or weeks of recording. They performed chronic tetrode recordings of auditory cortical neurons over at least five recording days while also performing daily measurements of both the linear spectro-temporal receptive field (principal receptive field, PRF) and non-linear 'contextual gain field' (CGF), which captures the neuron's sensitivity to acoustic context. They found that spike waveforms could be reliably matched even when recorded weeks apart. In well-matched units, by comparing the correlation between tuning within one day's session to sessions across days, both PRFs and CGFs showed remarkable stability over time. This was the case even when recordings were performed over weeks. Meanwhile, behavioral and brain state, measured with locomotion and pupil diameter, respectively, resulted in small but significant shifts in the ability of the PRF/CGF model to predict fluctuations in the neuronal response over time.
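
      The within-day versus across-day comparison described in this summary can be illustrated with a minimal sketch. It assumes that a receptive field estimate is available as a 2-D array for each half-session and each day; the function names, array shapes, and synthetic data are hypothetical and do not reproduce the authors' PRF/CGF fitting procedure.

```python
import numpy as np


def field_correlation(field_a, field_b):
    """Pearson correlation between two receptive-field estimates (flattened)."""
    return float(np.corrcoef(np.ravel(field_a), np.ravel(field_b))[0, 1])


def within_and_across_day(day1_first_half, day1_second_half, day2):
    """Compare a within-day correlation with an across-day correlation."""
    within = field_correlation(day1_first_half, day1_second_half)
    across = field_correlation(day1_first_half, day2)
    return within, across


# Synthetic example (assumption): one fixed underlying field, with
# independent estimation noise added to each half-session or day.
rng = np.random.default_rng(0)
true_field = rng.normal(size=(20, 15))


def noisy_estimate():
    return true_field + 0.1 * rng.normal(size=true_field.shape)


within, across = within_and_across_day(
    noisy_estimate(), noisy_estimate(), noisy_estimate()
)
```

      In this sketch, a field is considered stable when the across-day correlation is about as high as the within-day correlation, which serves as the ceiling set by estimation noise.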

      Strengths:

      The study addresses a fundamental question, which is whether the neural underpinnings of sensory perception, which encompasses both sensory events and their context, are stable across relevant timescales over which our experiences must be stable, despite biological turnover. Although two-photon calcium imaging is ideal for identifying neurons stably regardless of their activity levels and tuning, it lacks temporal precision and is therefore limited in its ability to capture the complexity of sensory responses. Akritas et al performed painstaking chronic extracellular recordings in the auditory cortex with the temporal resolution to investigate complex receptive field properties, such as neural sensitivities to acoustic context. Prior studies, particularly in the auditory cortex, focused on basic tuning properties or sensory responsivity, but Akritas et al expand on this work by showing that even the nonlinear, contextual elements of sensory neurons' responses can remain stable, providing a mechanism for the stability of our complex perception. This work is both novel and broadly applicable to those investigating cortical stability across sensory modalities.

      Weaknesses:

      Apart from some aspects such as single-unit versus multi-unit, the study largely treats their dataset as a monolith rather than showing how factors such as firing rate, depth, and cell type could define more or less stable subpopulations. It is likely that their methodology did not enable an even sampling over these qualities, and the authors should discuss these biases to put their findings more in context with related studies.

      We did, in fact, investigate whether firing rate and other physiological response properties of units might differentiate subpopulations with different stability. This analysis is shown in Figure 7B-D. There was no apparent relationship between stability of nonlinear contextual gain fields and physiological properties such as mean evoked firing rate, signal-to-noise ratio for evoked firing, or predictive power of the context model (a measure of model goodness-of-fit).

      The reviewer is correct, however, that we did not address possible differences between units recorded at different cortical depths or of different cell types, due to limitations of our methodology and sampling.

      Reviewer #2 (Public Review):

      Summary:

      This study explores the fundamental neuroscience question of the stability of neuronal representation. The concept of 'representational-drift' has been put forward after observations made using 2-photon imaging of neuronal activity over many days revealed that neurons contribute in a time-limited manner to population representation of stimuli or experiences. The authors contribute to the still contested concept of 'drifts' by measuring representation across days using electrophysiology and thus with sufficient temporal resolution to characterize the receptive fields of neurons in timescales relevant to the stimuli used. The data obtained from chronic recordings over days combined with nonlinear stimulus-response estimation allows the authors to conclude that both the spectrotemporal receptive fields as well as contextual gain fields dependent on combination sensitivity to complex stimuli were stable over time. This suggests that when a neuron is responsive to experimental parameters across long periods of time (days), its sensory receptive field is stable.

      Strengths:

      The strength of this study lies in the capacity to draw novel conclusions on auditory cortex representation based on the experimentally difficult combination of stable recordings of neuronal activity, behavior, and pupil over days and state-of-the-art analysis of receptive fields.

      Weaknesses:

      It would have been desirable, but too ambitious in the current setting, to be able to assess what proportion if any of the neurons drop out or in to draw a closer parallel with the 2-photon studies.

      We certainly agree that this comparison would have been desirable in principle. In practice, however, it was technically infeasible and would have been likely to produce misleading results. Our criteria for spike waveform matching across days were extremely conservative, to minimise the potential for a false positive match (which could artifactually decrease apparent stability of unit responses). Therefore, we were likely to have missed some neurons that did in fact remain active over days, due to small changes in extracellular waveform or just noise (which could artifactually decrease apparent stability of population representations). Two-photon imaging is more appropriate for analysing population stability, because cell identity is determined by spatial location. However, as we mention in the paper, electrophysiology is more appropriate for analysing receptive-field stability, because the temporal resolution is sufficient to resolve structure at the millisecond timescales relevant to auditory perception.

      Reviewer #3 (Public Review):

      Summary:

      In their study on "Nonlinear sensitivity to acoustic context is a stable feature of neuronal responses to complex sounds in auditory cortex of awake mice", Akritas et al. investigate the stability of the response properties of neurons in the auditory cortex of mice. They estimate a model with restricted non-linearities for individual neurons and compare the model properties between recordings on the same day and subsequent days. They find that both the linear and nonlinear components of the model stay rather constant over this period and conclude that on the level of the tuning properties, there is no evidence for representational drift on this time scale.

      Strengths:

      - The study has a clear analytical approach that goes beyond linear models and investigates this in a rigorous way, in particular comparing across-day variability to within-day variability.

      - The use of tetrodes is a rather reliable way in electrophysiological recordings to assess neuron identity over multiple days.

      - The comparison with pupil and motion activity was useful and insightful.

      - The presentation of the study is very logical and pretty much flawless on the writing level.

      Weaknesses:

      - The stability results across cells show a good amount of variability, which is only partially addressed.

      - In particular, no attempt is made to localize the cells in space, in order to check whether these differences could be layer or area-dependent.

      - The full context model also includes the possibility to estimate the input non-linearity, which was not done here, but could have been insightful.

      We agree with these comments and acknowledge these limitations, which arise from technological constraints. In particular, the tangential trajectory of our chronic tetrode implant, used to maximise stability of chronic recordings, limited our ability to sample cells from different cortical layers/areas and to explore how these factors might relate to variability in stability across units. Estimating input nonlinearities would have been valuable but also would have increased the number of parameters in the model and the data required to obtain reliable, predictive model fits.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors explored how galanin affects whole-brain activity in larval zebrafish using wide-field Ca2+ imaging, genetic modifications, and drugs that increase brain activity. The authors conclude that galanin has a sedative effect on the brain under normal conditions and during seizures, mainly through the galanin receptor 1a (galr1a). However, acute "stressors(?)" like pentylenetetrazole (PTZ) reduce galanin's effects, leading to increased brain activity and more seizures. The authors claim that galanin can reduce seizure severity while increasing seizure occurrence, speculated to occur through different receptor subtypes. This study confirms galanin's complex role in brain activity, supporting its potential impact on epilepsy.

      Strengths:

      The overall strength of the study lies primarily in its methodological approach using whole-brain Calcium imaging facilitated by the transparency of zebrafish larvae. Additionally, the use of transgenic zebrafish models is an advantage, as it enables genetic manipulations to investigate specific aspects of galanin signaling. This combination of advanced imaging and genetic tools allows for addressing galanin's role in regulating brain activity.

      Weaknesses:

      The weaknesses of the study also stem from the methodological approach, particularly the use of whole-brain calcium imaging as a measure of brain activity. While epilepsy and seizures involve network interactions, they typically do not originate across the entire brain simultaneously. Seizures often begin in specific regions or even within specific populations of neurons within those regions. Therefore, a whole-brain approach, especially with calcium imaging and its inherent limitations, may not fully capture the localized nature of seizure initiation and propagation, potentially limiting the understanding of galanin's role in epilepsy.

      Furthermore, Galanin's effects may vary across different brain areas, likely influenced by the predominant receptor types expressed in those regions. Additionally, the use of PTZ as a "stressor" is questionable since PTZ induces seizures rather than conventional stress. Referring to seizures induced by PTZ as "stress" might be a misinterpretation intended to fit the proposed model of stress regulation by receptors other than Galanin receptor 1 (GalR1).

      The description of the EAAT2 mutants is missing crucial details. EAAT2 plays a significant role in the uptake of glutamate from the synaptic cleft, thereby regulating excitatory neurotransmission and preventing excitotoxicity. Authors suggest that in EAAT2 knockout (KO) mice galanin expression is upregulated 15-fold compared to wild-type (WT) mice, which could be interpreted as galanin playing a role in the hypoactivity observed in these animals.

      Indeed, the unexpected hypoactivity in EAAT2a mutants, reported in our characterization of this mutant (Hotz et al., 2022), prompted us to initiate this study and to formulate the hypothesis that the observed upregulation of galanin is a neuroprotective response to epilepsy.

      However, the study does not explore the misregulation of other genes that could be contributing to the observed phenotype. For instance, if AMPA receptors are significantly downregulated, or if there are alterations in other genes critical for brain activity, these changes could be more important than the upregulation of galanin. The lack of wider gene expression analysis leaves open the possibility that the observed hypoactivity could be due to factors other than, or in addition to, galanin upregulation.

      We have performed a transcriptome analysis that we are still evaluating. We can already state that AMPA receptor genes are not significantly altered in the mutant.

      Moreover, the observation that in double KO mice for both EAAT2 and galanin, there was little difference in seizure susceptibility compared to EAAT2 KO mice alone further supports the idea that galanin upregulation might not be the reason for the observed phenotype. This indicates that other regulatory mechanisms or gene expressions might be playing a more pivotal role in the manifestation of hypoactivity in EAAT2 mutants.

      We agree that upregulation of galanin transcripts is at best one of a suite of regulatory mechanisms that lead to hypoactivity in EAAT2 zebrafish mutants.

      These methodological shortcomings and conceptual inconsistencies undermine the perceived strengths of the study and hinder understanding of galanin's role in epilepsy and stress regulation.

      Reviewer #2 (Public Review):

      Summary:

      This study is an investigation of galanin and galanin receptor signaling on whole-brain activity in the context of recurrent seizure activity or under homeostatic basal conditions. The authors primarily use calcium imaging to observe whole-brain neuronal activity accompanied by galanin qPCR to determine how manipulations of galanin or the galr1a receptor affect the activity of the whole-brain under non-ictal or seizure event conditions. The authors' Eaat2a-/- model (introduced in their Glia 2022 paper, PMID 34716961) that shows recurrent seizure activity alongside suppression of neuronal activity and locomotion in the time periods lacking seizures is used in this paper in comparison to the well-known pentylenetetrazole (PTZ) pharmacological model of epilepsy in zebrafish. Given the literature cited in their Introduction, the authors reasonably hypothesize that galanin will exert a net inhibitory effect on brain activity in models of epilepsy and at homeostatic baseline, but were surprised to find that this hypothesis was only moderately supported in their Eaat2a-/- model. In contrast, under PTZ challenge, fish with galanin overexpression showed increased seizure number and reduced duration while fish with galanin KO showed reduced seizure number and increased duration. These results would have been greatly enriched by the inclusion of behavioral analyses of seizure activity and locomotion (similar to the authors' 2022 Glia paper and/or PMIDs 15730879, 24002024). In addition, the authors have not accounted for sex as a biological variable, though they did note that sex sorting zebrafish larvae precludes sex selection at the younger ages used. It would be helpful to include smaller experiments taken from pilot experiments in older, sex-balanced groups of the relevant zebrafish to increase confidence in the findings' robustness across sexes. 
      A possible major caveat is that all of the various genetic manipulations are non-conditional as performed, meaning that developmental impacts of galanin overexpression or galanin or galr1a knockout on the observed results have not been controlled for and may have had a confounding influence on the authors' findings. Overall, this study is important and solid (yet limited), and carries clear value for understanding the multifaceted functions that neuronal galanin can have under homeostatic and disease conditions.

      Strengths:

      - The authors convincingly show that galanin is upregulated across multiple contexts that feature seizure activity or hyperexcitability in zebrafish, and appears to reduce neuronal activity overall, with key identified exceptions (PTZ model).

      - The authors use both genetic and pharmacological models to answer their question, and through this diverse approach, find serendipitous results that suggest novel underexplored functions of galanin and its receptors in basal and disease conditions. Their question is well-informed by the cited literature, though the authors should cite and consider their findings in the context of Mazarati et al., 1998 (PMID:982276). The authors' Discussion places their findings in context, allowing for multiple interpretations and suggesting some convincing explanations.

      - Sample sizes are robust and the methods used are well-characterized, with a few exceptions (as the paper is currently written).

      - Use of a glutamatergic signaling-based genetic model of epilepsy (Eaat2a-/-) is likely the most appropriate selection to test how galanin signaling can alter seizure activity, as galanin is known to reduce glutamatergic release as an inhibitory mechanism in rodent hippocampal neurons via GalR1a (alongside GIRK activation effects). Given that PTZ instead acts through GABAergic signaling pathways, it is reasonable and useful to note that their glutamate-based genetic model showed different effects than did their GABAergic-based model of seizure activity.

      Weaknesses:

      - The authors do not include behavioral assessments of seizure or locomotor activity that would be expected in this paper given their characterizations of their Eaat2a-/- model in the Glia 2022 paper that showed these behavioral data for this zebrafish model. These data would inform the reader of the behavioral phenotypes to expect under the various conditions and would likely further support the authors' findings if obtained and reported.

      We agree that a thorough behavioral assessment would have strengthened the study, but we deemed it outside of the scope of this study.

      - No assessment of sex as a biological variable is included, though it is understood that these specific studied ages of the larvae may preclude sex sorting for experimental balancing as stated by the authors.

      The study was done on larval zebrafish (5 days post fertilization). The first signs of sexual differentiation become apparent at about 17 days post fertilization (reviewed in Ye and Chen, 2020). Hence, sex is not a biological variable at the stage studied.

      - The reported results may have been influenced by the loss or overexpression of galanin or loss of galr1a during developmental stages. The authors did attempt to use the hsp70l system to overexpress galanin, but noted that the heat shock induction step led to reduced brain activity on its own (Supplementary Figure 1). Their hsp70l:gal model shows galanin overexpression anyways (8x fold) regardless of heat induction, so this model is still useful as a way to overexpress galanin, but it should be noted that this galanin overexpression is not restricted to post-developmental timepoints and is present during development.

      The developmental perspective is an important point to consider. Due to the rapid development of the zebrafish, it is not trivial to untangle this. In the zebrafish, we first observe epileptic seizures as early as 3 days post fertilization (dpf), when the brain is clearly not yet well developed (e.g. behavioral responses to light are still minimal). Even the 5 dpf stage, at which most of our experiments were conducted, can by no means be considered post-developmental.

      Reviewer #3 (Public Review):

      Summary:

      The neuropeptide galanin is primarily expressed in the hypothalamus and has been shown to play critical roles in homeostatic functions such as arousal, sleep, stress, and brain disorders such as epilepsy. Previous work in rodents using galanin analogs and receptor-specific knockout has provided convincing evidence for the anti-convulsant effects of galanin.

      In the present study, the authors sought to determine the relationship between galanin expression and whole-brain activity. The authors took advantage of the transparent nature of larval zebrafish to perform whole-brain neural activity measurements via widefield calcium imaging. Two models of seizures were used (eaat2a-/- and pentylenetetrazol; PTZ). In the eaat2a-/- model, spontaneous seizures occur and the authors found that galanin transcript levels were significantly increased and associated with a reduced frequency of calcium events. Similarly, two hours after PTZ, galanin transcript levels roughly doubled and the frequency and amplitude of calcium events were reduced. The authors also used a heat shock protein line (hsp70l:gal) where galanin transcript levels are induced by activation of heat shock protein, but this line also shows higher basal transcript levels of galanin. Again, the higher level of galanin in hsp70l:gal larval zebrafish resulted in a reduction of calcium events and a reduction in the amplitude of events. In contrast, galanin knockout (gal-/-) increased calcium activity, indicated by an increased number of calcium events, but a reduction in amplitude and duration. Knockout of the galanin receptor subtype galr1a via crispants also increased the frequency of calcium events.

      In subsequent experiments, eaat2a-/- mutants were crossed with hsp70l:gal or gal-/- to increase or decrease galanin expression, respectively. These experiments showed modest effects, with eaat2a-/- x gal-/- knockouts showing an increased normalized area under the curve and seizure amplitude.

      Lastly, the authors attempted to study the relationship between galanin and brain activity during a PTZ challenge. The hsp70l:gal larvae showed an increased number of seizures and reduced seizure duration during PTZ. In contrast, gal-/- mutants showed an increased normalized area under the curve and a stark reduction in the number of detected seizures, a reduction in seizure amplitude, but an increase in seizure duration. The authors then ruled out the role of Galr1a in modulating this effect during PTZ, since the number of seizures was unaffected, whereas the amplitude and duration of seizures were increased.

      Strengths:

      (1) The gain- and loss-of function galanin manipulations provided convincing evidence that galanin influences brain activity (via calcium imaging) during interictal and/or seizure-free periods. In particular, the relationship between galanin transcript levels and brain activity in Figures 1 & 2 was convincing.

      (2) The authors use two models of epilepsy (eaat2a-/- and PTZ).

      (3) Focus on the galanin receptor subtype galr1a provided good evidence for the important role of this receptor in controlling brain activity during interictal and/or seizure-free periods.

      Weaknesses:

      (1) Although the relationship between galanin and brain activity during interictal or seizure-free periods was clear, the manuscript currently lacks mechanistic insight in the role of galanin during seizure-like activity induced by PTZ.

      We completely agree and concede that this study constitutes only a first attempt to understand the (at least for us) perplexing complexity of galanin function on the brain.

      (2) Calcium imaging is the primary data for the paper, but there are no representative time-series images or movies of GCaMP signal in the various mutants used.

      We are in the process of preparing some time series images and will include them in the next revision.

      (3) For Figure 3, the authors suggest that hsp70l:gal x eaat2a-/- mutants would further increase galanin transcript levels, which were hypothesized to further reduce brain activity. However, the authors failed to measure galanin transcript levels in this cross to show that galanin is actually increased more than in the eaat2a-/- mutant or the hsp70l:gal mutant alone.

      This is an excellent suggestion. We will perform the necessary qPCR experiments and will include the data in the next revision.

      (4) Similarly, transcript levels of galanin are not provided in Figure 2 for Gal-/- mutants and galr1a KOs. Transcript levels would help validate the knockout and any potential compensatory effects of subtype-specific knockout.

      (5) The authors very heavily rely on calcium imaging of different mutant lines. Additional methods could strengthen the data, translational relevance, and interpretation (e.g., acute pharmacology using galanin agonists or antagonists, brain or cell recordings, biochemistry, etc).

      Again, we agree and concede that a number of additional approaches are needed to gain more insight into the complex role of galanin in regulating overall brain activity. These include, among others, behavioral assays, multiple single-cell recordings, and pharmacological interventions.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript addresses a fundamental question about how different types of communication signals differentially affect brain state and neurochemistry. In addition, their manuscript highlights the various processes that modulate brain responses to communication signals, including prior experience, sex, and hormonal status. Overall, the manuscript is well-written and the research is appropriately contextualized.

      That being said, it remains important for the authors to think more about their analytical approaches, in particular the effect of normalization and the explicit outlining and interpretation of statistical models. As mentioned in the original review, the normalization of neurochemical data seems unnecessary given the repeated-measures design of their analysis. Moreover, by normalizing all data to the baseline data and including this baseline data in the repeated-measures analysis, one artificially creates a baseline period with minimal variation that dramatically differs in variance from other periods (akin to heteroscedasticity). If the authors want to analyze how a stimulus changes neurochemical concentrations, they could analyze the raw data but depict normalized data in their figures (similar to other papers). Or they could analyze group differences in the normalized data of the two stimulus periods (i.e., excluding the baseline period used for normalization).

      We appreciate the reviewer’s point on the difference in variance caused by including the 100% baseline values in the analysis. After consulting with our statistician, we chose the latter of the two approaches suggested by the reviewer. Specifically, we reran the analysis to exclude the baseline and focus only on the playback windows and the group differences. The text in the results, the significance signs in the figures, and the discussion are corrected accordingly. Despite these changes, our major conclusions remain as before.

      We also followed this reviewer’s suggestions to clarify the statistical model in studying the experience effect. After further consultation with our statistician, we reran the analysis on experience effect, including all the groups of EXP and INEXP animals together. We have corrected text in the figure captions, results, discussion, and data analysis sections of the manuscript related to the effect of experience and its interactions. This has not changed the conclusion made related to the experience effect in the dataset.

      It would also be useful for the authors to provide further discussion of the potential contributions of different types of experiences (mating vs. restraint) to the change in behavior and neurochemical responses to the vocalization playbacks and to try to disentangle sensory and motor contributions to neurochemical changes.

      We have acknowledged in the Discussion that previous studies suggest that the effect of experience involving stress could be generalized. We believe that this is an important area of future research. Our Discussion acknowledges that the relationship between sensory and motor contributions to neurochemical changes remains an area of interest. We further point out that the time resolution of microdialysis data renders the suggested discussion highly speculative. We plan to use other methods to assess this in future experiments.

      Reviewer #3 (Public Review):

      The work by Ghasemahmad et al. has the potential to significantly advance our understanding of how neuromodulators provide internal-state signals to the basolateral amygdala (BLA) while an animal listens to social vocalizations.

      Ghasemahmad et al. made changes to the manuscript that have significantly improved the work. In particular, the transparency in showing the underlying levels of Ach, DA, and 5HIAA is excellent. My previous concerns have been adequately addressed.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I appreciate the authors' responses to my previous queries (and to the comments by other reviewers). The introduction does a better job contextualizing the data, and the additional details in the results and Methods sections help readers digest the material. I continue to think the topic is interesting and the manuscript is potentially impactful. However, I continue to be concerned about their analytical approaches and other aspects of the revised manuscript.

      (a) Normalization

      In my original review I wrote: "The normalization of neurochemical data seems unnecessary given the repeated-measures design of their analysis and could be problematic; by normalizing all data to the baseline data (p. 24), one artificially creates a baseline period with minimal variation (all are "0"; Figures 2, 3 & 5) that could inflate statistical power." I continue to feel that an analysis of normalized data that includes the baseline data is inappropriate because of the minimal variation in the normalized data for the baseline period. When the normalized data for the baseline period is included in the analysis, there is clearly variation in the extent of variability within each of the time periods (no variability at baseline, variability during periods 1 & 2; analogous to heteroscedasticity). For example, when analyzing the RAW DATA about the change in ACh release in experienced males listening to restraint vocalizations (thank you for releasing the raw data), there was a non-significant effect of time (baseline, period 1, and period 2; linear mixed effects model; F(2,12)=3.2, p=0.0793). However, when the normalized data for this dataset was analyzed (with baseline values being set at 100% for each mouse), there was a statistically significant effect (F(2,12)=4.5, p=0.0352). This example is just to illustrate how normalization can affect (e.g., inflate) statistical power.

      That being said, I do think that it is reasonable to analyze normalized data if the period used for normalization is NOT included in the analysis (see Figure 3 of one of the papers the authors listed in their response to reviewers: Galvez-Marquez et al., 2022). However, from the reading of this manuscript, it does seem like normalized baseline data are analyzed to assess how stimuli affect neurochemical concentrations.

      We appreciate the reviewer’s point on the difference in variance caused by including the 100% baseline values in the analysis. After consulting with our statistician, we chose one of the two approaches suggested by the reviewer. Specifically, we reran the analysis to exclude the baseline and focus only on the playback windows and the group differences. The text in the results, the significance signs in the figures, and the discussion are corrected accordingly. Despite these changes, our major conclusions remain as before. We have included some descriptive statistics in the text because we think these are informative.

      We decided to take this approach because the inter-individual variability in the raw data levels, caused by non-experimental factors, is too great to be useful. As we have stated before, these values are affected by probe placement, collection process, or differences in the HPLC or LC/MS runs. These effects are widely recognized in the field.
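
      The variance issue raised by the reviewer can be illustrated with a minimal sketch. The values below are hypothetical and are not data from the study; the function names are likewise illustrative. Each animal's values are expressed as a percentage of its own baseline, which forces the baseline period to exactly 100% for every animal and therefore to zero variance, while the playback periods retain their variability.

```python
from statistics import pvariance

# Hypothetical raw concentrations (arbitrary units) for four animals:
# (baseline, playback period 1, playback period 2). Not data from the study.
raw = [
    (2.0, 2.6, 2.4),
    (5.0, 5.5, 6.5),
    (1.0, 1.5, 1.2),
    (8.0, 8.8, 10.4),
]


def normalize_to_baseline(row):
    """Express each period as a percentage of that animal's own baseline."""
    baseline = row[0]
    return tuple(100.0 * value / baseline for value in row)


def variance_by_period(rows):
    """Population variance across animals, computed separately per period."""
    return [pvariance(column) for column in zip(*rows)]


normalized = [normalize_to_baseline(row) for row in raw]

raw_var = variance_by_period(raw)          # baseline varies between animals
norm_var = variance_by_period(normalized)  # baseline variance collapses to 0

print("raw variances:       ", raw_var)
print("normalized variances:", norm_var)
```

      Dropping the first column before analysis, as in the approach chosen above, leaves only the two playback periods, which have comparable sources of variability.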

      It is worth pointing out a few things about the papers listed by the authors. Li et al. (2023) does depict normalized microdialysis data but it isn't clear that any analysis of the normalized data is conducted. The same can be said about Holly et al. (2016). Further, in Bagley et al. (2011), the authors depict normalized data in the figures but conduct analyses on the raw data ("After chronic morphine treatment, systemic naloxone injection increased GABA outflow in PAG by 41% (from 24.6 ± 2.9 nM to a peak of 34.8 ± 3.8 nM, n = 6, P = 0.016), but did not alter GABA levels after vehicle treatment (39.8 ± 8.3 to 38.6 ± 7.4 nM with naloxone at matched peak time, n = 4; Fig. 3a)"). This latter approach (analyzing raw data in a repeated-measures manner and depicting normalized data) seems reasonable for the authors of the current study.

      (b) Clarification and modification of statistical models

      When analyzing the effect of experience on neuromodulator release, the authors analyze the experienced and inexperienced mice independently (e.g., figure 3 vs. 6). The ideal way to assess the effects of experience is to create a factorial model. For example, one could analyze a full factorial model with experience (exp vs. inexp), stimulus type (mating vs. restraint) and time (baseline, period 1 vs period 2, assuming raw data are used). If one wanted to exclude the baseline period because group differences in baseline are not informative, conducting a factorial analysis of normalized data with just the data from period 1 and 2 seems fine. I believe an analysis like this will help increase the legitimacy of the analysis. For example, when analyzing the normalized data (periods 1 and 2) of experienced and inexperienced males in response to mating or restraint vocalizations, you find a significant interaction between experience and stimulus type. Finding an effect of experience in an analysis that includes both experienced and inexperienced mice is ideal from an analytical framework.

      In Figure 6, it is not clear what the statistical model is and what the interactions mean. For example, in the figure legend for figure 6, the authors report time*context and time*sex interactions. However, in this analysis there are two groups of inexperienced males (males that are listening to restraint vocalizations, males that are listening to mating vocalizations) and one group of females (females that are listening to mating vocalizations); in other words, this is an unbalanced analysis. So, when the authors indicate a time*context interaction, does that mean they are comparing the male-restraint group to the combination of males and females listening to mating vocalizations? And when they talk about a time*sex interaction, are they analyzing how males listening to either mating or restraint vocalizations differ from females listening to a mating vocalization? This all seems peculiar to me.

      - A similar set of questions could be raised about interaction effects depicted in Figure 4.

      Overall, I would like this manuscript to be reviewed by a statistician to provide additional input on how best to analyze the data.

      We followed the reviewer’s suggestions to clarify the statistical model in studying the experience effect. After further consultation with the statistician, we reran the analysis on experience effect, including all the groups of EXP and INEXP animals together.

      Design: Intercept + Sex + Context + Experience + Sex*Experience + Context*Experience.

      As recommended by the statistician, the model is not full factorial: we do not have females in the restraint group, so a full factorial model would be unbalanced. Running the GLM with the factors listed above is therefore, as advised by the statistician, the best way to analyze the current dataset.
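
      As a rough sketch of what this design implies, the code below builds one design-matrix row per group for the terms listed above. The 0/1 dummy coding, the variable names, and the assumption that every sex, context, and experience combination other than female-restraint was tested are illustrative choices on our part and are not taken from the manuscript.

```python
from itertools import product

# Assumed 0/1 dummy coding (illustrative only):
#   sex:        0 = male,   1 = female
#   context:    0 = mating, 1 = restraint
#   experience: 0 = INEXP,  1 = EXP
all_cells = list(product((0, 1), repeat=3))

# No females were tested with restraint playback, so those cells are empty.
observed_cells = [c for c in all_cells if not (c[0] == 1 and c[1] == 1)]
missing_cells = [c for c in all_cells if c not in observed_cells]


def design_row(sex, context, experience):
    """Row for Intercept + Sex + Context + Experience
    + Sex*Experience + Context*Experience."""
    return [1, sex, context, experience, sex * experience, context * experience]


design_matrix = [design_row(*cell) for cell in observed_cells]

# A Sex*Context column would be zero for every observed group, which is why
# that interaction (and the three-way term) cannot be estimated here.
sex_by_context = [sex * context for sex, context, _ in observed_cells]

for cell, row in zip(observed_cells, design_matrix):
    print(cell, row)
print("empty cells:", missing_cells)
```

      Under these assumptions there are six observed groups and six model terms, and the omitted Sex*Context column carries no information because the female-restraint cells are empty.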

      We have corrected text in the figure captions, results, discussion, and data analysis sections of the manuscript related to the effect of experience and its interactions. The GLM models are clarified for all the figures in the “data analysis” section of the manuscript. We have clarified that the major effect of experience on neuromodulators was seen in the ACh data.

      (c) Analysis of post-stimulus period

      I agree with Reviewer 3 that analyzing the post-stimulus period would be useful. As mentioned in the original review, these data could serve as an opportunity to show that the neurochemical levels returned to baseline and add further support for the model described in Figure 6. In addition, these data could help reveal the link between neurochemical release, auditory responses, and behavior. If neurochemical changes reflect auditory responses, then these should return to baseline during the post-stimulus period. In addition, if behavioral variation (e.g., between mice hearing mating vs. restraint stimuli) persists following the termination of playback, then one could similarly assess whether neurochemical variation persists following playback. If the latter is the case, then the neurochemical release could be more related to the behavior than to the playback stimulus itself.

      We did not change this analysis. Our response to Reviewer 3’s comment is shown below.

      “We decided not to include analyses of the post-stimulus period because this period is subject to wider individual and neuromodulator-specific effects and because it weakens statistical power in addressing the core question—the change in neuromodulator release DURING vocal playback. We agree that the general question is of interest to the field, but we don’t think our study is best designed to answer that question.”

      This was accepted by Reviewer 3. We also note that release patterns have multiple time courses (e.g., Aitta-aho et al., 2018 for ACh), and thus may not support an assumption that levels should return to baseline shortly after playback offset.

      Minor comments:

      Page 7, line 15: I suggest changing "vocalization-dependent" to "stimulus-dependent" because the former could connote patterns of release related to the animal itself vocalizing.

      Changed to: “There were also distinct patterns of ACh and DA release into the BLA depending on the type of vocalization playback (Fig 3C,D).”

      Discussion section: The authors should point out a few caveats with their experiments in the Discussion section. First, experienced animals received both mating (social) and restraint experiences, and it is not clear to what degree each type of experience affected neural and behavioral responses (i.e., specificity of experience effects). For example, mating experience can lead to a wide range of physiological changes, including a resilience to stress (e.g., Leuner et al., PLoS One, 2010; Arnold et al., Hormones and Behavior, 2019), so it is possible that mating experiences by themselves could have induced these changes. Or it could be that experiencing restraint stress affects responses to mating stimuli. This could be added to the first paragraph in page 16. (The authors could also discuss which aspects of the sexual encounters might be most important for the behavioral and neural plasticity.)

      We have added text to raise this issue, stating that it is unknown whether the experience effects are specific and citing the above references concerning the generalized effects of certain experiences.

      Discussion section: It would also be useful for the authors to discuss the extent to which behavior might be driving the neurochemical changes. Some of the analyses suggest that the release is independent of the behavior (e.g., reflects a sensory response) but this could be emphasized more in the Discussion.

      We believe that we have addressed this issue sufficiently in our previous response to related issues raised by this reviewer. As we note, there are limitations in the time resolution of microdialysis data that render the suggested discussion highly speculative. We plan to use other methods to assess this in future experiments.

      Figure 2, legend: Please note that the text above the images describes the stimulus played back to these animals and their hormonal state, and not the type of experience they underwent (i.e., clarify the titles).

      Changed as requested.

      I also agree with Reviewer 3 that "mating experience" is a misnomer for this manuscript. "Social experience with a female" is a more accurate descriptor. If they wanted to specifically provide mating experience, males should have only been tested with estrus (receptive females). I don't think this wording change detracts from their findings.

      We have not changed this term. As noted in our previous response to Reviewer #3, we stated: “In the mating experience, mounting or attempted mounting was required for the animal to be included in subsequent testing.” Due to this requirement, the term “mating behavior” is informative and appropriate. In our view, “Social experience with a female” does not adequately describe our inclusion criterion or the experience.

      Reviewer #3 (Recommendations For The Authors):

      The work by Ghasemahmad et al. has the potential to significantly advance our understanding of how neuromodulators provide internal-state signals to the basolateral amygdala (BLA) while an animal listens to social vocalizations.

      Ghasemahmad et al. made changes to the manuscript that have significantly improved the work. In particular, the transparency in showing the underlying levels of Ach, DA, and 5HIAA is excellent. My previous concerns have been adequately addressed. I only have a few minor suggestions for the text and one figure.

      Minor suggestions:

      Page 2, Ln 9: add adult before male and female mice

      Changed as requested

      Page 4, Ln 10: add a period after Tsukano et al., 2019)

      Changed as requested

      Page 6, Ln 9: what did you mean by "their interaction"? Being more specific, but concise, would help the readers.

      We revised the wording to clarify that the neuromodulatory systems interact in the emission of positive and negative vocalizations.

      Page 6, Ln 17: You mention Stim 1 and Stim 2, but the stimuli are not defined at this point. The clear explanation is provided in the following paragraph. Maybe consider switching the order and define the stimuli before you describe the liquid chromatography/mass spectrometry technique.

      We have revised and merged these paragraphs so that Stim 1 and Stim 2 are defined on first use. We also revised our description of the depiction and analysis of neurochemical data.

      Page 11, Ln 12: replace well-proven with well-documented

      Changed as requested

      Figure 2: There are two arrows pointing towards a single track. I assume one of the arrows is a duplicate. If so, delete one of the arrows. If not, please explain what the second arrow represents.

      Arrow removed

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors have studied the effects of platelets in OPC biology and remyelination. For this, they used mutant mice with lower levels of platelets as a demyelinating/remyelinating scenario, as well as in a model with large numbers of circulating platelets.

      Strengths:

      -The work is very focused, with defined objectives.

      -The work is properly done.

      Weaknesses:

      -There is no clear effect on a single cell type and/or mechanism involved.

      We appreciate the reviewer’s feedback. We understand that from our in vivo studies we are unable to distinguish whether the effects of platelets are exerted directly on OPCs or indirectly through a different cell type. However, data obtained from the platelet-depleted model, as well as the new data from CALRHet mice provided in this revised version, indicate that at least macrophages/microglia do not contribute to the observed effects in OPCs. In addition, in vitro data support the direct effects of platelets on OPC function.

      Reviewer #2 (Public Review):

      Summary:

      This paper examined whether circulating platelets regulate oligodendrocyte progenitor cell (OPC) differentiation for the link with multiple sclerosis (MS). They identified that the interaction with platelets enhances OPC differentiation although persistent contact inhibits the process in the long term. The mouse model with increased platelet levels in the blood reduced mature oligodendrocytes, while how platelets might regulate OPC differentiation is not clear yet.

      Strengths:

      The use of both partial platelet depletion and thrombocytosis mouse models gives in vivo evidence. The presentation of platelet accumulation in a time-course manner is rigorous. The in vitro co-culture model tested the role of platelets in OPC differentiation, which was supportive of in vivo observations.

      Weaknesses:

      How platelets regulate OPC differentiation is not clear. What the significance of platelets is in MS progression is not clear.

      We thank the reviewer for their assessment of our manuscript, and we understand both of the reviewer’s concerns. Firstly, we performed additional in vitro studies and confirmed that platelet-contained factors are, at least in part, responsible for modulating OPC differentiation; thus, direct cell-cell contact is not essential. Secondly, in this revised version, we added references showing that the plasma levels of platelet microparticles and platelet-specific factors correlate with MS progression and severity.

      Reviewer #1 (Recommendations For the Authors):

      To improve the quality of their work and make it suitable for publication in eLife, I strongly suggest that the authors:

      (1) In vitro co-culture platelets and OPCs to check the effects on this latter cell type biology. 

      Response: We have performed in vitro studies, in which OPCs were co-cultured with washed platelets (WP). We observed that OPC differentiation was boosted after a short exposure to WP, however, prolonged exposure to WP suppressed this effect (revised Figure 3A and B). Also, our new data using platelet lysate (PL) indicate that platelet-contained molecules are responsible for this effect (revised Figure 3C and D). Finally, we showed that by removing PL after sustained exposure (6 DIV) the ability of platelets to promote OPC differentiation is rescued (revised Figure 3E and F).

      (2) In the CALR model, can the authors check effect of chronic exposure to large numbers of platelets? Is this affecting macrophages (including their polarization)? 

      Response: Yes. Compared to wild-type mice, we confirmed the presence of larger numbers of platelets within demyelinated lesions in the CALRHET model (Figure 4A and C). Also, in this revised version we added data showing that thrombocytosis in the CALRHET model does not affect macrophage/microglia numbers or polarization (revised Supplementary Figure 2).

      (3) Some aspects of the Introduction section seems too old-fashioned (ex.: the use of bFGF instead of FGF2 to refer to Fibroblast Growth Factor 2), as well as it would be convenient to include more recent references on the role of FGF2 and PDGFa in OPC biology. 

      Response: We agree with the reviewer. In this revised version we have changed bFGF for FGF2 and we added more recent references addressing the role of FGF2 and PDGFa in OPC biology.

      (4) There are some constructions and typos that could be corrected. 

      Response: We have checked language constructions as well as typos, and these have been corrected.

      (5) Please revise spelling of some names in the bibliography list (ex.: the correct surname is ffrench-Constant, not Ffrench-Constant).

      We have revised the spelling of names within the bibliography, and we have corrected them accordingly.

      Reviewer #2 (Recommendations For the Authors):

      Mechanisms of platelet-OPC interactions 

      -  transwell co-culture assay will examine if the OPC phenotype is through direct or indirect interactions with platelets; 

      We have performed additional in vitro studies, in which OPCs were exposed to platelet lysate (PL). New results indicate that a short exposure to PL can promote OPC differentiation (revised Figure 3C and D), while a sustained exposure suppresses this effect (revised Figure 3E and F). These findings indicate that platelet-contained factors are, at least in part, responsible for modulating OPC differentiation and, thus, direct cell-cell contact is not essential for such an effect.

      -  can you revert the phenotype of OPCs co-cultured long with platelets (maturation blocked) by removing platelet (then OPC differentiate again?) to see if the phenotype is reversible or not? 

      We would like to thank the reviewer for bringing up this interesting question. We have performed additional in vitro studies to address this issue. We found that removing PL after 6 days of sustained exposure rescues the ability of platelets to promote OPC differentiation (revised Figure 3E and F). These findings indicate that the suppressing effect of prolonged exposure to platelets on OPC differentiation is reversible.

      Clinical correlation 

      -  How many MS patients have an abnormal number of or exposure to platelets? 

      We have added new information in the introduction section. Indeed, previous studies have shown that MS patients display higher levels of circulating platelet microparticles (PMPs) (Marcos-Ramiro et al., 2014) as well as increased plasma levels of platelet-specific factors such as P-selectin and PF4 (Cananzi et al., 1987; Kuenz et al., 2005).

      -  Do platelet amounts correlate with disease severity? 

      We have added new information in the introduction section. Indeed, PMPs are indicative of the clinical status of the disease (Saenz-Cuesta et al., 2014). Also, plasma levels of P-selectin and PF4 correlate with disease course and severity, respectively (Cananzi et al., 1987; Kuenz et al., 2005).

      Image quantification 

      -  please state how many sections were counted. How many animals were used per condition. Is the practice of blinded observers done for each dataset?

      We added this information to the figure legends and the methods section. We counted 3 to 5 sections per lesion. We used 3 to 6 animals per experimental group, and data were analysed by blinded observers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The primary weakness of the paper concerns its conclusion of having generated "homogenous mature microglia", partly based on the RNAseq analysis. However, the comparison of gene profiles was carried out only between "hiPSC-derived mature microglia" and the proliferating myeloid progenitors. While the transcriptome profiles revealed a trend of enrichment of microglia-like gene expression in "hiPSC-derived mature microglia" compared to proliferating myeloid progenitors, this is not sufficient to claim they are "mature microglia". It is important that one carries out a comparative analysis of the RNAseq data with those of primary human microglia, which may be done by leveraging the public database. To convincingly claim these cells are mature microglia, questions need to be addressed including how similar the molecular signatures of these cells are compared with the fully differentiated primary microglia cell or if they remain progenitor-like or take on mosaic properties, and how they distinguish from macrophages.

      We greatly appreciate the insightful comments and suggestions from the reviewers, which were instrumental in enhancing our data analysis and organization. In response to the feedback, we have updated the terminology from “mature microglia” to simply “microglia” while clarifying in our text that these are fully differentiated microglia under single-type cell culture conditions.

      Guided by the reviewer's advice, we incorporated RNA-seq data from human brain microglia studies conducted by Dr. Poon and Dr. Blurton-Jones' Lab (Abud et al., Neuron, 2017) and Dr. Huitinga's Lab (van der Poel et al., Nat Commun, 2019). We then conducted a comparative analysis of the gene expression profiles between our fully differentiated hiPSC-derived microglia and those from fetal/adult brain microglia (see Fig.2. Suppl. B, C and D; Suppl. table 1 and table 2). The correlation analysis revealed that our hiPSC-derived microglia closely resemble fetal and adult brain microglia, distinguishing them significantly from monocytes and inflammatory monocytes.
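
      For readers unfamiliar with this type of comparison, the sketch below shows one way a profile-to-profile correlation can be computed. It assumes Pearson correlation on log-scaled expression values; the response does not state which correlation measure was used, and the five-gene profiles are hypothetical placeholders rather than values from our data or from the cited datasets.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical log2 expression values for the same five genes in each sample.
profiles = {
    "hiPSC-derived microglia": [8.0, 7.5, 6.0, 2.0, 1.5],
    "adult brain microglia": [8.5, 7.0, 6.5, 1.5, 2.0],
    "monocytes": [2.0, 1.0, 3.0, 8.0, 7.5],
}

reference = profiles["hiPSC-derived microglia"]
correlations = {name: pearson(reference, values) for name, values in profiles.items()}

for name, r in correlations.items():
    print(f"{name}: r = {r:.3f}")
```

      In a real analysis the profiles would span thousands of genes and would need to be normalized consistently across studies before correlation.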

      (2) While the authors attempted to demonstrate the functional property of "hiPSC-derived mature microglia" in culture, they used LPS challenge, which is an inappropriate assay. This is because human microglia respond poorly to LPS alone but need to be activated by a combination of LPS with other factors, such as IFNγ. Their data that "hiPSC-derived mature microglia" showed robust responses to LPS indeed implicates that these cells do not behave like mature human microglia.

      We appreciate the feedback received. In response, we cultured hiPSC-derived microglia cells and subjected them to treatments with IFNγ, LPS, and a combination of both IFNγ+LPS, as illustrated in Figure 3 suppl. Our findings revealed that the IFNγ+LPS combination notably enhanced the expression of IL1a, IL1b, TNFa, CCL8, and CXCL10, whereas IL6 and CCL2 levels remained unchanged. Treatment with IFNγ alone significantly elevated the expression of TNFa, CCL8, CXCL10, and CCL2. These outcomes align with the findings reported by Rustenhoven et al. (Sci Rep, 2016), suggesting that the functionality of our hiPSC-derived microglia cells closely mirrors that of primary human adult microglia cells.

      (3) The resolution of Figs. 4 - 6 is so low that even some of the text and labels are hardly readable. Based on the morphology shown in Fig. 4 and the statement in line 147, these hiPSC-derived "cells altered their morphology to a rounded shape within an hour of incubation and rapidly internalized the fluorescent-labeled particles". This is a peculiar response. Usually, microglia do not respond to fluorescent-labeled zymosan by turning into a rounded shape within an hour when they internalize them. Such a behavior usually implicates weak phagocytotic capacity.

      Thank you for your insightful comments. During submission, the main text's PDF version was converted online, resulting in low-quality output. We have since updated this with a high-resolution version. The observed alterations in cell morphology following zymosan phagocytosis may be attributed to the high zymosan concentration used (2 mg/ml). We assessed the impact of zymosan concentration on the morphology of hiPSC-derived microglial cells, as shown in Figure 4 suppl B. Our findings indicate that microglia cells adopt an amoeboid, rounded shape at zymosan concentrations exceeding 20 µg/ml. To clarify this point, we have amended the text to read: "The cells altered their morphology and rapidly internalized the fluorescent-labeled particles."

      (4) Data presented in Fig. 5 are not very convincing to support that transplanted cells were immunopositive for "human CD11b (Fig.5C), as well as microglia signature markers P2ry12 and TMEM119 (Fig.5D)" (line 167). The resolution and magnification of Fig. 5D is too low to tell the colocalization of tdT and human microglial marker immunolabeling. In the flat-mount images (C, I), hCD11b immunolabeling is not visible in the GCL or barely visible in the IPL. This should be discussed.

      We are grateful for the reviewer's comments. As previously mentioned, the low quality of the images was due to the online conversion of the PDF version. We have now submitted both high-quality PDF and Word versions for the reviewer's assessment. In these high-quality versions, the colocalization of tdT with human P2ry12 and TMEM119 is distinctly visible. Additionally, we have updated the hTMEM119 staining images in Figure 5D. The results from hCD11b staining align with those observed in mouse CD11b staining, notably showing more effective staining in the outer plexiform layer (OPL) microglia cells. The reason for this—whether it pertains to a staining issue, a variance in CD11b expression among microglia cells in the OPL and ganglion layer (GL), or differences in the samples due to varying conditions—is not yet clear and warrants further investigation.

      (5) Microglia respond to injury by becoming active and lose their expression of the resting state microglial marker, such as P2ry12, which is used in Fig. 6 for detection of migrated microglia. To confirm that these cells indeed respond to injury like native microglia, one should check for activated microglial markers and induction of pro-inflammatory cytokines in the sodium iodate-injury model.

      The reviewer's insights are spot-on. We utilized preserved retinas to extract mRNA, which was then reverse-transcribed to cDNA for conducting qRT-PCR using human-specific primers, as detailed in the updated Table 5. The findings revealed that following retinal pigment epithelium (RPE) injury for 3 days, the transplanted hiPSC-derived microglial cells exhibited an increase in the production of inflammatory cytokines and upregulated genes related to phagocytosis, migration, and adhesion. Conversely, there was a decrease in the expression of microglia-specific signature genes and neurotrophic factors, as demonstrated in Figure 7 suppl.

      Reviewer #1 (Recommendations For The Authors):

      Line 52: "Microglia cell repopulation research suggests that: 1) if no injury or infection occurs, retinal microglia cells can sustain their homeostasis indefinitely" - this statement is too strong or delivers a confusing message; it needs clarification or to be backed up by evidence. Recent single cell RNA sequencing analyses suggest that even under a normal condition, residential microglia do not present as a single homeostatic cell cluster, rather a subpopulation of activated inflammatory microglia are constantly detectable in the normal retina. This is likely because normal retinal neurons can be stressed due to various reasons, such as the temporal accumulation of misfolded proteins, exposed to strong light, or ageing, etc.

      We appreciate the comments. We changed the sentence to read, "Microglia cell repopulation research suggests that: 1) retinal resident microglia cells can sustain their population with the local dividing and migration if any perturbations do not exceed the threshold of the recovery speed by local neighbor microglia cells."

      Line 83: "we applied an appropriate protocol for culturing human iPSC-derived microglia cells" - it would be more appropriate if the word "appropriate" can be replaced by either "unique" or a phrase like "we adopted a (previously published) protocol...".

      Thanks! We changed it to “We modified a previously published protocol to culture human iPSC-derived microglia cells.".

      Fig. 1F,G: A method of flow cytometry will provide more comprehensive cell quantification for percentages of positively labeled cells than cell counts under high magnification confocal images.

      Thanks for the comments! We agree with the reviewer. Given the experimental resources available, quantification of the confocal images provided a reasonable assessment. We will perform flow cytometry analysis in future experiments.

      Reviewer #2 (Public review):

      Weaknesses:

      Gene expression analysis of mature microglia cells should be better interpreted and it would be beneficial to compare the iPSC-derived microglia gene set to a human microglial cell line (for example, HMC3) instead of myeloid progenitor cells.

      The way that the manuscript has been written, unfortunately, is not optimal. I recommend that the entire manuscript be edited and proofread in English. The text contains spelling and grammar mistakes, and the manuscript is inconsistent in several parts. The manuscript should also be revised for a scientific paper format.

      We appreciate the reviewer's comments and have taken them into consideration along with similar inquiries from Reviewer 1. Following the suggestions, we conducted a comparison of gene expression profiles between our hiPSC-derived microglia and those from fetal/adult brain microglia, as depicted in the updated Fig.2. Suppl. B, C and D; as well as in the Suppl. table 1 and table 2. The correlation analysis demonstrated that the hiPSC-derived microglia cells closely resemble fetal and adult brain microglia, significantly differing from monocytes and inflammatory monocytes. Additionally, we have revised the manuscript to adhere more closely to the conventional scientific format.

      Reviewer #2 (Recommendations For The Authors):

      Specific suggestions for improvement:

      - Regarding the characterization of human iPSC-derived microglia, P2RY12 is a general hematopoietic cell marker. One cannot judge the maturity of microglia only by P2RY12 expression (for example, line 261). The expression of more specific markers such as TMEM119 and PROS1 should be studied and discussed.

      We are thankful for the reviewer's valuable feedback. In response:

      We have removed the term "mature" and clarified that the hiPSC-derived microglia we studied are fully differentiated within single-type cell culture conditions.

      We performed a comparative analysis of the gene expression profiles between our hiPSC-derived microglia and microglia from human brains, as illustrated in the updated Fig.2. Suppl. B, C and D. The results affirm that hiPSC-derived microglia closely resemble human fetal and adult microglia.

      We noted that the expression of TMEM119 in hiPSC-derived microglia under in vitro single-type cell culture conditions is notably low, as shown in Author response image 1A. This suggests that the stimulatory factors in our single-type cell culture might not sufficiently induce TMEM119 expression in microglia. The necessity for a retinal environment or interaction with neuronal and/or other glial cells for TMEM119 expression mirrors the behavior of infiltrating peripheral monocytes in pathological conditions, which initially lack TMEM119 but later differentiate into microglial-like macrophages that express TMEM119, as reported by Ma et al. in Sci Rep (2017).

      Additionally, our findings suggest that PROS1 is not uniquely characteristic of microglia but is expressed across a variety of cell types. Within our specific culture conditions, we noted a higher expression of PROS1 in microglial progenitor cells, as shown in Author response image 1B and C.

      Author response image 1.

      - In Figure 2, Part E, the names of the genes or pathways in the figure are not clear, and are these genes the set that are the most differentially expressed between iPSCs-derived microglia and MPC? The analysis needs more explanation.

      We regret any confusion caused by our previous explanation. To clarify, we compiled a list of microglia-enriched genes from the research conducted by Barres BA Lab (Bennett et al., Proc Natl Acad Sci U S A, 2016) and from our own RNA sequencing data of mouse retinal microglia, identifying a total of 130 genes predominantly expressed in microglia (Suppl. Table 3). We then applied this gene list to analyze our hiPSC-derived microglia RNA sequencing data, resulting in the identification of 71 microglia-specific genes. These 71 genes were subjected to Ingenuity Pathway Analysis (IPA) to visualize the signaling pathways involved. The details of these microglia genes can be found in the updated suppl. table 3.

      - Lines 124 to 128 mention that high expression of Stat3, IL1b, and IL6 and their central role in pathway analysis emphasize the efficiency of the maturation protocol. Regarding the fact that Stat3, IL1b, and IL6 are contributors to proinflammatory pathways, it is not convincing that the high expression of these genes in iPSC-derived microglia demonstrates the efficiency of the maturation protocol, given that microglia are not stimulated.

      Thanks for the comments! We added sentences on the comparison between hiPSC-derived microglia and human brain microglia. We have also replaced “mature” with “functional.” The sentence reads, “Thus, our method of obtaining differentiated microglia is a reliable method to generate a large number of homogenous functional microglia cells.”

      - Statistical analysis is missing for some graphs, for example, figures 1-3 and 5.

      We appreciate the comments. We have added the statistical results in the revised version.

      - The legend for Figure 3 needs to be rewritten. The graphs or applied assays should be explained in the legend, not the interpretation of the data.

      The legend was rewritten.

      - There is no Figure 3 in the supplement figures file.

      We added Figure 3. Suppl.

      - hTMEM119 staining in Figure 5, Part D, is mostly background. Please provide another image.

      The images were unclear after online conversion because of their low resolution. We replaced them with new hTMEM119 staining images in Figure 5D.

      - In line 176, figure 5I has been forgotten to be mentioned.

      Thank you very much! We added 5I.

      - Lines 241 to 244 state that more than 50% of the AMD-associated genes are highly expressed in retinal microglia according to Fig. discussion suppl A & B. It is not clear that the gene set that was used for analysis is from a healthy retinal microglia or AMD-related ones. Please explain precisely.

      Thank you for your feedback. The gene list we referenced originates from a Genome-Wide Association Study (GWAS) that compared patients with Age-related Macular Degeneration (AMD) to healthy cohorts. We did not directly utilize this list in our experiments but referred to it to underscore the importance of microglia cells in the context of AMD.

      Some of the English proofreading and manuscript format comments:

      Line 805: Iba1 is written in lowercase. Is it human IBA1? It is not consistent with the way it is written in the text (in line 117, for example).

      Thank you for pointing out the error. We now write “Iba1” consistently throughout. The Iba1 antibody used here is from Wako (#019–19741) and labels both mouse and human microglial cells.

      Line 814: microglia-enriched gene expression instead of microglia-enrich gene expression

      Thank you! We changed it.

      Line 345: Starting a sentence with lower case letter.

      Thank you! We changed it.

      Line 342: Myeloid lineage instead of myeloid cell linage.

      Thank you! We changed it.

      Line 815: What does FPKM stand for? The abbreviations should be explained.

      The FPKM is the abbreviation of Fragments Per Kilobase of transcript per Million mapped reads. We added it in the text.
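
      The definition above can be illustrated with a minimal sketch of the standard FPKM formula. The function and parameter names are hypothetical and are not taken from the authors' pipeline; the example counts are made up for illustration only.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    fragments: fragments assigned to one transcript (hypothetical input)
    transcript_length_bp: transcript length in base pairs
    total_mapped_fragments: all mapped fragments in the library
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    kilobases = transcript_length_bp / 1_000          # per kilobase of transcript
    millions = total_mapped_fragments / 1_000_000     # per million mapped fragments
    return fragments / (kilobases * millions)


# Illustrative numbers only: 500 fragments on a 2 kb transcript, 20 M mapped fragments.
example_value = fpkm(500, 2_000, 20_000_000)
```

      Dividing by transcript length and by library size makes values comparable across genes of different lengths and across samples of different sequencing depth.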

      Line 309: The manuscript has occasionally referred to PLX-5622 without a minus. Please follow a uniform format.

      We changed all “PLX5622” to “PLX-5622”.

      Lines 327-331: should be rewritten.

      The mentioned paragraph was rewritten.

      Lines 335-340: should be rewritten.

      The mentioned sentence was rewritten.

      Line 135: qRT-PCR instead of QPCR, as it is also mentioned in the methods and material. The correction also applies to all the QPCRs in the text.

      We changed “QPCR” to “qRT-PCR”.

      Figure 3: Graph B should be right side of graph A

      Images description: It is better to have the images description in the left side of the image, for example, figure 5 part B, GL, IPL and OPL

      Thanks for the suggestion. We changed the image organization as per the reviewer’s advice.

      Lines 258 to 260 in the discussion have also been repeated with the same words in the introduction.

      The mentioned paragraph was rewritten.

      Lines 327-331 should be rewritten.

      The mentioned paragraph was rewritten.

      Lines 335-340 should be rewritten.

      The mentioned paragraph was rewritten.

    1. Author response:

      Reviewer #1 (Public Review): 

      Summary: 

      In this paper, Behruznia and colleagues use long-read sequencing data for 335 strains of the Mycobacterium tuberculosis complex to study genome evolution in this clonal bacterial pathogen. They use both a "classical" pangenome approach that looks at the presence and absence of genes, and a more general pangenome graph approach to investigate structural variants also in non-coding regions. The two main results of the study are that (1) the MTBC has a small pangenome with few accessory genes, and that (2) pangenome evolution is driven by deletions in sublineage-specific regions of difference. Combining the gene-based approach with a pangenome graph is innovative, and the former analysis is largely sound apart from a lack of information about the data set used. The graph part, however, requires more work and currently fails to support the second main result. Problems include the omission of important information and the confusing analysis of structural variants in terms of "regions of difference", which unnecessarily introduces reference bias. Overall, I very much like the direction taken in this article, but think that it needs more work: on the one hand by simply telling the reader what exactly was done, on the other by taking advantage of the information contained in the pangenome graph. 

      Thank you for your constructive feedback. We have hopefully positively addressed all your concerns. Please see our detailed responses below.

      Strengths: 

      The authors put together a large data set of long-read assemblies representing most lineages of the Mycobacterium tuberculosis complex, covering a large geographic area. State-of-the-art methods are used to analyze gene presence-absence polymorphisms (Panaroo) and to construct a pangenome graph (PanGraph). Additional analysis steps are performed to address known problems with misannotated or misassembled genes in pangenome analysis. 

      Thank you for your positive feedback. We are pleased that you found these aspects of our work noteworthy and valuable.

      Weaknesses: 

      The study does not quite live up to the expectations raised in the introduction. Firstly, while the importance of using a curated data set is emphasized, little information is given about the data set apart from the geographic origin of the samples (Figure 1). A BUSCO analysis is conducted to filter for assembly quality, but no results are reported. It is also not clear whether the authors assembled genomes themselves in the cases where, according to Supplementary Table 1, only the reads were published but not the assemblies. In the end, we simply have to trust that single-contig assemblies based on long-reads are reliable. 

      The BUSCO results are present for all the genomes in Supplementary Table S1. Genome assemblies were obtained from public databases and other studies that performed the assemblies. We did not perform assemblies for any of the public datasets except the 11 genomes sequenced by ourselves, for which we included the assembly statistics. The public genomes from NCBI were marked as closed based on the NCBI pipelines, so additional quality checks were undertaken there before we included them in our analysis. Marin et al (2024; BioRxiv) also performed additional checks on the vast majority of the genomes before they were included here. We are confident that these genomes represent the highest quality M. tuberculosis dataset possible, but we will check that all genomes are present in the GTDB list, which performs additional tests including CheckM, to add another layer of confidence. Some of the accessions to the final genomes were not included as these papers were not released yet but will be in the next version. Supplementary Table S1 will be updated to include the assembly information for each genome.

      One issue with long read assemblies could be that high rates of sequencing errors result in artificial indels when coverage is low, which in turn could affect gene annotation and pangenome inference (e.g. Watson & Warr 2019, https://doi.org/10.1038/s41587-018-0004-z). Some of the older long-read data used by the authors could well be problematic (PacBio RSII), but also their own Nanopore assemblies, six of which have a mean coverage below 50 (Wick et al. 2023 recommend 200x for ONT, https://doi.org/10.1371/journal.pcbi.1010905). Could the results be affected by such assembly errors? Are there lineages, for example, for which there is an increased proportion of RSII data? Given the large heterogeneity in data quality on the NCBI, I think more information about the reads and the assemblies should be provided. 

      We have shown elsewhere (Marin et al (2024; BioRxiv)) that short read sequencing is significantly worse for these types of problems. For this reason, we have included only closed genomes which we believe will reduce the potential for such errors. However, we agree that older sequencing technologies, such as PacBio RSII, can introduce errors in the assemblies and subsequent downstream analyses. We will look for correlation between platform and accessory genome presence/absence to see if the type of sequencing influences the results.

      Wick et al. (2023) recommend a coverage of 200x for ONT sequencing; however, newer analyses from Wick have shown that with modern basecalling and sequencing very low error rates can be achieved with much lower coverage (see https://rrwick.github.io/2023/10/24/ont-only-accuracy-update.html). We are quite confident that gene presence/absence patterns should be robust to this in our analysis but will confirm with some additional analysis on our sequenced genomes.

      The part of the paper I struggled most with is the pangenome graph analysis and the interpretation of structural variants in terms of "regions of difference". To start with, the method section states that "multiple whole genomes were aligned into a graph using PanGraph" (l.159/160), without stating which genomes were for what reason. From Figure 5 I understand that you included all genomes, and that Figure 6 summarizes the information at the sublineage level. This should be stated clearly, at present the reader has to figure out what was done.

      All genomes were included in the pangenome graph construction and in the search for regions of difference. We then grouped genomes into sub-lineages for the additional analyses, as there are not enough genomes per sub-sub-lineage and lower levels for robust analyses. We will make this clearer in the next version, likely with a flowchart of analyses.

      It was also not clear to me why the authors focus on the sublineage level: a minority of accessory genes (107 of 506) are "specific to certain lineages or sublineages" (l. 240), so why conclude that the pangenome is "driven by sublineage-specific regions of difference", as the title states? What does "driven by" mean? Instead of cutting the phylogeny arbitrarily at the sublineage level, polymorphisms could be described more generally by their frequencies. 

      We acknowledge the importance of polymorphisms, but our study primarily aimed to investigate the presence and absence of genes/genomic regions, as highlighted in our focus on structural differences rather than SNPs (L67-69). We attempted to clarify our goal of exploring gene content variation both between and within lineages (L69) to avoid confusion.

      Our focus on the sub-lineage level addresses the gap in understanding gene content distribution beyond the broad lineage level, where previous pangenome studies have concentrated. The decision to focus on sub-lineages allows for a more detailed exploration of genetic diversity. Due to the limited number of genomes available to represent all sub-sub-lineages and lower levels of classification, we aimed to investigate gene content differences at the sub-lineage level. This decision allows for a more detailed and comprehensive exploration of gene content differences within the MTBC.

      I fully agree that pangenome graphs are the way to go and that the non-coding part of the genome deserves as much attention as the coding part, as stated in the introduction. Here, however, the analysis of the pangenome graph consists of extracting variants from the graph and blasting them against the reference genome H37Rv in order to identify genes and "regions of difference" (RDs) that are variable. It is not clear what the authors do with structural variants that yield no blast hit against H37Rv. Are they ignored? Are they included as new "regions of difference"? How many of them are there? etc. The key advantage of pangenome graphs is that they allow a reference-free, full representation of genetic variation in a sample. Here reference bias is reintroduced in the first analysis step. 

      Genomic analysis of Mycobacterium tuberculosis is H37Rv reference-centric, meaning that RDs are typically defined based on their presence or absence relative to the reference strain. Our approach comparing variants to the H37Rv reference was primarily to identify and name the known regions of differences (RDs). For structural variants that did not yield a BLAST hit against H37Rv, we assigned them as new RDs in Supplementary Table S4 to provide a reference-free approach for investigating gene content differences. Further clarifications on the definition and identification of RDs will be added.

      Along similar lines, I find the interpretation of structural variants in terms of "regions of difference" confusing, and probably many people outside the TB field will do so. For one thing, it is not clear where these RDs and their names come from. Did the authors use an annotation of RDs in the reference genome H37Rv from previously published work (e.g. Bespiatykh et al. 2021)? This is important basic information, its lack makes it difficult to judge the validity of the results. The Bespiatykh et al. study uses a large short-read data (721 strains) set to characterize diversity in RDs and specifically focuses on the sublineage-specific variants. While the authors cite the paper, it would be relevant to compare the results of the two studies in more detail. 

      Indeed, the term regions of difference (RDs) is somewhat M. tuberculosis specific. These are large polymorphisms that are differentially present in clades (primarily lineages) of M. tuberculosis. Their annotation and naming are based on Bespiatykh et al. (2021) and the RDscan tool, which identifies RD regions based on the H37Rv genomic coordinates. We obtained the corresponding Rv locus for RD regions by matching their genomic coordinates on the H37Rv genome and confirmed the RDs using the bed file from RDscan. We have used their names where our findings overlap, and any new RDs we report are not found in their data. We will ensure this is clearer in the next version.

      As far as I understand, "regions of difference" have been used in the tuberculosis field to describe structural variants relative to the reference genome H37Rv. Colloquially, regions present in H37Rv but absent in another strain have been called "deletions". Whether these polymorphisms have indeed originated through deletion or through insertion in H37Rv or its ancestors requires a comparison with additional strains. While the pangenome graph does contain this information, the authors do not attempt to categorize structural variants into insertions and deletions but simply seem to assume that "regions of difference" are deletions. This, as well as the neglect of paralogs in the "classical" pangenome analysis, puts a question mark behind their conclusion that deletion drives pangenome evolution in the MTBC. 

      The term regions of difference or RDs has traditionally been used to describe structural variants relative to the H37Rv genome, often interpreted as deletions. Consistent with our study, Bespiatykh et al. (2021) observed two types of deletions: those associated with repeat sequences or mobile genetic elements, and conserved RDs that are phylogenetically informative deletions inherited by all descendants of a strain.

      In our study, we employed a phylogenetic approach to identify deletions. If RDs are present in genomes both upstream and downstream of a phylogenetic branch but are absent in one specific branch, we interpret this as evidence of gene deletion (Figure 5B). This method was systematically applied to all RDs identified as deletions in our study; we will clarify this better in the next version.
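
      The rule described above can be sketched as a simple parsimony-style heuristic. This is only an illustration under stated assumptions, not the authors' implementation: the tree is a hypothetical nested tuple of genome names, `has_region` is a hypothetical set of genomes carrying the RD, and the function name is invented for this example.

```python
def leaves(tree):
    """Return the set of genome names under a node (a name or a tuple of subtrees)."""
    if isinstance(tree, str):
        return {tree}
    names = set()
    for child in tree:
        names |= leaves(child)
    return names


def candidate_deletions(tree, has_region):
    """Find maximal clades that lack the region although it is present both in
    sister clades (downstream of the parent) and outside the parent clade (upstream)."""
    all_leaves = leaves(tree)
    found = []

    def visit(node, parent_leaves):
        node_leaves = leaves(node)
        if not (node_leaves & has_region):
            # Whole clade lacks the region: report it only if the region is
            # seen in sister genomes and in genomes outside the parent clade.
            if parent_leaves is not None:
                sisters = parent_leaves - node_leaves
                outside = all_leaves - parent_leaves
                if (sisters & has_region) and (outside & has_region):
                    found.append(frozenset(node_leaves))
            return  # do not descend, so reported clades are maximal
        if not isinstance(node, str):
            for child in node:
                visit(child, node_leaves)

    visit(tree, None)
    return found


# Hypothetical toy tree: the region is missing only from the clade (C, D).
toy_tree = ("outgroup", (("A", "B"), ("C", "D")))
toy_result = candidate_deletions(toy_tree, {"outgroup", "A", "B"})
```

      When the region is also missing from the genomes outside the parent clade, the sketch reports nothing, because the pattern is then equally compatible with an insertion in the clade that carries the region.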

      We acknowledge the importance of considering paralogs in pangenome analysis. While the evolution of genomes is driven by duplication, loss and transfer, we know that transfer is not a mechanism in modern MTBC evolution and we have focussed here on loss. Duplication (paralog) analysis from annotations continues to be difficult to quantify due to the difficulty of reliably confirming paralogy. We have addressed the effect of different Panaroo options, including merge paralogs, on the genomic diversity and pangenome estimation of MTBC in our associated paper (Marin et al 2024). This study showed that most structural variation in Mycobacterium tuberculosis is attributed to rearrangements of existing sequences rather than novel sequence content. For example, the transposable element IS6110 accounts for a significant portion of sequence variation. This hints that paralogs are not very important in terms of gene content differences in MTBC.

      However, we will attempt to build on this by looking at Panaroo outputs without merged paralogs and looking for potentially duplicated genomic stretches in the Pangraph analyses. This will hopefully show more robustly that the MTBC diversity is primarily deletion driven.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors attempted to investigate the pangenome of MTBC by using a selection of state-of-the-art bioinformatic tools to analyse 324 complete and 11 new genomes representing all known lineages and sublineages. The aim of their work was to describe the total diversity of the MTBC and to investigate the driving evolutionary force. By using long read and hybrid approaches for genome assembly, an important attempt was made to understand why the MTBC pangenome size was reported to vary in size by previous reports. 

      Strengths: 

      A stand-out feature of this work is the inclusion of non-coding regions as opposed to only coding regions which was a focus of previous papers and analyses which investigated the MTBC pangenome. A unique feature of this work is that it highlights sublineage-specific regions of difference (RDs) that were previously unknown. Another major strength is the utilisation of long-read whole genomes sequences, in combination with short-read sequences when available. It is known that using only short reads for genome assembly has several pitfalls. The parallel approach of utilizing both Panaroo and Pangraph for pangenomic reconstruction illuminated the limitations of both tools while highlighting genomic features identified by both. This is important for any future work and perhaps alludes to the need for more MTBC-specific tools to be developed. 

      Thank you for recognising the strengths of our work.

      Weaknesses: 

      The only major weakness was the limited number of isolates from certain lineages and the over-representation others, which was also acknowledged by the authors. However, since the case is made that the MTBC has a closed pangenome, the inclusion of additional genomes would not result in the identification of any new genes. This is a strong statement without an illustration/statistical analysis to support this. 

      The language around open and closed pangenomes is difficult to convey and indeed we will improve this for the next version. We aimed to show that with a set of highly curated genomes that span the breadth of known diversity within the MTBC, we see no evidence for a large, open pangenome as has been previously suggested. We instead suggest that adding new genomes is unlikely to bring large additions to the accessory genome, therefore showing that the MTBC pangenome tends towards being closed. We will add additional visualisations such as gene accumulation plots to better support this argument.
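
      A gene accumulation plot of the kind mentioned above could be computed along the following lines. This is a minimal sketch assuming a hypothetical mapping from genome name to its set of gene clusters (for example, derived from a presence/absence matrix); the function name, permutation count, and toy data are illustrative and not taken from the study.

```python
import random


def accumulation_curve(gene_sets, permutations=100, seed=0):
    """Mean number of distinct genes observed after adding 1..N genomes,
    averaged over random genome orderings."""
    rng = random.Random(seed)
    genomes = list(gene_sets)
    totals = [0] * len(genomes)
    for _ in range(permutations):
        rng.shuffle(genomes)
        seen = set()
        for i, genome in enumerate(genomes):
            seen |= gene_sets[genome]   # add this genome's genes to the pangenome
            totals[i] += len(seen)
    return [total / permutations for total in totals]


# Toy data only: three genomes sharing a core of two genes, one accessory gene.
toy_gene_sets = {"g1": {"a", "b"}, "g2": {"a", "b"}, "g3": {"a", "b", "c"}}
toy_curve = accumulation_curve(toy_gene_sets)
```

      A curve that flattens as genomes are added is the usual visual argument that a pangenome tends towards being closed, whereas a curve that keeps rising suggests an open one.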

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show that upon treatment with Doxorubicin (Doxo), there is an increase in senescence and inflammatory markers in the muscles. They also show these genes get upregulated in C2C12 myoblasts when treated with conditioned media or 15d-PGJ2. 15dPGJ2 induces cell death in the myoblasts, decreases proliferation (measured by cell numbers), and decreases differentiation and fusion. 15d-PGJ2 modified Cys184 of HRas, which is required for its activation as indicated by the FRET analysis with RAF RBD. They also showed that 15d-PGJ2 activates ERK signaling, but not Akt signaling, through the electrophilic center. 15d-PGJ2 inhibits Golgi localization of HRAS (only WT, not C181 or C184 mutant). They also showed that expressing the WT HRas followed by 15d-PGJ2 treatment led to a decrease in the levels of MHC mRNA and protein, and this defect is dependent on C184. This is a well-written manuscript with interesting insights into the mechanism of action of 15d-PGJ2. However, some clarification and experiments will help the paper advance the field significantly.

      Strengths:

      The data clearly shows that 15d-PGJ2 has a negative role in the myoblast cells and that it leads to modification of HRas protein. Moreover, the induction of biosynthetic enzymes in the PGD2 pathway also supports the induction of 15d-PGJ2 in Doxorubicin-treated cells. Both conditioned media experiments and the 15d-PGJ2 experiments show that 15d-PGJ2 could be the active component secreted by the senescent myoblasts.

      Weaknesses:

      The genes that are upregulated in the muscles upon injection with Doxo are also markers for inflammation. Since Doxo is also known to induce systemic inflammation, it is important to delineate these two effects (Inflammatory cells vs senescent cells). The expression of beta Gal and other markers of senescence in the tissue sections will help to delineate these.

      As pointed out, Doxo induces systemic inflammation along with DNA damage-mediated senescence. Therefore, along with the inflammatory markers of the SASP (CXCL1/2, TNF1α, IL6, PTGS1/2, PTGDS), we also observed an increase in the mRNA levels of canonical markers of DNA damage-mediated senescence. We observed an increase in the mRNA levels of the cell cycle and senescence-associated proteins p16 and p21 (Fig. 1C). We also observed an increased nuclear accumulation of p21 (Fig. 1A) and increased levels of phosphorylated H2A.X in the nucleus (Fig. 1B).

      In Figure 2, where the defect in the differentiation of myoblasts upon treatment with 15d-PGJ2 is shown, most of the cells die within 48 hours at higher concentrations, making it difficult to perform the experiments. This also shows that 15d-PGJ2 was toxic to these cells. Lower concentrations show a decrease in the differentiation based on the lower number of nuclei in fibers and low expression of MyoD, MyoG, and MHC. However, it is unclear if this is due to increased cell death or defective differentiation. It would be a lot more informative if the cell count, cell division, and cell death could be plotted for these concentrations of the drug during the experiment.

      We measured the viability of C2C12 cells after 24 hours of treatment with 15d-PGJ2 using the MTT assay and observed that the viability of cells was decreased after treatment with 15d-PGJ2 (10 µM) but not with 15d-PGJ2 (1 µM, 2 µM, 4 µM, or 5 µM) (see Fig. S2A of the updated manuscript). The results and figures of the manuscript have been updated accordingly.

      Also, in the myoblast experiments, are the effects of treatment with Dox reversible?

      The treatment with Doxorubicin is irreversible as the senescent phenotype was not reversed after withdrawal of Doxorubicin, even after 20 days.

      In Figure 3, most of the experiments are done at a high concentration, which induces almost complete cell death within 48 hours.

      Figure 3 is an acute experiment for only 1 hour, at which time no cell death was observed. Specifically, we measured the phosphorylation of Erk and Akt proteins after 1 hour of treatment with 15d-PGJ2 (10 µM) during which we did not observe any cell death.

      Even at such a high concentration of 15dPGJ2, the increase in ERK phosphorylation is minimal.

      We observe a ~30% increase in the phosphorylation of Erk proteins after treatment with 15d-PGJ2 in 0.2% serum medium compared to treatment with vehicle (DMSO). This is reproducible and significant.

      The experiment in Figure 4C shows that the C181 and C184 mutants of HRas show higher levels in the Golgi compared with WT. However, this could very well be due to the defect in palmitoylation rather than the modification with 15d-PGJ2.

      Our data does not suggest higher levels of C184S mutant in the Golgi compared with WT (Fig. S4A). We observed that the ratio of HRas levels in the Golgi to the HRas levels in the plasma membrane were similar in C2C12 cells expressing HRas C184S and HRas WT (Fig. S4A graph columns 1 and 5).

      Though the authors allude to the possibility that intracellular redistribution of HRas by 15d-PGJ2 requires C181 palmitoylation, the direct influence of C184 modification on C181 palmitoylation is not shown. To have a meaningful conclusion, the authors need to compare the palmitoylation and modification with 15d-PGJ2.

      Palmitoylation of HRas at C181 is required for the localization of HRas at the plasma membrane. Inhibiting palmitoylation of C181, either by mutation (C181S) or by treatment with the protein palmitoyl transferase inhibitor 2-bromopalmitate, results in the accumulation of HRas at the Golgi (Rocks et al., 2005) (Fig. S4A). Modification of HRas at C184 by 15d-PGJ2 (Fig. 3A) could inhibit the palmitoylation of HRas at C181. However, our data do not support this hypothesis, as modification of HRas WT by 15d-PGJ2 does not increase the level of HRas at the Golgi, unlike the inhibition of C181 palmitoylation by the C181S mutation.

      To test if the inhibition of myoblast differentiation depends on HRas, they overexpressed HRas and its mutants in the C2C12 lines. However, this experiment does not take the endogenous HRas into consideration, especially when interpreting the C184 mutant. An appropriate experiment to test this would be to knock down or knock out HRas (or make knock-in mutations of C184) and show that the effect of 15d-PGJ2 disappears.

      Endogenous HRas (wild type) is present in the C2C12 cells overexpressing the EGFP-tagged HRas constructs. Therefore, we only observe a partial rescue of differentiation after 15d-PGJ2 treatment in C2C12 cells expressing the C184S mutant (Fig. 4D and E). However, since HRas is expressed under the high-expression CMV promoter and in the absence of other regulatory elements, the overexpressed constructs do show a dominant effect over the endogenous HRas, showing cysteine-mutant-dependent inhibition of myoblast differentiation after treatment with 15d-PGJ2 (Fig. 4D and E).

      Moreover, in this specific experiment, it is difficult to interpret without a control with no HRas construct and another without the 15d-PGJ2 treatment.

      The mRNA levels of MyoD, MyoG, and MHC in C2C12 cells expressing HRas constructs after treatment with 15d-PGJ2 were normalized to the mRNA levels in C2C12 cells expressing the corresponding constructs and treated with vehicle (DMSO). mRNA levels in C2C12 cells treated with vehicle are not shown, as they were normalized to 1. MHC protein levels in C2C12 cells expressing HRas constructs after 15d-PGJ2 treatment were normalized to those in C2C12 cells treated with vehicle (DMSO). Since the aim was to study the effect of HRas cysteine mutations on the differentiation of myoblasts after treatment with 15d-PGJ2, C2C12 cells expressing HRas WT serve as an adequate control. Fig. 2 shows the effect of 15d-PGJ2 on muscle differentiation when HRas was not overexpressed.

      Moreover, the overall study does not delineate the toxic effects of 15d-PGJ2 from its effect on the differentiation.

      The inhibition of differentiation in C2C12 cells after treatment with 15d-PGJ2 cannot be attributed to the general toxicity of 15d-PGJ2 in cells. We show that the inhibition of differentiation of myoblasts after 15d-PGJ2 treatment depends on modification of HRas at C184: failure of 15d-PGJ2 to modify HRas at C184 (Fig. 3A) and thereby activate it (Fig. 3B) rescues this inhibition of differentiation of C2C12 cells (Fig. 4D and E). This separates the inhibition of myoblast differentiation by 15d-PGJ2 from the general toxic effects of 15d-PGJ2 on cell physiology.

      Please note that the effect of 15d-PGJ2 on cell physiology is context-specific. On the one hand, 15d-PGJ2 has been shown to exert tumor-suppressor effects by inhibiting the proliferation of ovarian cancer cells and lung adenocarcinoma cells (de Jong et al., 2011; Slanovc et al., 2024). On the other hand, 15d-PGJ2 also exerts pro-carcinogenic effects by inducing epithelial-to-mesenchymal transition in MCF7 breast cancer cells and by inhibiting the tumor-suppressor protein p53 in MCF7 and PC-3 cells (Choi et al., 2020; Kim et al., 2010).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Swarang and colleagues identified the lipid metabolite 15d-PGJ2 as a potential component of senescent myoblasts. They proposed that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas, suggesting its potential as a target for restoring muscle homeostasis post-chemotherapy.

      Strengths:

      The regulation of HRas by 15d-PGJ2 is well controlled.

      Weaknesses:

      (1) I still think the novelty is limited by previously published findings. The authors themselves noted that the accumulation of 15d-PGJ2 in senescent cells has been reported in various cell types, including human fibroblasts, HEPG2 hepatocellular carcinoma cells, and HUVEC endothelial cells (PMCID: PMC8501892). Although the current study observed similar activation of 15d-PGJ2 in myoblasts, it appears to be additive rather than fundamentally novel. The covalent adduct of 15d-PGJ2 with Cys-184 of H-Ras was reported over 20 years ago (PMID: 12684535), and the biochemical principles of this interaction are likely universal across different cell types. The regulation of myogenesis by both HRas and 15d-PGJ2 has also been extensively reported previously (PMID: 2654809, 1714463, 17412879, 20109525, 11477074). The main conceptual novelty may lie in the connection between these points in myoblasts. But as discussed in another comment, the use of C2C12 cells as a model for senescence study is questionable due to the lack of the key regulator p16. The findings in C2C12 cells may not accurately represent physiologically relevant myoblasts. It is recommended that these findings be validated in primary myoblasts to strengthen the study's conclusions.

      This is the first study to show a molecular mechanism where activation of HRas signaling in skeletal myoblasts due to covalent modification by 15d-PGJ2 at C184 of HRas inhibits the differentiation of skeletal myoblasts.

      (2) The C2C12 cell line is not an ideal model for senescence study.

      C2C12 cells are a well-established model for studying myogenesis. However, their suitability as a model for senescence studies is questionable. C2C12 cells are immortalized and do not undergo normal senescence like primary cells as C2C12 cells are known to have a deleted p16/p19 locus, a crucial regulator of senescence (PMID: 20682446). The use of C2C12 cells in published studies does not inherently validate them as a suitable senescence model. These studies may have limitations, and the appropriateness of the C2C12 model depends on the specific research goals.

      Several reports have shown that cells undergo senescence independent of p16 expression. MCF7 human breast adenocarcinoma cells, despite a homozygous deletion of the p16 locus (ISBN 9780124375512 Chapter 17 Table 2), have been shown to undergo DNA damage-mediated senescence after treatment with Doxorubicin (PMCID: PMC7025418) and oncogene-induced senescence after expression of constitutively active HRas (PMID: 17135242), through upregulation of the cell cycle inhibitor protein p21. In this study, we observe an increase in the senescence markers in C2C12 cells after treatment with Doxo (Fig. 1). We also observed an increase in the markers of DNA damage-mediated senescence in MCF7 cells after treatment with Doxo (data will be included in the revised manuscript). Based on these observations, we have concluded that C2C12 cells undergo senescence despite lacking the p16/p19 locus.

      In the study by Moustogiannis et al. (PMID: 33918414), they claimed to have aged C2C12 cells through multiple population doublings. However, the SA-β-gal staining in their data, which is often used to confirm senescence, showed almost fully confluent "aged" C2C12 cells. This confluent state could artificially increase SA-β-gal positivity, suggesting that these cells may not truly represent senescence. Moreover, the "aged" C2C12 cells exhibited normal proliferation, which contradicts the definition of senescence. Similar findings were reported in another study of C2C12 cells subjected to 58 population doublings (PMID: 21826704), where even at this late stage, the cells were still dividing every 2 or 3 days, similar to younger cells at early passages. More importantly, I do not know how p16 was detected in that paper, since the locus was already mutated. In terms of p21, there was no difference in the proliferative C2C12 cells at day 0.

      In the study by Moiseeva et al. in 2023 (PMID: 36544018), C2C12 cells were used for senescence modeling for siRNA transfection. However, the most significant findings were obtained using primary satellite cells or confirmed with complementary data.

      In conclusion, while molecular changes observed in studies using C2C12 cells may be valid, the use of primary myoblasts is highly recommended for senescence studies due to the limitations and questionable senescence characteristics of the C2C12 cell line.

      (3) Regarding the source of increased PGD in the conditioned medium, I want to emphasize that it's unclear whether the PGD or its metabolites increase in response to DNA damage or the senescence state. Thus, using a different senescent model to exclude the possibility of a DNA damage-induced increase will be crucial.

      Though senescence can be induced by several stress stimuli, such as DNA damage, oncogene expression, ROS, and mitochondrial dysfunction, DNA damage remains critical for the induction of the SASP (reviewed in PMID: 20078217). Also, other models of senescence, such as oncogene-induced senescence (reviewed in PMID: 17671427), ROS-induced senescence (PMID: 24934860), and mitochondrial dysfunction-associated senescence (MiDAS) (PMID: 26686024), have shown upregulation of DNA damage-associated signaling pathways. In this study, we have explored the SASP of cells undergoing senescence upon DNA damage mediated by the chemotherapy drug Doxorubicin.

      (4) Similarly for the in vivo Doxorubicin (Doxo) injection, both reviewers have raised concerns about the potential side effects of Doxo, including inflammation, DNA damage, and ROS generation. These effects could potentially confound the results of the study. The physiological significance of this study will heavily rely on the in vivo data. However, the in vivo senescence component is confounded by the side effects of Doxo.

      We concur that this is a limitation of this study and the subsequent work will demonstrate the origin of prostaglandin biosynthesis after treatment with Doxo in vivo.

      (5) Figure 2A lacks an important control from non-senescent cells during the measurement of C2C12 differentiation in the presence of conditioned medium. The author took it for granted that the conditioned medium from senescent cells would inhibit myogenesis, relying on previous publications (PMID: 37468473). However, that study was conducted in the context of myotonic dystrophy type 1. To support the inhibitory effect in the current experimental settings, direct evidence is required. It would be necessary to include another control with conditioned medium from normal, proliferative C2C12 cells.

      Conditioned medium of several types of senescent cells, such as senescent myoblasts in DM1 (PMID: 37468473) and adipocytes undergoing senescence due to H2O2 treatment, insulin resistance, or replicative senescence (PMID: 37321332), has been shown to inhibit the differentiation of myoblasts. Therefore, in this study, we measured the effect of prostaglandin PGD2 and its metabolites on the differentiation of myoblasts: we inhibited the synthesis of PGD2 in senescent myoblasts by treatment with AT-56 and then collected the conditioned medium. Conditioned medium collected from senescent C2C12 cells treated with vehicle (DMSO) served as a control for the experiment.

      (6) Statistical analyses problems.

      Only the t-test was used throughout the study, even when there are more than two groups. Please have a statistician evaluate the replicates and statistical analyses used.

      In experiments with more than two groups, the t-test was used for column-wise comparison of the experiment samples to the control sample. Multiple sample comparisons using one-way or two-way ANOVA were avoided as experimental samples were individually compared to the control sample.
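
      For context on the exchange above, the following is a minimal sketch of how the chance of at least one false positive grows when several groups are each compared to a single control without correction. It is not the analysis used in the manuscript. The function names are hypothetical, and the calculation assumes independent comparisons, which is only an approximation when all comparisons share one control sample.

```python
# Illustrative only: family-wise error rate for k uncorrected comparisons.
# Assumption: the k comparisons are treated as independent, which is an
# approximation when every comparison reuses the same control group.

def familywise_error_rate(alpha: float, n_comparisons: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold that keeps the family-wise rate at or below alpha."""
    return alpha / n_comparisons


if __name__ == "__main__":
    for k in (1, 3, 5):
        rate = familywise_error_rate(0.05, k)
        threshold = bonferroni_threshold(0.05, k)
        print(f"{k} comparisons: family-wise rate {rate:.3f}, "
              f"Bonferroni threshold {threshold:.4f}")
```

      Under these assumptions, three comparisons at a threshold of 0.05 give a family-wise rate of about 0.14, which is the kind of inflation that corrections such as Bonferroni or Dunnett's test are designed to control.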

      For the 15d-PGJ2/cell concentration measurements in Figure 1F, there were only two replicates, which were provided in the supplementary table after being requested. Was that experiment repeated with more biological replicates?

      Additional replicates of the experiment will be included in the revised manuscript.

      For Figures 1C, 1F, 1G, 1J, 2C, 2E, 3A, 3E, 3F, 4D, and 4E, please include the individual data points in the bar graphs, as was done in Fig. 1D, or at least state how many biological replicates were used for each experiment.

      Appropriate revisions will be made in the figure legends of the revised manuscript.

      There is no error bar in a lot of control groups (Fig 2C, 2E, 3EF, 4E, S4B).

      There are no error bars for the control groups in Figures 2C, 2E, 3E, 3F, 4E, and S4B because the experimental samples of each replicate were normalized to the corresponding control sample, which sets the value of the control sample of each replicate to 1.
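
      The effect of this per-replicate normalization can be sketched as follows. The numbers are invented for illustration and are not data from the study, and `normalize_to_control` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Hypothetical raw signal values (arbitrary units) for three replicates.
# These numbers are invented for illustration, not taken from the study.
replicates = [
    {"control": 100.0, "treated": 62.0},
    {"control": 140.0, "treated": 91.0},
    {"control": 85.0, "treated": 47.0},
]


def normalize_to_control(rows):
    """Divide every sample in a replicate by that replicate's own control."""
    return [
        {name: value / row["control"] for name, value in row.items()}
        for row in rows
    ]


normalized = normalize_to_control(replicates)
control_values = [row["control"] for row in normalized]
treated_values = [row["treated"] for row in normalized]

# Each control is divided by itself, so every control value is 1.0 and the
# standard deviation of the control group is 0: there is no error bar to draw.
print("control:", mean(control_values), stdev(control_values))
print("treated:", round(mean(treated_values), 3), round(stdev(treated_values), 3))
```

      The between-replicate variability of the raw control values is removed by this step, so it can only be seen if the un-normalized values are reported separately.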

      For the qPCR data in Figure 1C, the authors responded that the data were plotted using 2^-ΔCT instead of 2^-ΔΔCT to show the variability in the expression of mRNAs isolated from animals treated with Saline. This statement does not align with the method section. Please revise.

      Appropriate revisions will be made to the method sections of the revised manuscript.
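
      For readers less familiar with the two qPCR quantities discussed above, the following minimal sketch shows how they differ. The function names and CT values are hypothetical, and the sketch assumes ideal amplification efficiency (a doubling per cycle); it does not reproduce the manuscript's analysis.

```python
def relative_expression_dct(ct_target: float, ct_reference: float) -> float:
    """2^-ΔCT: target expression relative to the reference gene in one sample."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** -delta_ct


def fold_change_ddct(
    ct_target: float,
    ct_reference: float,
    ct_target_calibrator: float,
    ct_reference_calibrator: float,
) -> float:
    """2^-ΔΔCT: expression relative to a calibrator (control) sample."""
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


if __name__ == "__main__":
    # Hypothetical CT values: treated sample (24, 18), calibrator (26, 18).
    print(relative_expression_dct(24.0, 18.0))       # 2^-6
    print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))  # 2^-(6 - 8) = 2^2
    # A calibrator compared with itself always gives 1.
    print(fold_change_ddct(26.0, 18.0, 26.0, 18.0))
```

      With 2^-ΔCT, each control animal keeps its own value, whereas 2^-ΔΔCT expresses every sample relative to the chosen calibrator.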

      (7) For Figure 1, the title may not be appropriate as there is insufficient data to support the inhibition of myoblast differentiation.

      Appropriate revisions will be made to the revised manuscript.

      Recommendations for the authors:

      After careful review, the editors advise you to carefully address the following concerns.

      (1) There were concerns that in the revised manuscript, the DMSO and Doxo experiments depicted in Figure 1H appeared quite homogeneous despite the authors' description to the contrary. This leads to concerns about the type of statistics employed and the possible low number of replicates of experiments shown in Fig. 1.

      (2) Experiments in Figures 1F, 1I, and 1J had as few as n=2 experiments. For Figures 1C, 1D, 1F, 1G, and 1J, the statistics used a two-tailed Student's t-test; for all other experiments, N/A was marked for statistics. Using a t-test for multi-group comparisons (as indicated in the figure legend) and relying on only 2 replicates for many experiments are not appropriate.

      Additional replicates for the experiments shown in figures 1F, 1I, and 1J have been done and the data will be revised along with updated statistical tests during the revision of the manuscript.

      (3) In several experiments, the difference between technical replicates is too high.

      Reviewer #1 (Recommendations For The Authors):

      Most of my concerns were addressed in the revised manuscript.

      We thank the reviewer for their time in reviewing the manuscript and for considering the authors' response to their comments during the previous round of review.

      Reviewer #2 (Recommendations For The Authors):

      Validating the findings in a primary myoblast is highly recommended for senescence studies due to the limitations and questionable senescence characteristics of the C2C12 cell line.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      Validate the finding in a different senescent model to exclude the possibility of DNA damage-response.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      For Fig 2A, add another control with a conditioned medium from normal, proliferative C2C12 cells.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      Please have a statistician evaluate the replicates and statistical analyses used.

      We have explained the statistical tests used in the manuscript in the general comment section of the reviewer’s comments.

      For the bar plots (Figures 1C, 1F, 1G, 1J, 2C, 2E, 3A, 3E, 3F, 4D, and 4E), please include the individual data points, or at least state how many biological replicates were used for each experiment.

      Appropriate revisions will be made in the figure legends of the revised manuscript.

      For Figure 1, the title may not be appropriate as there is insufficient data to support the inhibition of myoblast differentiation.

      Appropriate revisions will be made to the revised manuscript.


      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript provides useful information about the lipid metabolite 15d-PGJ2 as a potential regulator of myoblast senescence. The authors provide experimental evidence that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas. However, the manuscript is incomplete in its current form, as it lacks robust support from the data regarding the main conclusions related to senescence and technical concerns related to the senescence models used in this study.

      We are grateful to the editors and the reviewers for their time and comments in sharpening the science and the writing of the manuscript. We have attached a detailed response to emphasize that the manuscript does include robust evidence regarding the claims, which could have been missed during the review process. We have provided a better context for these points now.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors show that upon treatment with Doxorubicin (Doxo), there is an increase in senescence and inflammatory markers in the muscles. They also show these genes get upregulated in C2C12 myoblasts when treated with conditioned media or 15d-PGJ2. 15d-PGJ2 induces cell death in the myoblasts, decreases proliferation (measured by cell numbers), and decreases differentiation and fusion. 15d-PGJ2 modified Cys184 of HRas, which is required for its activation as indicated by the FRET analysis with RAF RBD. They also showed that 15d-PGJ2 activates ERK signaling, but not Akt signaling, through the electrophilic center. 15d-PGJ2 inhibits Golgi localization of HRas (only WT, not the C181 or C184 mutant). They also showed that expressing the WT HRas followed by 15d-PGJ2 treatment led to a decrease in the levels of MHC mRNA and protein, and this defect is dependent on C184. This is a well-written manuscript with interesting insights into the mechanism of action of 15d-PGJ2. However, some clarification and experiments will help the paper advance the field significantly.

      Strengths:

      The data clearly shows that 15d-PGJ2 has a negative role in the myoblast cells and that it leads to modification of HRas protein. Moreover, the induction of biosynthetic enzymes in the PGD2 pathway also supports the induction of 15d-PGJ2 in Doxorubicin-treated cells. Both conditioned media experiments and the 15d-PGJ2 experiments show that 15d-PGJ2 could be the active component secreted by the senescent myoblasts.

      Weaknesses:

      The genes that are upregulated in the muscles upon injection with Doxo are also markers for inflammation. Since Doxo is also known to induce systemic inflammation, it is important to delineate these two effects (inflammatory cells vs senescent cells). The expression of beta Gal and other markers of senescence in the tissue sections will help to delineate these.

      As pointed out, Doxo induces systemic inflammation along with DNA damage-mediated senescence. Therefore, along with the inflammatory markers of the SASP (CXCL1/2, TNF1α, IL6, PTGS1/2, PTGDS), we also observed an increase in the mRNA levels of canonical markers of DNA damage-mediated senescence. We observed an increase in the mRNA levels of the cell cycle and senescence-associated proteins p16 and p21 (Fig. 1C). We also observed an increased nuclear accumulation of p21 (Fig. 1A) and increased levels of phosphorylated H2A.X in the nucleus (Fig. 1B).

      In Figure 2, where the defect in the differentiation of myoblasts upon treatment with 15d-PGJ2 is shown, most of the cells die within 48 hours at higher concentrations, making it difficult to perform the experiments. This also shows that 15d-PGJ2 was toxic to these cells. Lower concentrations show a decrease in the differentiation based on the lower number of nuclei in fibers and low expression of MyoD, MyoG, and MHC. However, it is unclear if this is due to increased cell death or defective differentiation. It would be a lot more informative if the cell count, cell division, and cell death could be plotted for these concentrations of the drug during the experiment.

      We measured the viability of C2C12 cells after 24 hours of treatment with 15d-PGJ2 using the MTT assay and observed that the viability of cells was decreased after treatment with 15d-PGJ2 (10 µM) but not with 15d-PGJ2 (1 µM, 2 µM, 4 µM, or 5 µM) (see Fig. S2A of the updated manuscript). The results and figures of the manuscript have been updated accordingly.

      Also, in the myoblast experiments, are the effects of treatment with Dox reversible?

      The treatment with Doxorubicin is irreversible as the senescent phenotype was not reversed after withdrawal of Doxorubicin, even after 20 days.

      In Figure 3, most of the experiments are done at a high concentration, which induces almost complete cell death within 48 hours.

      Figure 3 is an acute experiment for only 1 hour, at which time no cell death was observed. Specifically, we measured the phosphorylation of Erk and Akt proteins after 1 hour of treatment with 15d-PGJ2 (10 µM) during which we did not observe any cell death. 

      Even at such a high concentration of 15d-PGJ2, the increase in ERK phosphorylation is minimal.

      We observe a ~30% increase in the phosphorylation of Erk proteins after treatment with 15d-PGJ2 in 0.2% serum medium compared to treatment with vehicle (DMSO). This is reproducible and significant.

      The experiment in Figure 4C shows that the C181 and C184 mutants of HRas show higher levels in the Golgi compared with WT. However, this could very well be due to the defect in palmitoylation rather than the modification with 15d-PGJ2.

      Our data do not suggest higher levels of the C184S mutant in the Golgi compared with WT (Fig. S4A). We observed that the ratio of HRas levels in the Golgi to HRas levels in the plasma membrane was similar in C2C12 cells expressing HRas C184S and HRas WT (Fig. S4A graph columns 1 and 5).

      Though the authors allude to the possibility that intracellular redistribution of HRas by 15d-PGJ2 requires C181 palmitoylation, the direct influence of C184 modification on C181 palmitoylation is not shown. To have a meaningful conclusion, the authors need to compare the palmitoylation and modification with 15d-PGJ2.

      Palmitoylation of HRas at C181 is required for the localization of HRas at the plasma membrane. The inhibition of palmitoylation of C181, either by mutation (C181S) or by treatment with the protein palmitoyl transferase inhibitor 2-bromopalmitate, results in the accumulation of HRas at the Golgi (Rocks et al., 2005) (Fig. S4A). Modification of HRas at C184 by 15d-PGJ2 (Fig. 3A) could inhibit the palmitoylation of HRas at C181. However, our data do not support this hypothesis, as modification of HRas WT by 15d-PGJ2 does not increase the level of HRas at the Golgi, unlike the inhibition of palmitoylation caused by the C181S mutation.

      To test if the inhibition of myoblast differentiation depends on HRas, they overexpressed HRas and its mutants in the C2C12 lines. However, this experiment does not take the endogenous HRas into consideration, especially when interpreting the C184 mutant. An appropriate experiment to test this would be to knock down or knock out HRas (or make knock-in mutations of C184) and show that the effect of 15d-PGJ2 disappears.

      Endogenous HRas (wild type) is present in the C2C12 cells overexpressing the EGFP-tagged HRas constructs. Therefore, we only observe a partial rescue of differentiation after 15d-PGJ2 treatment in C2C12 cells expressing the C184S mutant (Fig. 4D and E). However, since HRas is expressed under the high-expression CMV promoter and in the absence of other regulatory elements, the overexpressed constructs do show a dominant effect over the endogenous HRas, showing cysteine-mutant-dependent inhibition of myoblast differentiation after treatment with 15d-PGJ2 (Fig. 4D and E).

      Moreover, in this specific experiment, it is difficult to interpret without a control with no HRas construct and another without the 15d-PGJ2 treatment.

      The mRNA levels of MyoD, MyoG, and MHC in C2C12 cells expressing HRas constructs after treatment with 15d-PGJ2 were normalized to the mRNA levels in C2C12 cells expressing the corresponding constructs and treated with vehicle (DMSO). mRNA levels in C2C12 cells treated with vehicle are not shown, as they were normalized to 1. MHC protein levels in C2C12 cells expressing HRas constructs after 15d-PGJ2 treatment were normalized to those in C2C12 cells treated with vehicle (DMSO). Since the aim was to study the effect of HRas cysteine mutations on the differentiation of myoblasts after treatment with 15d-PGJ2, C2C12 cells expressing HRas WT serve as an adequate control. Fig. 2 shows the effect of 15d-PGJ2 on muscle differentiation when HRas was not overexpressed.

      Moreover, the overall study does not delineate the toxic effects of 15d-PGJ2 from its effect on the differentiation.

      The inhibition of differentiation in C2C12 cells after treatment with 15d-PGJ2 cannot be attributed to the general toxicity of 15d-PGJ2 in cells. We show that the inhibition of differentiation of myoblasts after 15d-PGJ2 treatment depends on modification of HRas at C184: failure of 15d-PGJ2 to modify HRas at C184 (Fig. 3A) and thereby activate it (Fig. 3B) rescues this inhibition of differentiation of C2C12 cells (Fig. 4D and E). This separates the inhibition of myoblast differentiation by 15d-PGJ2 from the general toxic effects of 15d-PGJ2 on cell physiology.

      Please note that the effect of 15d-PGJ2 on cell physiology is context-specific. On the one hand, 15d-PGJ2 has been shown to exert tumor-suppressor effects by inhibiting the proliferation of ovarian cancer cells and lung adenocarcinoma cells (de Jong et al., 2011; Slanovc et al., 2024). On the other hand, 15d-PGJ2 also exerts pro-carcinogenic effects by inducing epithelial-to-mesenchymal transition in MCF7 breast cancer cells and by inhibiting the tumor-suppressor protein p53 in MCF7 and PC-3 cells (Choi et al., 2020; Kim et al., 2010).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Swarang and colleagues identified the lipid metabolite 15d-PGJ2 as a potential component of senescent myoblasts. They proposed that 15d-PGJ2 inhibits myoblast proliferation and differentiation by binding and regulating HRas, suggesting its potential as a target for restoring muscle homeostasis post-chemotherapy.

      Strengths:

      The regulation of HRas by 15d-PGJ2 is well controlled.

      Weaknesses:

      The novelty of the study is compromised as the activation of PGD and 15d-PGJ2, as well as the regulation of HRas and cell proliferation, have been previously reported. 

      Literature does not support this statement, and it is important to clarify this misimpression for the field as a whole. 

      Let us clarify:

      Covalent modification of HRas by 15d-PGJ2 has been reported only twice in the literature (Luis Oliva et al., 2003; Yamamoto et al., 2011), in fibroblasts and neurons, respectively.

      Interaction between HRas and 15d-PGJ2 in skeletal muscles has not been shown before, even though both HRas and 15d-PGJ2 are shown to be key regulators of muscle homeostasis.

      Activation of HRas by 15d-PGJ2 was first reported by Luis Oliva et al. (2003). However, this study does not comment on the functional implications of activation of HRas signaling.

      Recently, our lab contributed to a study where the functional implication of activation of HRas signaling due to covalent modification by 15d-PGJ2 was shown in the maintenance of the senescence phenotype (Wiley et al., 2021).

      15d-PGJ2 was shown to inhibit the differentiation of myoblasts by Hunter et al. (2001). This study hypothesized that the inhibition of myoblast differentiation occurs via 15d-PGJ2-mediated activation of PPARγ signaling; however, it also showed inhibition of myoblast differentiation independent of PPARγ activity, suggesting the presence of other mechanisms.

      This is the first study to show a molecular mechanism where activation of HRas signaling in skeletal myoblasts due to covalent modification by 15d-PGJ2 at C184 of HRas inhibits the differentiation of skeletal myoblasts.

      Additionally, there are major technical concerns related to the senescence models, limiting data interpretation regarding the relevance to senescent cells.

      Major concerns:

      (1) The C2C12 cell line is not an ideal model for senescence study due to its immortalized nature and lack of normal p16 expression. A more suitable myoblasts model is recommended, with a more comprehensive characterization of senescence features.

      C2C12 is a good model for DNA damage-based senescence that is used in this manuscript. Several reports in the literature have shown the induction of senescence in C2C12 cells. Moiseeva et al 2023 show induction of senescence in C2C12 cells after etoposide-mediated DNA damage. Moustogiannis et al 2021 show the induction of replicative senescence in C2C12 cells. In this study, we show that C2C12 cells undergo DNA damage-mediated senescence after treatment with Doxo. We measured the induction of senescence in C2C12 cells upon DNA damage using several physiological (Nuclear Size, Cell Size, and SA β-gal) and molecular markers (mRNA levels of p21 and SASP factors (IL6 and TGFβ), protein levels of p21) of senescence (see Fig. 1 of the updated manuscript). The results and the figures in the manuscript have been updated accordingly.

      (2) The source of increased PGD or its metabolites in the conditioned medium is unclear. Including other senescence models, such as replicative or oncogene-induced senescence, would strengthen the study.

      Fig. 1E shows a time-dependent increase in the expression of PGD2 biosynthetic enzymes in senescent C2C12 cells. Fig. 1F shows an increase in the levels of 15d-PGJ2 secreted by senescent C2C12 cells in the conditioned medium. These data show that senescent C2C12 cells are the source of PGD and its metabolites in the conditioned medium.

      Again, C2C12 is not suitable for replicative senescence due to its immortalized status.

      We and others have shown that C2C12 cells undergo senescence, and this manuscript only used DNA damage-induced senescence.

      (3) In the in vivo part, it is unclear whether the increased expression of PTGS1, PTGS2, and PTGDS is due to senescence or other side effects of DOXO.

      We concur that this is a limitation of this study and the subsequent work will demonstrate the origin of prostaglandin biosynthesis after treatment with Doxo in vivo.

      (4) Figure 2A lacks an important control from non-senescent cells during the measurement of C2C12 differentiation in the presence of a conditioned medium.

      Figure 2A tests the effect of prostaglandin PGD2 and its metabolites secreted by the senescent cells on the differentiation of myoblasts. Therefore, we inhibited the synthesis of PGD2 in senescent cells by treatment with AT-56, and then collected the conditioned medium. Conditioned medium collected from senescent C2C12 cells treated with vehicle (DMSO) served as a control for the experiment, whereas differentiation of C2C12 cells without any treatment serves as a positive control.

      There is no explanation of how differentiation was quantified or how the fusion index was calculated.

      The fusion index was calculated using a published myotube analyzer software (Noë et al., 2022). Appropriate information has been added to the materials and methods section of the manuscript.
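
      The response does not restate the formula applied by the software. As a rough sketch, a commonly used definition of the fusion index is the percentage of nuclei located inside myotubes out of all nuclei in the field. The function below assumes that definition; the function name, the counts, and any threshold for what counts as a myotube are hypothetical and may differ from what the cited software implements.

```python
def fusion_index(nuclei_in_myotubes: int, total_nuclei: int) -> float:
    """Percentage of nuclei that lie inside myotubes.

    Assumption: nuclei have already been classified as inside or outside a
    myotube (for example, within an MHC-positive cell containing two or more
    nuclei); that classification step is not modeled here.
    """
    if total_nuclei <= 0:
        raise ValueError("total_nuclei must be positive")
    if not 0 <= nuclei_in_myotubes <= total_nuclei:
        raise ValueError("nuclei_in_myotubes must be between 0 and total_nuclei")
    return 100.0 * nuclei_in_myotubes / total_nuclei


if __name__ == "__main__":
    # Hypothetical counts from one field of view.
    print(fusion_index(45, 150))
```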

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 3: Expand SA in "SA β-gal".

      The manuscript has been updated accordingly (See line 3).

      Line 68: HRas is highly regulated by lipid modifications.

      The manuscript has been updated accordingly (See line 67).

      Figures

      Figure S1A seemed incomplete (maybe some processing issue).

      The Figure has been updated in the revised manuscript (See Fig. S1A).

      Figure S1B-H are mislabeled.

      The figure has been updated in the revised manuscript (See Fig. S1C, D, E, and F).

      Figures S1E-H are not mentioned in the manuscript.

      The manuscript has been updated accordingly (See line 120).

      Many supplementary figures are not cited in the article.

      The manuscript has been updated accordingly. (See lines 85, 120, 123, 166, 225, 356, 364, 412, and 413)

      Reviewer #2 (Recommendations For The Authors):

      (1) Clarify the injection method for Doxorubicin in B6J mice on line 83 (IP or IM).

      Mice were injected intraperitoneally with Doxorubicin (as mentioned in the materials and methods, see lines 83 and 794).

      (2) Address missing information in figures or figure legends.

      There is a missing piece in Sup Fig 1A.

      The figure has been updated in the revised manuscript (See Fig. S1A).

      Correct labels in Sup Fig 1C and 1D.

      The figure has been updated in the revised manuscript (See Fig. S1C, D, E, and F).

      How would the authors explain the dramatic differences in the morphology of C2C12 cells treated with DOXO between bright field and SA-beta-gal staining images in Sup Fig 1B and 1C?

      The SA β-gal image after treatment with Doxo does show a flattened cell morphology. Another field of view from the same experiment has been added in the figure to show the difference in the cell morphology more prominently in the revised manuscript (See Fig. 1H).

      Provide explanations for Sup Fig 1E-1G, including the meaning of the y-axis and the blue dots and red lines.

      We have provided an explanation for the multiple reaction monitoring mass spectrometry used to measure the concentration of 15d-PGJ2 in the conditioned medium in the revised manuscript (see lines 119-130 and the legends of Fig. S1C, D, and E).

      (3) Please review the calculation of qPCR data in Figure 1C for correctness, ensuring reference samples with an average expression level of 1.

      The data in Fig. 1C were plotted using 2^(-ΔCT) instead of 2^(-ΔΔCT) to show the variability in the expression of mRNAs isolated from animals treated with Saline.
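      As a minimal sketch of the distinction drawn above, the two quantities can be computed as follows. The CT values and function names are illustrative assumptions, not data or code from the manuscript. Note that 2^(-ΔΔCT) referenced to the mean control ΔCT centers the control group around 1, whereas 2^(-ΔCT) keeps each sample on the scale of its own reference gene.

```python
def rel_expr_dct(ct_target, ct_ref):
    """2^(-dCT): target expression relative to the reference gene in one sample."""
    return 2.0 ** -(ct_target - ct_ref)


def rel_expr_ddct(ct_target, ct_ref, mean_dct_control):
    """2^(-ddCT): the same sample, additionally referenced to the mean control dCT."""
    return 2.0 ** -((ct_target - ct_ref) - mean_dct_control)


# Hypothetical (target CT, reference CT) pairs for three control animals.
controls = [(25.0, 20.0), (26.0, 20.0), (24.0, 20.0)]
mean_dct = sum(t - r for t, r in controls) / len(controls)

dct_values = [rel_expr_dct(t, r) for t, r in controls]
ddct_values = [rel_expr_ddct(t, r, mean_dct) for t, r in controls]
```

      Both transformations preserve the sample-to-sample spread in the control group; they differ only by a constant scaling factor.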

      (4) Please explain the calculation of 15d-PGJ2/cell concentration in Figure 1F and provide raw data for review, considering the substantial changes and small error bars. The method or result section lacks an explanation of how this calculation was performed. Additionally, there is no mention of the cell number count.

      All the raw values (concentration of 15d-PGJ2 measured using mass spec and cell numbers counted at the time of collection of conditioned medium) are provided in Supplementary Table 1. The standard curve used to calculate the concentration of 15d-PGJ2 in the conditioned medium is shown in Fig. S1F. The cell number was counted after trypsinization using a hemocytometer on the day of collection of the conditioned medium.

      (5) Please clarify how cell number normalization and doubling time calculation were done in Fig 2B. Consider replacing the figure with a growth curve showing confluence on the y-axis for easier interpretation.

      Cells were counted every 24 hours and normalized to the number of cells counted on day 0 of the treatment (to account for attachment efficiency and other cell culture parameters). Doubling time was calculated as the reciprocal of the slope of the graph of log2(normalized cell number) vs time.
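      The calculation described above can be sketched as follows. This is an illustration under stated assumptions: the slope is taken from an ordinary least-squares fit, and the function name and cell counts are hypothetical rather than taken from the manuscript.

```python
import math


def doubling_time(times_h, counts):
    """Doubling time (hours) as 1 / slope of log2(count / count at day 0) vs time."""
    n0 = counts[0]
    y = [math.log2(c / n0) for c in counts]  # log2 of normalized cell number
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(y) / n
    # Ordinary least-squares slope, in doublings per hour.
    num = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times_h, y))
    den = sum((t - t_mean) ** 2 for t in times_h)
    return den / num


# Hypothetical counts taken every 24 hours; the population doubles once per day.
td = doubling_time([0, 24, 48, 72], [1000, 2000, 4000, 8000])
```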

    1. Author response:

      Please find below our provisional author response, outlining the revisions we plan to undertake to address the Recommendations received:

      Reviewer #1 (Recommendations For The Authors):

      (1) A set of recent advances has shown that embeddings of unsupervised/self-supervised speech models align with auditory responses to speech in the temporal cortex (e.g. Wav2Vec2: Millet et al NeurIPS 2022; HuBERT: Li et al. Nat Neurosci 2023; Whisper: Goldstein et al. bioRxiv 2023). These models are known to preserve a variety of speech information (phonetics, linguistic information, emotions, speaker identity, etc) and perform well in a variety of downstream tasks. These other models should be evaluated or at least discussed in the study.

      We plan to evaluate two of these other models, Wav2Vec2 and HuBERT, in the brain encoding and RSA parts.

      (2) The test statistics of the results in Fig 1c-e need to be revised. Given that logistic regression is a convex optimization problem typically converging to a global optimum, these multiple initializations of the classifier were likely not entirely independent. Consequently, the reported degrees of freedom and the effect size estimates might not accurately reflect the true variability and independence of the classifier outcomes. A more careful evaluation of these aspects is necessary to ensure the statistical robustness of the results.

      We plan to address this point to ensure the statistical robustness of our results.

      (3) In Line 198, the authors discuss the number of dimensions used in their models. To provide a comprehensive comparison, it would be informative to include direct decoding results from the original spectrograms alongside those from the VLS and LIN models. Given the vast diversity in vocal speech characteristics, it is plausible that the speaker identities might correlate with specific speech-related features also represented in both the auditory cortex and the VLS. Therefore, a clearer understanding of the original distribution of voice identities in the untransformed auditory space would be beneficial. This addition would help ascertain the extent to which transformations applied by the VLS or LIN models might be capturing or obscuring relevant auditory information.

      We plan to include direct decoding results from the original spectrograms in addition to those from the VLS and LIN models.

      Reviewer #2 (Recommendations For The Authors):

      We plan to address the following points raised by Reviewer #2:

      (1) English mistakes, rewordings:

      a. L31: 'in voice' > consider rewording (from a voice?).

      b. L33: consider splitting sentence (after interactions).

      c. L39: 'brain' after parentheses.

      d. L45-: certainly DNNs 'as a powerful tool' extend to audio (not just image and video) beyond their use in brain models.

      e. L52: listened to / heard.

      f. L63: use second/s consistently.

      g. L64: the reference to Figure 5D is maybe a bit confusing here in the introduction.

      h. L79-88: this section is formulated in a way that is too detailed for the introduction text (confusing to read). Consider a more general introduction to the VLS concept here and the details of this study later.

      i. L99-: again, I think the experimental details are best saved for later. It's good to provide a feel for the analysis pipeline here, but some of the details provided (number of averages, denoising, preprocessing), are anyway too unspecific to allow the reader to fully follow the analysis.

      We will correct the mistakes, apply the suggested rewordings, and clarify the points raised.

      (2) Clarification.

      • L159: what was the motivation for classifying age as a 2-class classification problem? Rather than more classes or continuous prediction? How did you choose the age split?

      • L263: Is the test of RDM correlation>0 corrected for multiple comparisons across ROIs, subjects, and models?

      • L379: 'these stimuli' - weren't the experimental stimuli different from those used to train the V/AE?

      • L443: what are 'technical issues' that prevented subject 3 from participating in 48 runs??

      • L444: participants were instructed to 'stay in the scanner'!? Do you mean 'stay still', or something?

      • L463: Hearing thresholds of 15 dB: do you mean that all had thresholds lower than 15 dB at all frequencies and at all repeated audiogram measurements?

      • L472: were the 4 category levels balanced across the dataset (in number of occurrences of each category combination)?

      • L482: the test stimuli were selected as having high energy by the amplitude envelope. It is unclear what this means (how is the envelope extracted, what feature of it is used to measure 'high energy'?)

      • L500 was the audio filtered to account for the transfer function of the Sensimetrics headphones?

      • L500: what does 'comfortable level' correspond to and was it set per session (i.e. did it vary across sessions)?

      • L526- does the normalization imply that the reconstructed spectrograms are normalized? Were the reconstructions then scaled to undo the normalization before inversion?

      • L606: does the identity GLM model the denoised betas from the first GLM or simply the BOLD data? The text indicates the latter, but I suspect the former.

      • L704: could you unpack this a bit more? It is not easy to see why you specify the summing in the objective. Shouldn't this just be the ridge objective for a given voxel/ROI? Then you could just state it in matrix notation.

      • L716: you used robust scaling for the classifications in latent space but haven't mentioned scaling here. Are we to assume that the same applies?

      • L720: Pearson correlation as a performance metric and its variance will depend on the choice of test/train split sizes. Can you show that the results generalize beyond your specific choices? Maybe report explained variance as well to get a better idea of performance.

      • Could you specify (somewhere) the stimulus timing in a run? ISI and stimulus duration are mentioned in different places, but it would be nice to have a summary of the temporal structure of runs.

      We will clarify the points raised.

      Reviewer #3 (Recommendations For The Authors):

      We plan to address the following points raised by Reviewer #3:

      Comments:

      • Code and data are not currently available.

      • In the supplementary material, it would be beneficial to present the different analyses as boxplots, as in the main text, but with the ROIs in the left and right hemispheres separated, to better show potential hemispheric effect. Although this information is available in the Supplementary Tables, it is currently quite tedious to access it.

      • In Figure 3a, it might be beneficial to order the identities by age for each gender in order to more clearly illustrate the structure of the RDMs.

      • In Figure 3b, the variance for the correlations for the aTVA is higher than in other regions, why?

      • Please make sure that all acronyms are defined, and that they are redefined in the figure legends.

      • Gender and age are primarily encoded by different brain regions (Figure 5, pTVA vs aTVA). How does this finding compare with existing literature?

      We will upload the code and the preprocessed data, improve the supplementary material figures, fix Figure 3 according to the Reviewer’s suggestion, and clarify the points raised.

    1. Author response:

      We thank the reviewers for their comments and will revise the manuscript to provide more comprehensive clarifications to aid readers’ understanding of behaviorMate. Additionally, we intend to take several steps which could provide further insights and improve the ease of use for new behaviorMate users: (1) release an expanded and annotated library of existing settings and VR scene files, (2) improve the online documentation of context lists and decorators, which allow behaviorMate to run custom experimental paradigms without writing code, and (3) release online API details of the JSON messaging protocol that is used between behaviorMate, the Arduinos, and the VRMate program, which could be especially helpful to developers interested in expanding or modifying the system. Here we provide a few brief points of clarification to some of the concerns raised by the reviewers.

      Firstly, we clarify the system’s focus on modularity and flexibility. behaviorMate leverages the “Intranet of Things” framework to provide a low-cost platform that relies on asynchronous message passing between independent networked devices. While our current VR implementation typically involves a PC, 2 Arduinos, and an Android device per VR display, the behaviorMate GUI can be configured, without editing any source code, to listen on additional ports for UDP messages, which will be automatically timestamped and logged. Since the current implementation of the behaviorMate GUI can be configured through the settings file to send and receive JSON-formatted messages on arbitrary ports, third-party devices could also be configured to listen and respond to these messages without editing the UI source code. More specialized responsibilities or tasks that require higher temporal precision (such as position tracking) are handled by dedicated circuits so as not to overload the general-purpose one. This provides a level of encapsulation/separation of concerns, since components can be optimized for performance of a single task, a feature that is especially desirable given resource limitations on the most common commercially available microcontrollers.
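      To make the message-passing idea concrete, the following is a minimal sketch of how a third-party program might receive, timestamp, and log JSON-formatted UDP messages. It is not behaviorMate code: the function names, the loopback address, and the message fields used in the usage line are all hypothetical, and the actual message schema is the one to be published in the API documentation.

```python
import json
import socket
import time


def stamp_message(raw: bytes, port: int, now: float) -> dict:
    """Decode one JSON datagram and attach the receiving port and a timestamp."""
    return {"time": now, "port": port, "message": json.loads(raw.decode("utf-8"))}


def listen(ports, log, max_messages, timeout=1.0):
    """Poll UDP sockets bound to `ports` until `log` holds `max_messages` entries."""
    socks = []
    for port in ports:
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind(("127.0.0.1", port))  # loopback only; an assumption for this sketch
        s.settimeout(timeout)
        socks.append((port, s))
    try:
        while len(log) < max_messages:
            for port, s in socks:
                if len(log) >= max_messages:
                    break
                try:
                    raw, _addr = s.recvfrom(4096)
                except socket.timeout:
                    continue  # nothing arrived on this port; try the next one
                log.append(stamp_message(raw, port, time.time()))
    finally:
        for _port, s in socks:
            s.close()


# Hypothetical payload and port number, shown without opening a socket.
entry = stamp_message(b'{"position": {"dy": 1.5}}', 5005, 12.0)
```

      Keeping the parsing step separate from the socket loop means the logging format can be checked without any network hardware attached.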

      A number of methods exist for synchronizing recording devices like microscopes or electrophysiology recordings with behaviorMate’s time-stamped logs of actuators and sensors. For example, the GPIO circuit can be configured to send sync triggers or receive timing signals as input. Alternatively, a dedicated circuit could record frame start signals and relay them to the PC to be logged independently of the GPIO (enabling a high-resolution post-hoc alignment of the time stamps). The optimal method to use varies based on the needs of the experiment. For example, if very high temporal precision is needed, such as during electrophysiology experiments, a high-speed data acquisition (DAQ) circuit to capture a fixed interval readout might be beneficial. behaviorMate could still be set up as normal to provide closed and open-loop task control at behaviorally relevant timescales alongside a DAQ circuit recording events at a consistent temporal resolution. While this would increase the relative cost of the recording setup, identical rigs for training animals could still be configured without the DAQ circuit, avoiding the additional cost and complexity.

      VRMate provides the interface between Unity and behaviorMate; using the two systems together therefore means that no Unity or C# programming is necessary. VRMate provides a prespecified set of visual cues that can be scaled in 3 dimensions and have textures applied to them, permitting a wide variety of different scenes to be displayed. All VRMate scene details are additionally logged by behaviorMate to allow for consistency checks across experiments. The VRMate project also includes “editor scripts” that provide a drag-and-drop utility in Unity Editor for developing new scenes. Since the details pertaining to specific scenes and view angle are loaded at runtime via JSON-formatted UDP messages, it is not necessary to recompile VRMate in order to use this feature. Since we send individual position updates to VRMate from the PC, any issues with clock drift would be limited to the refresh rate of the Unity program, which is fast enough to be perceived as instantaneous; we have also thoroughly tested the timing differences between displays using high-speed cameras and found them to be negligible. While we find using 5 separate Android computers to render scenes as described an optimal solution to maximize flexibility, it would also be possible to render all scenes on a single PC to further mitigate this concern, depending on experimental demands. Finally, our treadmill implementations of behaviorMate use no monitor displays; however, due to the modular design of behaviorMate, virtual cues could be seamlessly added to any such setup by adding a VR context to the settings files.

      One last point to mention is that our project is not affected by the recent changes in the pricing structure of Unity, since the compiled software does not need to be regenerated to update VR scenes or to implement new task logic, which is handled by the behaviorMate GUI. This means the current state of the VRMate program is robust to any future pricing changes or other restructuring of the Unity program and does not rely on continued support of Unity. Additionally, although the solution presented in VRMate has many benefits, a developer could easily adapt any open-source VR Maze project to receive the UDP-based position updates from behaviorMate or develop their own novel VR solutions. We intend to update the VR section of the manuscript to make all of this information clearer in the document as well as to provide the additional online documentation in the materials linked in the supplemental information.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The present paper introduces Oscillation Component Analysis (OCA), in analogy to ICA, where source separation is underpinned by a biophysically inspired generative model. It puts the emphasis on oscillations, which are a prominent characteristic of neurophysiological data.

      Strengths:

      Overall, I find the idea of disambiguating data-driven decompositions by adding biophysical constraints useful, interesting and worth pursuing. The model incorporates both a component modelling of oscillatory responses that is agnostic about the frequency content (e.g., doesn’t need bandpass filtering or predefinition of bands) and a component to map between sensor and latent space. I feel these elements can be useful in practice.

      Thank you for the positive evaluation!

      Weaknesses:

      Lack of empirical support: I am missing empirical justification of the advantages that are theoretically claimed in the paper. I feel the method needs to be compared to existing alternatives.

      Thank you for bringing up this important issue. We agree that a direct performance comparison would be important to demonstrate. We performed additional analyses to compare OCA with ICA and one easy frequency domain exploratory technique in both simulated and real human data (see Section “How does OCA compare to conventional approaches?” and Supporting Text: “Comparison of OCA to traditional approaches in experimental EEG data”). The results of the simulated data are shown in the revised Figure 3. Although the slow and alpha oscillations in this simulation are statistically independent under the generative model, ICA identifies components that mix these independent signals, as one would expect based on the above discussion (i.e., all components are Gaussian). Meanwhile, OCA is able to recover distinct slow and alpha components. We repeated this analysis in real human EEG during propofol-induced unconsciousness and found a similar result where ICA produced components that mixed slow and alpha band signals whereas OCA identified distinct oscillatory components (see Figure S4.1).

      Reviewer #1 (Recommendations For The Authors):

      Major

      Theoretical justification. About the limitation of ICA in M/EEG, lines 24-28 seem to suggest that, almost by necessity (if Gaussianity approximately holds as argued), ICA doesn’t work on these modalities. But a body of work indicates that it does work to a reasonable extent, and that it is useful in practice; see https://www.pnas.org/doi/pdf/10.1073/pnas.1112685108?download=true. How then can this theoretical claim be reconciled with the empirical evidence suggesting otherwise? I am putting this as a major comment because the limitations of ICA are one of the main motivations for this work, so it needs to be well-justified.

      Thank you for bringing forward this important point and for suggesting the reference Brookes et al. Their work actually supports our claim. In the fifth paragraph of the discussion section, Brookes et al. state “ICA has been used previously and extensively for artifact rejection in MEG; however, its use in identification of oscillatory signals has remained limited. This limitation is likely due to its susceptibility to interference and the fact that amplitude-modulated oscillatory signals exhibit a largely Gaussian statistical distribution (and ICA relies on non-Gaussianity in recovered sources).” For this reason, they use the Hilbert envelope as the input to the ICA procedure rather than the original time-series. These Hilbert envelopes represent the instantaneous amplitude of neural oscillatory activity, i.e., they follow the amplitude modulation of the oscillatory activity. The method does not extract any oscillatory activity or disambiguate different oscillatory sources, but only assesses the connectivity pattern within pre-defined bands, i.e., how different areas of the brain are harmonized through modulation of the oscillations or vice-versa inside those pre-defined bands. The paper did not show extracted independent time signals (tICs), focusing instead on the spatial pattern that these tICs activated. In that way, their use of ICA was totally justified. Overall, our assessment of the limitations of ICA is very well aligned with Brookes et al. We have added the reference to our claim in the introduction (see page 3 line 23) and revised the discussion section to refer to this paper (see page 21 lines 426-432).

      Empirical justification. The synthetic example is good, but I’m not quite sure what to make out of the real data examples. One can see reasonable spectra in the different bands and not-so-easy to interpret spatial topologies. But the main question is how OCA compares to more standard, easier approaches. Could the authors show explicitly how the benefits that were spelled out in the introduction/discussion manifest in practice, when compared to other methods?

      Thank you for bringing up this important issue. We agree that a direct performance comparison would be important to demonstrate. We performed additional analyses to compare OCA with ICA and one easy frequency domain exploratory technique in both simulated and real human data (see Section “How does OCA compare to conventional approaches?” and Supporting Text: “Comparison of OCA to traditional approaches in experimental EEG data”). The results of the simulated data are shown in the revised Figure 3 on page 12. Although the slow and alpha oscillations in this simulation are statistically independent under the generative model, ICA identifies components that mix these independent signals, as one would expect based on the above discussion (i.e., all components are Gaussian). Meanwhile, OCA is able to recover distinct slow and alpha components. We repeated this analysis in real human EEG during propofol-induced unconsciousness and found a similar result where ICA produced components that mixed slow and alpha band signals whereas OCA identified distinct oscillatory components (see Figure S4.1 in Supporting Text: “Comparison of OCA to traditional approaches in experimental EEG data”).

      Minor

      "a recently-described class of state-space models" -> of the three references, one is from the sixties, another from the eighties, and the last one is 21 years old. Is this really a recent idea?

      Maybe rephrase "recently-described", or else think of more recent references that bring something new?

      We have amended the wording as suggested. (See page 4, line 53)

      Lines 72-74. It might be useful to unwrap in *intuitive* terms why the elements of this vector are closely related to the real and imaginary parts of the analytic signal.

      Thanks for the helpful comment. The sentence now reads:

      “The elements of this state vector trace out two time series that maintain an approximate π/2 radian phase difference and are therefore closely related to the real and imaginary parts of an analytic signal…” (See page 5, lines 72-75.)
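      The intuition can be illustrated with a small simulation. This is a sketch under assumptions, not the OCA implementation: it takes the oscillator state to evolve as x_t = a R(2πf/f_s) x_{t-1} + noise, with R a 2×2 rotation matrix as in Eq. (1), and all parameter values and function names are chosen only for illustration. With no noise and a = 1, the first element follows a cosine and the second a sine, i.e. the two series stay π/2 apart, like the real and imaginary parts of an analytic signal.

```python
import math
import random


def simulate_oscillator(f, fs, a, n_steps, noise_std=0.0, seed=0):
    """Iterate x_t = a * R(omega) @ x_{t-1} + noise, with omega = 2*pi*f/fs."""
    omega = 2.0 * math.pi * f / fs
    c, s = math.cos(omega), math.sin(omega)  # entries of the rotation matrix
    rng = random.Random(seed)
    x1, x2 = 1.0, 0.0  # initial state on the unit circle
    states = [(x1, x2)]
    for _ in range(n_steps - 1):
        n1 = rng.gauss(0.0, noise_std)
        n2 = rng.gauss(0.0, noise_std)
        # Rotate the state by omega, damp it by a, then add state noise.
        x1, x2 = a * (c * x1 - s * x2) + n1, a * (s * x1 + c * x2) + n2
        states.append((x1, x2))
    return states


# Illustrative values: a 10 Hz oscillation sampled at 1 kHz, undamped, noise-free.
states = simulate_oscillator(f=10.0, fs=1000.0, a=1.0, n_steps=200)
```

      Setting a below 1 or noise_std above 0 makes the phase relationship approximate rather than exact, which matches the wording of the revised sentence.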

      Also, relatedly, I don’t seem to have access to the SI which is supposed to explain this. It doesn’t show up in the bioRxiv preprint either.

      We are sorry to hear that. bioRxiv merges all the supporting information and posts it under the Supplementary Material.

      In Eq(1) should it be R(f) instead of R(2 \pi f / f_s) ?

      Thank you for catching this typo.

      As I understand from lines 182-195, the input for the method is not channels but PCA components. Since R is learned, presumably the variance of the lower-order PCs (i.e. the latest elements of the diagonal of R) will be estimated to be small. This, in turn, would make the likelihood heavily weighted on these components (because one basically divides their contribution by their variance). Would this potentially bias the estimation towards these lower-order PCs, at the expense of higher-order PCs? In a different context, this is shown here: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008580 Maybe it would be worth commenting on this?

      We agree with the reviewer’s initial observations but disagree with the assessment. Our log-likelihood calculation reweights the components appropriately to counter the weighting introduced by spatial whitening, thus negating the above-mentioned bias. The main contribution of the spatial whitening and PCA is to make the learning numerically stable, i.e., it does not encounter underflow or overflow in the iterative steps. We also note that the spatial whitening and the PCA are reverted at the end to obtain the spatial components and estimated noise covariance. So, as long as we use all the components with strictly positive variances, we will not bias the log-likelihood one way or the other.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      The study answers the important question of whether the conformational dynamics of proteins are slaved by the motion of solvent water or are intrinsic to the polypeptide. The results from neutron scattering experiments, involving isotopic labelling, carried out on a set of four structurally different proteins are convincing, showing that protein motions are not coupled to the solvent. A strength of this work is the study of a set of proteins using spectroscopy covering a range of resolutions, however, it suffers from some scholarly shortcomings and limited discussion of results. The work is of broad interest to researchers in the fields of protein biophysics and biochemistry.

      Reply 1: We thank the editors and reviewers for the positive and encouraging comments.

      Reviewer #1 (Public Review):

      Summary:

      Zheng et al. study the 'glass' transitions that occur in proteins at ca. 200K using neutron diffraction and differential isotopic labeling (hydrogen/deuterium) of the protein and solvent. To overcome limitations in previous studies, this work is conducted in parallel with 4 proteins (myoglobin, cytochrome P450, lysozyme, and green fluorescent protein) and experiments were performed at a range of instrument time resolutions (1 ns - 10 ps). The authors' data look compelling, and suggest that transitions in the protein and solvent behavior are not coupled and, contrary to some previous reports, the apparent water transition temperature is a 'resolution effect'; i.e. instrument response is limited. This is likely to be important in the field, as solvent 'slaving' and the role of the hydration shell on protein dynamics should be reassessed in light of these findings.

      Strengths:

      The use of multiple proteins and instruments with a range of energy resolutions/timescales.

      Reply 2: We thank the reviewer for highlighting our key findings.

      Weaknesses:

      The paper could be organised to better allow the comparison of the complete dataset collected. The extent of hydration clearly influences the protein transition temperature. The authors suggest that "water can be considered here as lubricant or plasticizer which facilitates the motion of the biomolecule." This may be the case, but the extent of hydration may also alter the protein structure.

      Reply 3: Following the reviewer’s suggestion, we studied the secondary structure content and tertiary structure of CYP protein at different hydration levels (h = 0.2 and 0.4) through molecular dynamics simulation. As shown in Table S2 and Figure S6, the extent of hydration does not alter the protein secondary structure content and overall packing. Thus, this result also suggests that water molecules have more influence on protein dynamics than on protein structure.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript entitled "Decoupling of the Onset of Anharmonicity between a Protein and Its Surface Water around 200 K" by Zheng et al. presents a neutron scattering study trying to elucidate if at the dynamical transition temperature water and protein motions are coupled. The origin of the dynamical transition temperature has been highly debated for decades, specifically its relation to hydration.

      Strengths:

      The study is rather well conducted, with a lot of effort to acquire the perdeuterated proteins, and some results are interesting.

      Reply 4: We thank the reviewer for highlighting our key findings.

      Weaknesses:

      The present work could certainly contribute some arguments, but I have the feeling that not all known facts are properly discussed.

      The points the authors should carefully discuss are the following:

      (1) Daniel et al. (10.1016/S0006-3495(98)77694-5) have shown that enzymes can be functional below the dynamical transition temperature which is at odds with some of the claims of the authors.

      Reply 5: Following the reviewer’s suggestion, we added the following paragraph to the Introduction of the revised main text.

      “Although exceptions have been reported (Biophys. J. 1998, 75, 2504), the dynamical transition has been linked to the thermal onset of function in a number of proteins, e.g., myoglobin (Biochemistry, 1975, 14, 5355-5373), ribonuclease (Nature, 1992, 357, 423-424), elastase (Biochemistry, 1994, 33, 9285-9293) and bacteriorhodopsin (PNAS, 1993, 90, 9668-9672), all of which become inactive below the dynamical transition temperature.”

      (2) It is not as easy to say that protonated proteins in D2O reflect protein dynamics while perdeuterated proteins in H2O reflect water dynamics. A recent study by Nidriche et al. (PRX LIFE 2, 013005 (2024)) reveals that H <-> D exchange is much faster than usually assumed and has important consequences for such studies.

      Reply 6: For the sample preparation, all the H-proteins were dissolved in D2O to allow full deuterium exchange of all exchangeable hydrogen atoms and then lyophilized for 12 hours to obtain the dry sample. The lyophilized H-protein was then put into a desiccator with D2O, placed in a glove box purged with nitrogen gas, to absorb D2O until the desired hydration level, h (gram water/gram protein), was reached. The deuterated proteins were prepared in the opposite way. The D-proteins were dissolved in H2O to allow full hydrogen exchange of all exchangeable deuterium atoms and then lyophilized for 12 hours to obtain the dry sample. The lyophilized D-protein was then put into a desiccator with H2O to absorb H2O until the desired h was reached. This procedure can avoid H-D exchange during experiments. We added the above methods to the revised SI.

      (3) A publication by Jasnin et al. (10.1039/b923878f) on heparin sulfate shows a resolution effect.

      Reply 7: Based on the data from Jasnin et al. (10.1039/b923878f), we found that the dynamical transition of heparan sulfate did not exhibit a strong resolution effect. Estimating the dynamical transition of mean square displacement (MSD) for nanosecond motions in all heparan sulfate samples is challenging due to the absence of data on the nanosecond motion of HS-dry.

      (4) The authors should discuss the impact of the chosen q-range on their findings (see Phys. Chem. Chem. Phys., 2012, 14, 4927-4934, where the authors see a huge effect!).

      Reply 8: Following the reviewer's suggestion, we calculated Ton of H-protein in D2O in the q-ranges 0.45-0.9 Å⁻¹ and 1.1-1.75 Å⁻¹. The results are summarized in Table S2 and Table S3. As shown in Tables S2 and S3, the q-range does not alter the Ton of proteins. We added the above results to the revised SI.

      (5) The authors underline that the dynamical transition is intrinsic to the protein. However, Cupane et al. (ref 12) have shown that it can also be found in a mixture of amino acids without any protein backbone.

      Reply 9: Following the reviewer’s suggestion, we added the following discussion to the revised main text.

      “Unfreezing of the protein structural relaxation might facilitate these conformational jumps, turning on its functionality. However, as revealed by Ref (Journal of biological physics, 2010, 36, 291-297.), the denatured form of lysozyme also exhibits a dynamical transition, similar to that seen in its folded native form. Additionally, the dynamical transition can also be found in a mixture of amino acids (Physical Review Letters, 2012, 109, 128102.). Hence, one can argue that the activation of the structural relaxation of the biomolecule above the dynamical transition temperature is a necessary but insufficient condition for the protein to function, as the latter also requires the biomolecule to assume the correctly folded 3-dimensional structure.”

      (6) The authors say that they find similar dependences from MSD. They should explain that the MSD is inversely proportional to the summed intensities squared.

      Reply 10: Following the reviewer’s suggestion, we added the estimation of mean-squared atomic displacement (MSD) in the revised SI.

      “The mean-squared atomic displacement was estimated by performing Gaussian approximation, where . The values of q used for Gaussian fitting range from 0.45 to 0.9 Å⁻¹ (Biophys. J. 2006, 91, 2573.).”
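
      As an illustration of how an MSD can be extracted from elastic intensities, the sketch below assumes the commonly used Gaussian form S(q) ∝ exp(−q²⟨u²⟩/6), so that ln S is linear in q² and the MSD follows from the slope. The exact expression and prefactor used by the authors are not given here (some studies use 1/3 instead of 1/6), and the function name and the synthetic data are hypothetical.

```python
import math


def msd_from_elastic_intensity(q_values, intensities, prefactor=6.0):
    """Estimate the MSD (in A^2) from elastic intensities.

    Assumes S(q) ~ exp(-q^2 * msd / prefactor); the prefactor of 6 is an
    assumed convention, not necessarily the one used by the authors.
    The MSD is obtained from a least-squares line through ln S versus q^2.
    """
    x = [q * q for q in q_values]
    y = [math.log(s) for s in intensities]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return -prefactor * slope


# Synthetic, noise-free data in the q-range quoted above (0.45 to 0.9 A^-1).
true_msd = 0.5  # hypothetical value in A^2
q_grid = [0.45, 0.55, 0.65, 0.75, 0.85, 0.90]
s_of_q = [math.exp(-q * q * true_msd / 6.0) for q in q_grid]
estimated_msd = msd_from_elastic_intensity(q_grid, s_of_q)
print(round(estimated_msd, 6))
```

      With real data, the quality of the linear fit in the chosen q-range indicates how well the Gaussian approximation holds.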

      (7) A decoupling between water dynamics and membrane dynamics has already been discussed by K. Wood, G. Zaccai et al.

      Reply 11: Following the reviewer’s suggestion, we added the following discussion to the revised main text. “The results from the neutron scattering experiments suggest that the dynamical transition in proteins is an intrinsic property of the biomolecule and strongly depends on the amount of water surrounding it. Such an intrinsic transition can result either from a critical phase transition, e.g., water to ice (PNAS 2007, 104, 18049-18054.; JPCB, 1999, 103, 8036-8050), or from freezing of the structural relaxation of the system beyond the equilibrium time (~100-1000 s) of the experiment, in analogy to the glass transition in polymers from rubbery state to the glass form (Philosophical Magazine, 2004, 84, 1341-1353.; Science, 1995, 267, 1939-1945.; Colloid and Polymer Science, 1995, 273, 413-420.).”

      (8) The fact that transition temperature in lipid membranes is higher when the membrane is dry is also well known (A.V. Popova, D.K. Hincha, BMC Biophys. 4, 11 (2011)).

      Reply 12: We agree with the reviewer that it is well known that the transition temperature in lipid membranes is higher when the membrane is dry. We cited this work as a reference.

      (9) The authors should mention the slope (K/min) they used for DSC and discuss the impact of it on the results.

      Reply 13: Following the reviewer’s suggestion, we added a description of the DSC measurements to the revised SI. “DSC measurements were performed by using the METTLER instruments DSC3+. The sample was sealed in a pan of aluminum. An empty pan was used as a reference. All the experiments were carried out in the temperature range from 150 to 300 K with a heating rate of 1 K/min. The heating rate of DSC is the same as that of the neutron experiments.”

      (10) In the introduction, the authors should present the different explanations forwarded for the dynamical transition.

      Reply 14: Following the reviewer’s suggestion, we added the different explanations put forward for the dynamical transition to the Introduction of the revised main text.

      “The dynamical transition of proteins represents a significant change in their internal mobility, which has garnered various explanations. One theory suggests it's due to the behavior of water in the hydration shell, transitioning from rigid to fluid at certain temperatures, thus influencing protein flexibility. Another theory considers the transition as an inherent property of the protein, where thermal energy allows the protein to access a wider range of conformations.”

      Reviewer #1 (Recommendations For The Authors):

      A major strength of the work is the parallel experiments performed on each of the 4 proteins. To allow better comparison of these it would be helpful to present these combined data in relevant figures to make a side-by-side comparison easier. A summary table of Ton (and potentially TDSC) values would also be helpful.

      Reply 15: Following the reviewer’s suggestion, we summarized the Ton of proteins in Table S5 and Table S6.

      The effect of hydration on protein structure should be considered. Alterations in protein secondary and tertiary structure would be expected to alter dynamics and thus could be seen as a change in Ton.

      Reply 16: The detailed analysis and discussion are presented in Reply 3.

      No uncertainty (error) in Ton values is presented. Could these be estimated from e.g. a comparison of protein Ton values measured under identical sample conditions with different spectrometers?

      Reply 17: It would be hard to compare Ton of proteins measured with different spectrometers because different spectrometers have different energy resolutions. For example, the energy resolutions of HFBS, DNA and OSIRIS are 1 μeV, 13 μeV, 25.4 μeV and 100 μeV, respectively.
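
      The dependence on energy resolution can be made concrete with the rough rule that a resolution ΔE gives access to motions up to a time of about ħ/ΔE. The sketch below applies that rule to the resolution values quoted above. The function name is hypothetical, and the prefactor in the rule varies between conventions, so the numbers are order-of-magnitude estimates only.

```python
# Reduced Planck constant expressed in micro-eV * ns (6.582119569e-16 eV*s).
HBAR_UEV_NS = 0.6582119569


def resolution_to_timescale_ns(delta_e_uev):
    """Approximate accessible timescale (ns) for an energy resolution in micro-eV.

    Uses tau ~ hbar / delta_E, an assumed rule of thumb; other conventions
    differ by a factor of order one.
    """
    return HBAR_UEV_NS / delta_e_uev


for delta_e in (1.0, 13.0, 25.4, 100.0):
    tau_ps = resolution_to_timescale_ns(delta_e) * 1000.0
    print(f"{delta_e:6.1f} micro-eV -> about {tau_ps:.1f} ps")
```

      Under this estimate, the quoted resolutions span roughly two orders of magnitude in time, which is why Ton values from different spectrometers are not directly comparable.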

      More detail is needed to correctly describe/define the proteins used for the study - e.g. P450 is a family of enzymes, so which one was used?

      Reply 18: We used P450 from Pseudomonas putida for the study. The PDB ID is 2ZAX. We added this information in the revised SI.

      P450 and myoglobin also have heme cofactors. Were these deuterated as part of the protein preparation?

      Reply 19: The heme cofactors were deuterated as part of the protein preparation. For D-protein, the entire E. coli cell culture was deuterated.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study identifies new types of interactions between Drosophila gustatory receptor neurons (GRNs) and shows that these interactions influence sensory responses and behavior. The authors find that HCN, a hyperpolarization-activated cation channel, suppresses the activity of GRNs in which it is expressed, preventing those GRNs from depleting the sensillum potential, and thereby promoting the activity of neighboring GRNs in the same sensilla. HCN is expressed in sugar GRNs, so HCN dampens the excitation of sugar GRNs and promotes the excitation of bitter GRNs. Impairing HCN expression in sugar GRNs depletes the sensillum potential and decreases bitter responses, especially when flies are fed on a sugar-rich diet, and this leads to decreased bitter aversion in a feeding assay. The authors' conclusions are supported by genetic manipulations, electrophysiological recordings, and behavioral assays.

      Strengths:

      (1) Non-synaptic interactions between neurons that share an extracellular environment (sometimes called "ephaptic" interactions) have not been well-studied, and certainly not in the insect taste system. A major strength of this study is the new insight it provides into how these interactions can impact sensory coding and behavior.

      We appreciate the reviewer’s view that our findings may allow researchers to better understand sensory coding and behavior. However, we respectfully disagree that the SP homeostasis in Drosophila gustation we describe here pertains to ephaptic interaction. Although SP reduction was proposed as the basis of post-ephaptic hyperpolarization in Drosophila olfaction, we find that SP changes are too slow to mediate the fast action of ephaptic inhibition in gustation reported in ref#17. We observed a slow, sweet-dependent SP depletion (Fig. 5B, revised), which takes more than one hour. The real-time change of SP was also slow even upon contact with 200-mM sucrose; this result was set aside for another manuscript in preparation. Therefore, we believe the main findings in this paper concern the homeostatic preservation of SP for the maintenance of gustatory function, not ephaptic interaction.

      (2) The authors use many different types of genetic manipulations to dissect the role of HCN in GRN function, including mutants, RNAi, overexpression, ectopic expression, and neuronal silencing. Their results convincingly show that HCN impacts the sensillum potential and has both cell-autonomous and nonautonomous effects that go in opposite directions. There are a couple of conflicting or counterintuitive results, but the authors discuss potential explanations.

      (3) Experiments comparing flies raised on different food sources suggest an explanation for why the system may have evolved the way that it did: when flies live in a sugar-rich environment, their bitter sensitivity decreases, and HCN expression in sugar GRNs helps to counteract this decrease.

      Weaknesses/Limitations:

      (1) The genetic manipulations were constitutive (e.g. Ih mutations, RNAi, or misexpression), and depleting Ih from birth could lead to compensatory effects that change the function of the neurons or sensillum. Using tools to temporally control Ih expression could help to confirm the results of this study.

      We attempted to address this point by using the tub-Gal80ts system. The result is now included as Fig. 1-figure supplement 2. At 29°C, a non-permissive temperature for GAL80ts that allows GAL4-dependent expression of Ih-RNAi, we observed that bGRN responses were decreased and sGRN responses were increased compared to the control maintained at 18°C, which parallels the result in Fig. 1C,D. For this experiment, we inserted “To exclude the possibility that Ih is required for normal gustatory development, we temporally controlled Ih RNAi knockdown to occur only in adulthood, which produced similar results (Fig. 1-figure supplement 2).” (~line 113).

      (2) The behavioral experiment shows a striking loss of bitter sensitivity, but it was only conducted for one bitter compound at one concentration. It is not clear how general this effect is. The same is true for some of the bitter GRN electrophysiological experiments that only tested one compound and concentration.

      We conducted additional behavioral experiments with other bitters such as lobeline and theophylline (Fig. 5-figure supplement 1), which showed sensitivity losses in Ih mutants similar to caffeine. For these results, the following is inserted at ~line 274: “These results were recapitulated with other bitters, lobeline and theophylline (Fig. 5-figure supplement 1).”

      We also added single sensillum recording data with bitters, berberine, lobeline, theophylline and umbelliferone, which yielded results similar to those obtained with caffeine (Fig. 1-figure supplement 1). This is described with the sentence at ~line 105 “Other bitter chemical compounds, berberine, lobeline, theophylline, and umbelliferone, also required Ih for normal bGRN responses (Fig. 1-figure supplement 1).”

      (3) Several experiments using the Gal4/UAS system only show the Gal4/+ control and not the UAS/+ control (or occasionally neither control). Since some of the measurements in control flies seem to vary (e.g., spiking rate), it is important to compare the experimental flies to both controls to ensure that any observed effects are in fact due to the transgene expression.

      We thank the reviewer for raising this point. Indeed, there was a small logical flaw with the controls. We have now included all the necessary controls for Fig. 1C-F, Fig. 2I,J, Fig. 4E, and Fig. 5D, as the reviewers suggested. The effects in these experiments remained statistically significant after including the new control groups.

      (4) I was surprised that manipulations of sugar GRNs (e.g. Ih knockdown, Gr64a-f deletion, or Kir silencing) can impact the sensillum potential and bitter GRN responses even in experiments where no sugar was presented.

      We are afraid there is a misunderstanding of the early part of the paper. We suspected that the manipulations impacted bGRNs and SP due to the sweetness in the regular cornmeal food, as stated in lines 214-220 “Typically, we performed extracellular recordings on flies 4-5 days after eclosion, during which they were kept in a vial with fresh regular cornmeal food containing ~400 mM D-glucose. The presence of sweetness in the food would impose long-term stimulation of sGRNs, potentially requiring the delimitation of sGRN excitability for the homeostatic maintenance of gustatory functions. To investigate this possibility, we fed WT and Ihf03355 flies overnight with either non-sweet sorbitol alone (200 mM) or a sweet mixture of sorbitol (200 mM) + sucrose (100 mM).”

      I believe the authors are suggesting that the effects of sugar GRN activity (e.g., from consuming sugar in the fly food prior to the experiment) can have long-lasting effects, but it wasn't entirely clear if this is their primary explanation or on what timescale those long-lasting effects would occur. How much / how long of a sugar exposure do the flies need for these effects to be triggered, and how long do those effects last once sugar is removed?

      We attempted to address this point with additional experiments (Fig. 5A,B). SP was reduced to similar degrees in WT and HCN-deficient mutants 1 hr after the flies were transferred from nonsweet sorbitol-containing vials to sweet sucrose-containing ones. Moreover, the mutants, but not WT, showed further depression of SP when the sweetness persisted in the media for 4 hrs and overnight. This exposure to sweetness for longer than 1 hr may simulate feeding on the regular sweet cornmeal food. The recovery of SP was also tested by removing flies from the sweet media after overnight sweet exposure and placing them on sorbitol food. SPs of WT and the mutants recovered to similar levels 1 hr after separating the animals from sweetness, although the HCN-lacking mutants showed much lower SP right after overnight sweetness exposure. The unimpaired recovery of the mutants suggests that HCN is not involved in generating the transepithelial potential itself. Therefore, regardless of HCN, SP changes are not fast even in the presence of strong sweetness, and SP is much better guarded when sGRNs express HCN in a sweet environment.

      We inserted the following at ~line 260 to describe the newly added recovery experiment: “Following overnight sweet exposure, SPs of WT and Ihf03355 were recovered to similar levels after 1-hr incubation with sorbitol only food. However, it was after 4 hrs on the sorbitol food that the two lines exhibited SP levels similar to those achieved by overnight incubation with sorbitol only food (Fig. 5B). These results indicate that SP depletion by sweetness is a slow process, and that the dysregulated reduction and recovery of SPs in Ihf03355 manifest only after long-term conditioning with and without sweetness, respectively.”

      (5) The authors mention that HCN may impact the resting potential in addition to changing the excitability of the cell through various mechanisms. It would be informative to record the resting potential and other neuronal properties, but this is very difficult for GRNs, so the current study is not able to determine exactly how HCN affects GRN activity.

      On this point, we cannot but rely on previous studies of biophysical and electrophysiological characterization on mammalian HCN channels and a heterologous expression study that revealed a robust hyperpolarization-activated cation current from Drosophila HCN channels (PMID: 15804582).

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors start by showing that HCN loss-of-function mutation causes a decrease in spiking in bitter GRNs (bGRN) while leaving sweet GRN (sGRN) response in the same sensillum intact. They show that a perturbation of HCN channels in sweet-sensing neurons causes a similar decrease while increasing the response of sugar neurons. They were also able to rescue the response by exogenous expression. Ectopic expression of HCN in bitter neurons had no effect. Next, they measure the sensillum potential and find that sensillum potential is also affected by HCN channel perturbation. These findings lead them to speculate that HCN in sGRN increases sGRN spiking which in turn affects bGRNs. To test this idea, they carried out multiple perturbations aimed at decreasing sGRN activity. They found that decreasing sGRN activity by either using receptor mutant or by expressing Kir (a K+ channel) in sGRN increased bGRN responses. These responses also increase the sensillum potential. Finally, they show that these changes are behaviorally relevant as conditions that increase sGRN activity decrease avoidance of bitter substances.

      Strengths:

      There is solid evidence that perturbation of sweet GRNs affects bitter GRN in the same sensillum. The measurement of transsynaptic potential and how it changes is also interesting and supports the authors' conclusion.

      Weaknesses:

      The ionic basis of how perturbation in GRN affects the transepithelial potential which in turn affects the second neuron is not clear.

      We speculate that HCN-dependent membrane potential regulation, rather than ionic composition change, is responsible for the observed SP preservation, as further discussed in our response in the “Recommendations for the authors” section. The transepithelial potential (TEP) can be dissipated by increased conductance through receptor-linked ion channels following gustatory receptor activation in GRNs. The volume of the sensillum lymph is very small according to electron micrographs of horizontally sliced bristles (PMID: 11456419). Therefore, robust excitation of a gustatory neuron may easily deplete the extracellular potential built up in the form of polarized ion concentrations across the tight junction. When this consumption is too strong and extended, the neighboring neuron, which shares the TEP with the activated GRN, can be negatively affected. We propose that HCN suppresses overexcitation of sGRNs by means of membrane potential stabilization. This stabilization prevents sGRNs from excessively reducing the TEP, thereby protecting the activity of neighboring bGRNs.

      Reviewer #3 (Public Review):

      Ephaptic inhibition between neurons housed in the same sensilla has been long discovered in flies, but the molecular basis underlying this inhibition is underexplored. Specifically, it remains poorly understood which receptors or channels are important for maintaining the transepithelial potential between the sensillum lymph and the hemolymph (known as the sensillum potential), and how this affects the excitability of neurons housed in the same sensilla.

      Although a reduction of sensillum potential was proposed to underlie membrane hyperpolarization of post-ephaptic olfactory neurons in Drosophila, our preliminary data (not shown due to a manuscript in preparation) and the results included in the paper (Fig. 5B) strongly suggest that SP reduction is not a requisite for ephaptic inhibition at least in GRNs. Ephaptic inhibition is expected to be instantaneous, whereas we find that SP reduction in gustation is very slow. Therefore, we would like to indicate that the findings we report in this manuscript are not directly related to ephaptic inhibition.

      Lee et al. used single-sensillum recordings (SSR) of the labellar taste sensilla to demonstrate that the HCN channel, Ih, is critical for maintaining sensillum potential in flies. Ih is expressed in sugar-sensing GRNs (sGRNs) but affects the excitability of both the sGRNs and the bitter-sensing GRNs (bGRNs) in the same sensilla. Ih mutant flies have decreased sensillum potential, and bGRNs of Ih mutant flies have a decreased response to the bitter compound caffeine. Interestingly, ectopic expression of Ih in bGRNs also increases sGRN response to sucrose, suggesting that Ih-dependent increase in sensillum potential is not specific to Ih expressed in sGRNs. The authors further demonstrated, using both SSR and behavior assays, that exposure to sugars in the food substrate is important for the Ih-dependent sensitization of bGRNs. The experiments conducted in this paper are of interest to the chemosensory field. The observation that Ih is important for the activity in bGRNs albeit expressed in sGRNs is especially fascinating and highlights the importance of non-synaptic interactions in the taste system.

      Despite the interesting results, this paper is not written in a clear and easily understandable manner. It uses poorly defined terms without much elaboration, contains sentences that are borderline unreadable even for those in the narrower chemosensory field, and many figures can clearly benefit from more labeling and explanation. It certainly needs a bit of work.

      We would like to revise the language aspect of the manuscript after finalizing the scientific revision.

      Below are the major points:

      (1) Throughout the paper, it is assumed that Ih channels are expressed in sugar-sensing GRNs but not bitter-sensing GRNs. However, both this paper and citation #17, another paper from the same lab, contain only circumstantial evidence for the expression of Ih channels in sGRNs. A simple co-expression analysis, using the Ih-T2A-GAL4 line and Gr5a-LexA/Gr66a-LexA line, all of which are available, could easily demonstrate the co-expression. Including such a figure would significantly strengthen the conclusion of this paper.

      We did conduct confocal imaging with Ih-T2A-Gal4 in combination with GRN Gal4s (ref#17 version2). The expression is very broad, including both neurons and non-neuronal cells. We observed much stronger sGRN expression than bGRN expression. However, the promiscuous expression of the reporter in many cells prevented us from clearly demonstrating the absence of the reporter in bGRNs. Nevertheless, the functional and physiological examination of Ih-T2A-Gal4 with neuronal modifiers such as TRPA1 and Kir2.1 in ref#17 indicates strong expression of Ih in sGRNs and little expression in bGRNs. Furthermore, the RNAi knockdown results present another line of evidence that HCN expressed in sGRNs regulates SP and bGRN activity (Fig. 1C,D, Fig. 1-figure supplement 2). Ih-RNAi expression in bGRNs did not result in any statistically significant changes in the activities of sGRNs and bGRNs compared to controls (Fig. 1C,D, revised), supporting our claim that Ih acts in sGRNs for the functional homeostasis of SP and GRNs.

      (2) Throughout this paper, it is often unclear which class of labellar taste sensilla is being recorded. S-a, S-b, I-a, and I-b sensilla all have different sensitivities to bitters and sugars. Each figure should clearly indicate which sensilla is being recorded. Justification should be provided if recordings from different classes of sensilla are being pooled together for statistics.

      We mainly performed SSR (single sensillum recording) on i-type bristles as they have the simplest composition of GRNs compared to s- and L-type bristles. As s-type bristles also contain both an sGRN and a bGRN, we also measured SP for s-types (Figs. 2, 3F and 4D). In the case of Fig. 3-figure supplement 1, L-types were tested for the relationship between water cell activity and SP. All the panels are now labelled with the tested bristle types.

      (3) In many figures, there is a lack of critical control experiments. Examples include Figures 1C-F (lacking UAS control), Figure 2I-J (lacking UAS control), Figure 4E (lacking the UAS and GAL4 control, and it is also strange to compare Gr64f > RNAi with Gr66a > RNAi, instead of with parental GAL4 and UAS controls.), and Figure 5D (lacking UAS control). Without these critical control experiments, it is difficult to evaluate the quality of the work.

      Thank you for pointing this out. We appreciate the feedback and have addressed these concerns by including all the requested controls in the figures. Specifically, we have added the UAS controls for Figs 1C-F and 2I-J, as well as the UAS and GAL4 controls for Fig. 4E. We have also included the UAS control for Fig. 5D.

      (4) Figure 2A could benefit from more clarification about what exactly is being recorded here. The text is confusing: a considerable amount of text is spent on explaining the technical details of how SP is recorded, but very little text about what SP represents, which is critical for the readers. The authors should clarify in the text that SP is measuring the potential between the sensillar lymph, where the dendrites of GRNs are immersed, and the hemolymph. Adding a schematic figure to show that SP represents the potential between the sensillar lymph and hemolymph would be beneficial.

      SP was defined at lines 55-56 in the first paragraph of the Introduction, which also contains the background information for SP as a transepithelial potential. As the reviewer suggested, we have now also included a sentence describing SP (“SP is known as a transepithelial potential between the sensillum lymph and the hemolymph, generated by active ion transport through support cells”, line 126) and a drawing to illustrate the concept of SP (Fig. 2A), and revised the legend.

      (5) The sGRN spiking rate in Figure 4B deviates significantly from previous literature (Wang, Carlson, eLife 2022; Jiao, Montell PNAS 2007, as examples), and the response to sucrose in the control flies is not dosage-dependent, which raises questions about the quality of the data. Why are the responses to sucrose not dosage-dependent? The responses are clearly not saturated at these (10 mM to 100 mM) concentrations.

      Our recordings show different spiking frequencies from others’ work because the frequencies are calculated over 5-sec bins rather than only the first 0.5 sec. This lowers the frequencies, as spikes are relatively more frequent at the beginning of the recording (Fig. 4-figure supplement 1).
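
      The effect of the bin width on the reported frequency can be illustrated with a minimal sketch. The spike times below are invented to mimic an adapting response and are not taken from the recordings; the function name is hypothetical.

```python
def binned_rates(spike_times, bin_width, duration):
    """Return the firing rate (Hz) in consecutive bins of bin_width seconds.

    Time zero is stimulus contact; spikes outside [0, duration) are ignored.
    """
    n_bins = int(round(duration / bin_width))
    counts = [0] * n_bins
    for t in spike_times:
        if 0 <= t < duration:
            index = min(int(t / bin_width), n_bins - 1)
            counts[index] += 1
    return [count / bin_width for count in counts]


# Hypothetical adapting spike train: 10 spikes in the first 0.5 s,
# then 20 spikes spread over the remaining 4.5 s.
early_spikes = [0.05 * i for i in range(10)]
late_spikes = [0.5 + 0.225 * i for i in range(20)]
spikes = early_spikes + late_spikes

rate_first_half_second = binned_rates(spikes, 0.5, 5.0)[0]
rate_five_second_bin = binned_rates(spikes, 5.0, 5.0)[0]
print(rate_first_half_second, rate_five_second_bin)
```

      For this invented train, the same response yields a much higher frequency when only the first 0.5 s is counted than when it is averaged over 5 s.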

      Why are the responses to sucrose not dosage-dependent? The responses are clearly not saturated at these (10 mM to 100 mM) concentrations.

      We were also puzzled with the flat dose dependence to sucrose. This result may suggest the existence of another mechanism moderating sucrose responses of sGRNs. This flat curve reappeared with other genotypes with the same concentration range (5-50 mM) in Fig. 4E. However, 1-mM sucrose produced much lower spiking frequencies (Fig. 4E), suggesting that sGRN responses are saturated at 5 mM sucrose with our recording/analysis condition.

      (6) In Figure 4C, instead of showing the average spike rate of the first five seconds and the next 5 seconds, why not show a peristimulus time histogram? It would help the readers tremendously, and it would also show how quickly the spike rate adapts to overexpression and control flies. Also, since taste responses adapt rather quickly, a 500 ms or 1 s bin would be more appropriate than a 5-second bin.

      Taste single sensillum recording starts upon contact with the stimulant, which prevents us from recording pre-stimulus responses of GRNs. Therefore, we showed post-stimulus graphs with 1-sec bins (Fig. 4-figure supplement 1), as the reviewer suggested.

      (7) Lines 215 - 220. The authors state that the presence of sugars in the culture media would expose the GRNs to sugar constantly, without providing much evidence. What is the evidence that the GRNs are being activated constantly in flies raised with culture media containing sugars? The sensilla are not always in contact with the food.

      We agree with the reviewer. We replaced “long-term stimulation of sGRNs” with “strong and frequent stimulation of sGRNs for extended period”, because the term “long-term” could be interpreted as meaning constant.

      (8) Line 223. To show that bGRN spike rates in Ih mutant flies "decreased even more than WT", you need to compare the difference in spike rates between the sorbitol group and the sorbitol + sucrose group, which is not what is currently shown.

      The data were examined by ANOVA and a multiple comparison test (Dunn’s) between all the groups regardless of genotypes and conditions in the panel (all the groups sharing the y axis). Therefore, the differences were statistically examined. However, the expression we used read as though it referred to the slope or extent of the decrease. We intended to indicate the difference between the genotypes in the absolute values of spiking frequencies after overnight sweet exposure, whereas bGRN activities were statistically indistinguishable between WT and Ih mutants when they were kept only on sorbitol food. We revised it to “decreased to the level significantly lower than WT”. We also changed the graph style to present the trend of changes in bGRN sensitivity more effectively, with comparison between genotypes. Again, the groups were statistically examined together regardless of the genotypes and conditions.

      (9) To help readers better understand the proposed mechanisms here, including a schematic figure would be helpful. This should show where Ih is expressed, how Ih in sGRNs impacts the sensillum potential, how elevated sensillum potential increases the electrical driving force for the receptor current, and affects the excitability of the bGRNs in the same sensilla, and how exposure to sugar is proposed to affect ion homeostasis in the sensillum lymph.

      As the reviewer suggested, we included two panels to show a working model for gustatory homeostasis via SP maintenance by HCN (Fig. 5E,F).

      Reviewer #1 (Recommendations For The Authors):

      (1) The relationship between this paper and the authors' bioRxiv preprint posted last year is not clear. In the introduction they made it seem like this paper is a follow-up that builds on the preprint, but most or all of the experiments in this paper were already performed in the preprint. I guess the authors are planning to divide the original paper into two papers. I would suggest updating the preprint to avoid confusion.

      Thank you for the comment. We updated the preprint to remove part of Fig. 6 and all of Fig. 7 along with the associated text. As the reviewer pointed out, our eLife paper was spun off from part of the preprint, because we feel that the two stories could confuse readers when presented together.

      (2) Have the authors considered testing responses of water GRNs? They reside in the same sensilla as sugar neurons, so are they also affected by Ih mutation or RNAi in sugar neurons? This would strengthen the evidence that the indirect (non-cell autonomous) effects of Ih are due to the sensillum potential and not some specific interaction between sweet and bitter cells.

      As the reviewer proposed, we examined water GRN activity in the L-type bristles of WT, Ihf03355 and a genomic rescue line for Ihf03355. Spiking responses in water GRNs were evoked by the hypo-osmolarity of the electrolyte (0.1 mM tricholine citrate, TCC). Interestingly, the Ih mutant showed reduced 0.1 mM TCC-provoked spiking frequencies compared to WT. This impairment was rescued by the genomic fragment containing an intact Ih locus (Figure 3-figure supplement 1A).

      Additionally, SPs in L-type bristles were reduced by Ih deficiencies but increased in Gr64af, suggesting that HCN regulates sGRNs in L-type bristles as well (Figure 3-figure supplement 1B). Again, the bristles of animals with both mutations together exhibited SPs similar to those of WT.

      Furthermore, when we conducted cDNA rescue experiments in L bristles, introduction of Ih-RF cDNA in sGRNs restored SPs, while expressing it in bGRNs did not, unlike the results from the i- and s-bristles (Fig. 2K,L), likely because L-bristles lack bGRNs. These cDNA rescue and genetic interaction experiments were conducted using flies fed on fresh cornmeal food with strong sweetness, suggesting that the sweetness in the media is the likely key factor producing the genetic interaction and necessitating HCN, consistent with other results in the manuscript. Therefore, SP regulation by HCN is also observed in the L-type bristles.

      Minor comments:

      Line 52: typo, "Many of"

      Thank you. Corrected

      Line 95: typo, "sensilla do an sGRN"

      Corrected

      Line 98: typo, "we observed reduced the spiking responses"

      Corrected

      Line 206: typo, "a relatively low sucrose concentrations"

      Corrected

      Line 260: "inverse relationship between the two GRNs in excitability" - I am not exactly sure what data you are referring to.

      Although alleles did not show increased sGRN activities, knockdown of Ih decreased bGRN activity but increased sGRN activity (Fig. 1C,D, Fig.1-figure supplement 2B), while suppression of sGRNs increased bGRN activity (Fig. 3). To clarify this point, we revised the phrase to “the inverse relationship between the two GRNs in excitability observed in Fig. 1C,D, Fig. 1-figure supplement 2B, and Fig. 3”.

      Methods: typo, "twenty of 3-5 days with 10 males and 10 females"

      Corrected to “Twenty flies, aged 3-5 days and consisting of 10 males and 10 females,”

      Methods: typo, "Kim's wipes" should be "Kimwipes"

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      (1) More clarification is necessary on Transepithelial potential (TEP). TEP is typically created by having pumps and tight junctions between the sensillar lymph and the hemolymph.

      We have an introduction to TEP or SP in the context of sensory functions (lines 40-57) with relevant references. The involvement of pumps and tight junctions was mentioned in the same paragraph: “Glia-like support cells exhibit close physical association with sensory receptor neurons, and conduct active transcellular ion transport, which is important for the operation of sensory systems” (line 40) and “Tight junctions between support cells separate the externally facing sensillar lymph from the internal body fluid known as hemolymph” (line 53).

      It is not clear how HCN channels in one of the neurons might change the composition of the sensillum lymph. An explanation of their model of how TEP depends on HCN is necessary.

      Although the ionic composition of the sensillum lymph is a contributing factor to the sensillum potential, it is more conceptually relevant to describe our findings with the perspective of membrane potential regulation given the role of HCN in membrane potential stabilization as discussed in our manuscript.

      We speculate that HCN controls the membrane potential at rest and/or in motion to modulate sGRN activity towards saving SP despite the sweetness in the niche. We positioned our results in relation to SP in the Discussion: “Our results provide multiple lines of evidence that HCN suppresses HCN-expressing GRNs, thereby sustaining the activity of neighboring GRNs within the same sensilla. We propose that this modulation occurs by restricting SP consumption through HCN-dependent neuronal suppression rather than via chemical and electrical synaptic transmission.” (lines 252-255). Moreover, it is unclear whether HCN is localized to the dendrite bathed in the sensillum lymph to influence the ionic composition of the lymph. It would be very interesting to study in the future whether the ionic flow through HCN channels itself is critical for the function of HCN in this context, and whether HCN is exclusively present in the dendrite to support the postulation. However, we would like to remind the reviewer that Kir2.1 and HCN channels in sGRNs showed similar effects on SP and bGRNs, while they differ in Na+ conductance.

      In the initially submitted manuscript (lines 325-343), we discussed the potential mechanism by which Kir2.1 and HCN channels commonly increase SP in terms of how the membrane potential regulation in the soma can control the SP consumption in the dendrite of sGRNs.

      Another point about the TEP that needs some explanation is that these sensilla are open to the environment as tastants must flow in and are different from mechanical sensilla in that sense.

      This is a very important question regarding the general physiology of the taste sensilla, as the sensillum lymph is in contact with the external environment through the pore of the sensillum. It is indeed interesting to consider how the composition and potential of the lymph are maintained despite the relatively vast volume of food the sensilla encounter during gustation and the continuous evaporation to air between episodes of gustation. However, we believe that this question, while important, is distinct from the primary focus of our manuscript.

      Are the TEP measurements in Figure 2 under control conditions where there are no tastants?

      There is no tastant in the SP-measuring glass electrode other than the electrolyte. We apologize that we did not specify the recording electrode condition. We inserted a clause in the method; “For SP recordings, the recording electrode contained 2 mM TCC as the electrolyte, and…”

      Does the TEP change dynamically as sGRN is activated?

      SP does shift in response to sweets. Please see Fig. 5B. Also, we showed SP changes by mechanical stimuli, which depended on the mechanoreceptor, NompC (Fig. 2D-F). Mechanoreceptor neurons share the sensillum lymph with GRNs.

      (2) More clarification on the potential transduction mechanism and how TEP affects one neuron differentially. Essentially, sGRN perturbation affects sGRN activity and it affects the TEP. More explanation is needed for the potential ionic mechanism of each.

      Our results strongly suggest that HCN lowers the activity of HCN-expressing GRNs, mitigating SP consumption. This modulation is crucial because the SP serves as a driving force for neuronal activation within the sensillum. HCN is particularly necessary in sGRNs because of the flies’ sweet feeding niche, which is expected to result in frequent and strong activation of sGRNs. The SP saved by HCN-dependent limitation of sGRN activity can be used to raise the responsiveness of bGRNs.

      (3) The authors refer to their own unreviewed paper (Reference 17). This paper is on a similar topic and there seems to be some overlap. Clarification on this point would be important.

      We revised the bioRxiv preprint so that version 2 does not contain the parts overlapping with this eLife paper. This eLife paper was originally part of the preprint, but it was separated to clarify the messages of the two stories. As we explained in the Discussion (lines 276-297), HCN provides resistance to both hyperpolarization and depolarization of the membrane potential. Simply put, one paper focuses on the role of HCN in resisting hyperpolarization, while the other (this paper in eLife) focuses on resisting depolarization.

      (4) Methods are sparse. Many details on the method are necessary. For example, Sensilla recordings are being done by the tip-dip method (I assume). What does "number of experiments" mean in Figure 1? Is it the number of animals or the number of sensilla? How many trials/sensilla?

      We indicated the extracellular recording was performed by the tip-dip method; “In vivo extracellular recordings were performed by the tip-dip method as detailed previously”. We also added a statement on the number of experiments; “The number of experiments indicated in figures are the number of naïve bristles tested. The naïve bristles were from at least three different animals.”

      (5) Figure 1: I understand the author's interpretation. But if one compares WT in Figure 1A to Gr64a-IhRNAi in 1C, we can come to the conclusion that there is no change. In other words, the control in Figure 1C (grey) has a much higher response than WT. Similar conclusions can be made for other experiments. Is the WT response stable enough to make the conclusions made here?

      The genetic background of each genotype may influence GRN activity to some extent. RNAi knockdown experiments are well-known for their hypomorphic nature, and their effects should be evaluated by comparison with their parental controls such as Gal4 and UAS lines. As all reviewers pointed out, we added the results from the UAS control. This effort confirms that Gr89a>Ih RNAi is statistically indistinguishable from the UAS control as well as the Gr64f-Gal4 control in bGRN spiking evoked by 2 mM caffeine, while Gr64f>Ih RNAi showed reduced bGRN responses to 2 mM caffeine compared to all the controls.

      (6) Figure 3: Why is bGRN spiking not plotted against sensillum potential to observe the dependence more directly?

      This is a very interesting suggestion. We are not, however, equipped to measure spiking and sensillum potential simultaneously. Therefore, they are independent experiments, and we treated them accordingly.

      (7) Figure 4: Why bGRN response is only affected at high caffeine concentrations is not clear.

      We were also surprised by the differences in the dose dependence results of b- and sGRNs, genetically manipulated to mis-express and over-express HCN in Fig. 4A and 4E, respectively. Each gustatory neuron likely has distinct sets of players and parameters that set its own membrane potential and excitability.

      One possibility is that there is a range of membrane potentials within which HCN does not engage. In bGRNs, the resting membrane potential may lie low within this range, so that some degree of membrane depolarization by low concentrations of caffeine does not significantly close HCN channels, thus preventing their hyperpolarizing effects. On the other hand, the membrane potential of sGRNs may be high within this range, showing suppressive effects at all tested sucrose concentrations. However, we find this explanation too speculative to include in the main text, although we stated in the original manuscript, “implying a complex cell-specific regulation of GRN excitability.” (line 210).

      (8) Minor:

      L98 - there is a small typo

      Corrected

      L274: "funny" !?

      “Funny” currents, denoted If, were initially observed by electrophysiologists and later attributed to HCN channels, now indicated by Ih (thus the gene name Ih in Drosophila). These currents were termed "funny" due to their unusual properties compared to other currents. For more detailed information, please refer to the cited references.

      L257: Neuropeptide seemed to be abrupt

      We attempted to discuss possible mechanisms that mediate excitability changes across GRNs beyond the mechanism by SP shifts. Neuropeptides, which are chemical neurotransmitters along with small neurotransmitters, were mentioned following the discussion on synaptic transmission to suggest alternative pathways for excitability regulation. This inclusion is meant to provide a comprehensive overview of potential mechanisms influencing GRN activity.

      Reviewer #3 (Recommendations For The Authors):

      Congratulations on your fascinating research! The results are certainly of interest to the chemosensory field. However, I suggest using academic editing services to enhance the clarity of your text and ensure that the terminology and jargon align with standard usage in the field. The current choice of words may not be consistent with commonly used terms. As it is now, the writing might not fully showcase the compelling story and the effort behind your study, and is underselling your interesting results. Proper refinement could make sure your valuable findings are appropriately recognized.

      We appreciate your comments and apologize for any difficulties reviewers faced during the review process. We are currently prioritizing the review of scientific content and plan to address language issues in a subsequent revision. It would be very helpful for future revisions if the problematic sentences or expressions could be indicated in detail after this revision. This will allow us to ensure that our terminology and expression align with standard usage in the field, and that our findings are clearly and effectively communicated.

      Minor points:

      (1) Line 110: what is Ih-RF?

      We apologize that we relied on a reference in describing the cDNA. The following clause was inserted with additional reference and the Flybase id: “(Flybase id: FBtr0290109), which previously rescued Ih deficiency in other contexts17,26 ,”  

      (2) Line 158: Gr64af mutant flies still have Gr5a and a residual response to fructose and sucrose (Slone, Amrein 2007).

      We revised the line to “is severely impaired in sucrose and glucose sensing”, since there is a substantial loss of sucrose and glucose sensing in both Gr64af from Kim et al. 2018 and ΔGr64 from Slone et al. 2007, when they were examined by the proboscis extension reflex assay. This was also confirmed in the study by Jiao et al. 2009. We also deleted “sugar-ageusic” and instead describe the mutant as “impaired in sucrose and glucose sensing” in the Fig. 3 legend.

      (3) Lines 264-273 seem unnecessary. This paper is not about the function of HCN in mammals, and these discussions seem largely irrelevant.

      We feel that it is important to position our results within a broader context by discussing the potential implications of our findings for sensory systems of other animals. As we stated, HCN channels have been localized in mammalian sensory systems, but their roles are often not well understood. By including this discussion, we aim to highlight the relevance of our findings beyond the model organism used in our study and suggest possible areas for future research in mammalian systems.

    1. Author response:

      We would like to 1) respond to one comment from the public review, which is also related to the eLife assessment, and 2) give provisional author responses.

      (1) Regarding the definition of the colonization-extinction rate, the first reviewer may have misunderstood it: “However, there does not need to be a temporal trend! Any warm-adapted species that colonizes a site has a positive net effect on CTI; similarly, any cold-adapted species that goes extinct contributes to thermophilization.” We clarify the definition here:

      In a single iteration of our MSOM (Multi-species occupancy model), the occupancy rate of species[n] in transect[i] from year[t-1] to year[t] is related to the colonization rate and extinction rate, and is defined as:
      muz[n,i,t] = z[n,i,t-1]*(1-eps[n,i,t-1]) + (1-z[n,i,t-1])*gam[n,i,t-1] (also shown in Line 411 of our MS).

      If the colonization rate (gam) and extinction rate (eps) remain constant, the occupancy rate (muz) will be a constant number that depends only on the state of real occupancy (0 or 1). The occupancy rate will only increase if the colonization rate increases (or the extinction rate decreases). That is why we consider the temporal trend in the colonization/extinction rates.
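
      The definition above can be sketched in a few lines of Python. This is only an illustration of the formula as written in this response, not the MSOM code itself: the function name `occupancy_probability` and the numeric rates are hypothetical, and the indices [n,i,t] are dropped so that a single species and transect are followed.

```python
def occupancy_probability(z_prev, eps_prev, gam_prev):
    """Occupancy rate muz at year t, given the true state z (0 or 1) at year t-1.

    Mirrors muz = z*(1-eps) + (1-z)*gam from the text:
    an occupied site persists with probability 1-eps,
    an empty site is colonized with probability gam.
    """
    return z_prev * (1.0 - eps_prev) + (1 - z_prev) * gam_prev


# Hypothetical rates for an unoccupied site (z = 0).
# Constant colonization rate: muz is the same every year.
constant = [occupancy_probability(0, 0.2, 0.3) for _ in range(3)]

# Colonization rate with a temporal trend: muz rises with it.
rising = [occupancy_probability(0, 0.2, gam) for gam in (0.3, 0.4, 0.5)]

print(constant)
print(rising)
```

      With constant gam and eps the value depends only on whether the site was occupied in the previous year, which is why a change in occupancy rate over time requires a trend in the rates themselves.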

      (2) Provisional author responses:

      We will revise and improve the manuscript according to the public reviews and mainly focus on:

      (1) clarify the general definition of habitat fragmentation in the Introduction.

      (2) provide a wider perspective about how our results can be applied to conservation biology in the Discussion.

      (3) discuss the diversity of isolation metrics for future research and provide more evidence about the link between larger areas and higher habitat diversity or heterogeneity.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The authors isolated and cultured pulmonary artery smooth muscle cells (PASMC) and pulmonary artery adventitial fibroblasts (PAAF) of the lung samples derived from the patients with idiopathic pulmonary arterial hypertension (PAH) and the healthy volunteers. They performed RNA-seq and proteomics analyses to detail the cellular communication between PASMC and PAAF, which are the main target cells of pulmonary vascular remodeling during the pathogenesis of PAH. The authors revealed that PASMC and PAAF retained their original cellular identity and acquired different states associated with the pathogenesis of PAH, respectively.

      Strengths:

      Although previous studies have shown that PASMC and PAAF cells each have an important role in the pathogenesis of PAH, there have been scarce reports focusing on the interactions between PASMC and PAAF. These findings may provide valuable information for elucidating the pathogenesis of pulmonary arterial hypertension.

      We appreciate the reviewer’s positive view of our study.

      Weaknesses:

      The results of proteome analysis using primary culture cells in this paper seem a bit insufficient to draw conclusions. In particular, the authors described "We elucidated the involvement of cellular crosstalk in regulating cell state dynamics and identified pentraxin-3 and hepatocyte growth factor as modulators of PASMC phenotypic transition orchestrated by PAAF." However, the presented data are considered limited and insufficient.

      We thank the reviewer for drawing our attention to this point and we will modify our statements and conclusions accordingly, in order to avoid making too general and broad claims.

      Reviewer #2 (Public Review):

      Summary:

      Utilizing a combination of transcriptomic and proteomic profiling as well as cellular phenotyping from source-matched PASMC and PAAFs in IPAH, this study sought to explore a molecular comparison of these cells in order to track distinct cell fate trajectories and acquisition of their IPAH-associated cellular states. The authors also aimed to identify cell-cell communication axes in order to infer mechanisms by which these two cells interact and depend upon external cues. This study will be of interest to the scientific and clinical communities of those interested in pulmonary vascular biology and disease. It also will appeal to those interested in lung and vascular development as well as multi-omic analytic procedures.

      We thank the reviewer for the very positive assessment of our study.

      Strengths:

      (1) This is one of the first studies using orthogonal sequencing and phenotyping for the characterization of source-matched neighboring mesenchymal PASMC and PAAF cells in healthy and diseased IPAH patients. This is a major strength that allows for direct comparison of neighboring cell types and the ability to address an unanswered question regarding the nature of these mesenchymal "mural" cells at a precise molecular level.

      We value the reviewer’s kind and objective summary of our study.

      (2) Unlike a number of multi-omic sequencing papers that read more as an atlas of findings without structure, the inherent comparative organization of the study and presentation of the data were valuable in aiding the reader in understanding how to discern the distinct IPAH-associated cell states. As a result, the reader not only gleans greater insight into these two interacting cell types in disease but also now can leverage these datasets more easily for future research questions in this space.

      We thank the reviewer for this highly positive comment.

      (3) There are interesting and surprising findings in the cellular characterizations, including the low proliferative state of IPAH-PASMCs as compared to the hyperproliferative state in IPAH-PAAFs. Furthermore, the cell-cell communication axes involving ECM components and soluble ligands provided by PAAFs that direct cell state dynamics of PASMCs offer some of the first and foundational descriptions of what are likely complex cellular interactions that await discovery.

      We agree with the reviewer’s assessment that some of the novel data in our study help to formulate testable hypotheses that can be followed up in future research.

      (4) Technical rigor is quite high in the -omics methodology and in vitro phenotyping tools used.

      We are grateful for the reviewer’s recognition and positive assessment of our work.

      Weaknesses:

      There are some weaknesses in the methodology that should temper the conclusions:

      (1) The number of donors sampled for PAAF/PASMCs was small for both healthy controls and IPAH patients. Thus, while the level of detail of -omics profiling was quite deep, the generalizability of their findings to all IPAH patients or Group 1 PAH patients is limited.

      We share the reviewer’s concerns regarding the generalizability of the findings. Indeed, the initial number of samples used for the omics study (n=4 in each group) was limited due to the unique setup of using source-matched cells from the same pulmonary artery. While we included additional samples in our phenotypic assays (n=6), which further confirmed our findings, we will acknowledge the small number of samples in the revised manuscript as a limiting factor in drawing definite conclusions for all PAH patients.

      (2) While the study utilized early passage cells, these cells nonetheless were still cultured outside the in vivo milieu prior to analysis. Thus, while there is an assumption that these cells do not change fundamental behavior outside the body, that is not entirely proven for all transcriptional and proteomic signatures. As such, the major alterations that are noted would be more compelling if validated from tissue or cells derived directly from in vivo sources. Without such validation, the major limitation of the impact and conclusions of the paper is that the full extent of the relevance of these findings to human disease is not known.

      We thank the reviewer for this constructive and excellent suggestion. Changes induced by ex vivo culturing are a common challenge when working with primary human cells. We agree with the reviewer that the proposed comparison with the publicly available sequencing datasets utilizing fresh samples will provide the readers with sufficient information to more objectively put the findings of our study into perspective.

      (3) While the presentation of most of the manuscript was quite clear and convincing, the terminology and conclusions regarding "cell fate trajectories" throughout the manuscript did not seem to be fully justified. That is, all of the analyses were derived from cells originating from end-stage IPAH, and otherwise, the authors were not lineage tracing across disease initiation or development (which would be impossible currently in humans). So, while the description of distinct "IPAH-associated states" makes sense, any true cell fate trajectory was not clearly defined.

      In accordance with the reviewer’s comment, we will choose our wording more carefully in order to better reflect our findings.

    1. Author response:

      Reviewer #1 (Public Review):

      Weaknesses:

      With the exception of the PCR analysis and the reporter assays, the manuscript does not contain any experiments or attempts to analyze current expression from any of the identified proviruses. No long-read RNASeq or other RNA analysis on cytoplasmic RNA was performed, nor any experiments to show that proteins are indeed expressed.

      We agree that an investigation of RNA and protein expression from these proviruses would be very interesting, and we hope to do such work in the future to test whether this clade is still actively infecting any primate species. However, we believe that such an investigation is out of the scope of this manuscript, which is focused on the past evolutionary history of these viruses. It is worth noting, though, that we do show evidence for proviral expression at the RNA level in Fig. 6 supplement 1, which shows alignment of publicly available rhesus macaque iPSC RNAseq data to the SERV-K1 provirus, including both spliced and full-length viral RNA. Interestingly, there appear to be reads derived from multiple proviruses, as some reads originate from proviruses with large internal deletions, while others derive from full-length proviruses.

      The findings of a potential CTE are interesting, but the sequences that were appended to the reporter construct are much longer than previously identified CTEs. No data were presented to indicate whether this sequence show similarity to previously identified CTEs and no experiments to show whether this sequence functionally interacts with Nxf1, the protein shown to interact with previously identified bona fide CTEs. Also, since nucleo-cytoplasmic export was not directly analyzed, it remains possible that the sequences that were inserted into the reporter contained splice sites that would allow the RNA to be spliced "downstream" of the GFP gene, allowing the export of a "spliced" GFP mRNA.

      While it is true that the HML8-derived sequences we have tested are much longer than the canonical MPMV CTE and many other known CTEs, there are other reports of elements with CTE-like activity that are much longer and more complex than the MPMV CTE, including one, the MLV PTE, which is ~1400 nt long, even longer than the HML8-derived sequence we have identified. We have compared the MER11 sequence to known CTEs from MPMV, IAP, MusD, MLV, and RSV, as well as the woodchuck hepatitis virus WPRE, which is not a canonical CTE but has been shown to promote nuclear export of RNA; none of these sequences showed any clear sequence similarity to our sequences of interest. We have added a section discussing these questions in some detail (l. 535-547).

      Although the question of what pathway or pathways these elements co-opt is obviously of great interest, we believe it is outside the scope of this manuscript. It is worth noting that a number of cis-acting RNA transport elements do not bind NXF1, either indirectly recruiting NXF1 (IAP RTE), using CRM1 (MLV, WPRE, foamy viruses), or have an unknown mechanism (MusD). We agree that there are potential pitfalls of the reporter system used, and thus have added experiments to directly test the CTE activity of these elements, detailed above.

    1. Author response:

      Reviewer #1 (Public Review):

      This manuscript by Negi et al. investigates the effects of different ubiquitin and ubiquitin-like modifications on the stability of substrate proteins, seeking to provide mechanistic insights into known effects of these modifications on cellular protein abundance. The authors focus on comparative studies of two modifications, ubiquitin and FAT10 (a protein with two ubiquitin-like domains), on a panel of substrate proteins; prior work had established that FAT10-conjugated proteins had lower stability to proteosomal degradation than Ub-modified counterparts.

      Strengths of the work include its integration of data across diverse approaches, including molecular dynamics simulations, solution NMR spectroscopy, and in vitro and cellular stability assays. From these, the authors provide provocative mechanistic insight into the lower stability of FAT10 on its own, and in FAT10-mediated destabilization of substrate proteins in computational and experimental findings. Notably, such destabilization impacts both the tag and tagged proteins, raising some provocative questions about mechanism. The data here are generally compelling, albeit with minor concerns on presentation in parts. Conclusions from this work will be interesting to scientists in several fields, particularly those interested in cellular proteostasis and in vitro protein design / long-range communication.

      The most substantial weakness of this work from my perspective is the specificity of these destabilization effects. In particular, technical challenges of producing bona fide Ub- or FAT10-conjugated substrates with native linkages limits the ability to conduct in vitro studies on exactly the same molecules as being studied in cellular environments. Given some discussion in the manuscript about the importance of linkage location on the specificity of certain tag/substrate interactions, this raises an understandable but unfortunate caveat that needs to be considered more fully both in general and in light of data from other fields (e.g. single molecule pulling) showing site-dependence of comparable effects. I note that these concerns do not impact the caliber of the conclusions themselves, but perhaps suggest area for caution as to their potential impact at this time.

      We thank the reviewer for the positive assessment. The reviewer has pointed out the caveats regarding producing Ub- and Fat10-conjugated substrates, which we have now mentioned in the discussion on page 35, line 15.

      Reviewer #2 (Public Review):

      "Plasticity of the proteasome-targeting signal Fat10 enhances substrate degradation" is a nice study where the authors have shown the differences between two protein degradation tags namely, FAT10 and ubiquitin. Even though these tags are closely related in terms of folds, they have differential efficiency in degrading the substrates covalently attached to them. The authors have utilised extensive MD simulations combined with biophysics and cell biology to show the structural dynamics these tags provide for proteasomal degradation.

      We thank the reviewer for the positive assessment and suggestions to improve the manuscript quality.

    1. Author response:

      Reviewer #2 (Public Review):

      I have two significant concerns that I believe can be resolved on the timescale of review.

      1) The work identifies substantial thinning in one leaflet. Lipids expand as they thin. Given this, are there too few lipids in this leaflet (which would also indicate thinning)? I would expect their deformations depend strongly on the number-balance of lipids in each leaflet. The authors should check if thinning, and the boundary, is sensitive to inter-leaflet-lipid imbalance.

      We thank Reviewer #2 for this insight, as it led us to evaluate the leaflet tensions in our restrained 2L0J simulation. We found there was an imbalance in the leaflet packing, which we addressed with an extensive set of new simulations and new analysis aimed at generating balanced leaflets.

      See Pages 6-8, Appendix Section 1, Appendix – figures 1, 2. We discuss these findings in the new Results section “Protein footprint asymmetry can lead to differential leaflet stresses” and accompanying appendix. Many of the bilayer features in the repacked simulations are consistent with our original submission, but not all. For instance, while we continue to see large tilt immediately around the amphipathic helices in the lower leaflet and little in the upper leaflet, tilts in both leaflets decay to similar values at the box edge (Appendix - figure 2). The degree of membrane pinch along the membrane-protein contact boundaries is less sensitive to the leaflet packing, as demonstrated by the surface heights (Appendix - figure 1).

      Determining the proper change in leaflet count is quite difficult. We are actively extending our continuum model to address questions of differential leaflet strain and coupled lipid tilt, which may allow us to estimate changes in leaflet-count, but this is a significant undertaking beyond the scope of this resubmission.

      2) By constraining the pore to have 2-fold symmetry, the authors remove a large entropic penalty disfavoring such a conformation, and thus presumably disfavoring the negative Gaussian curvature (NGC) it induces. For example, if the free energy surface for the fluctuations were rather flat, and only 1% of the conformations were consistent with 2-fold symmetry, the coupling to NGC may be reduced by -kT log(1%), neglecting enhancement by coupling to NGC. Therefore, I predict that the coupling to NGC would be reduced further were the constraint removed.

      We agree with the reviewer that if the 2-fold states are highly disfavored for entropic or enthalpic reasons, it would directly reduce the coupling to NGC. However, we do not know the free energy difference between these states, and calculating it from all-atom simulations is difficult and beyond our current scope. While our unrestrained simulations are not converged, they demonstrate that there is a wide range of orientations for the amphipathic helices that are energetically accessible (see Figure 2, Appendix Section 1, and Appendix - figure 4). Still, the DEER data from the Howard lab (Kim et al., 2015) would be better described by further symmetry-broken states with greater inter-AH distances, suggesting that such conformations are not well represented in our equilibrium ensemble.
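
      For a sense of scale, the reviewer's estimate of -kT log(1%) can be evaluated numerically. The sketch below is illustrative only: the 1% fraction is the reviewer's hypothetical figure, while the temperature of 310 K and the function name are assumptions introduced here and do not come from the manuscript. Under these assumptions the penalty is about 4.6 kT.

```python
import math

# Gas constant in kcal/(mol*K); used to convert kT into molar energy units.
R_KCAL_PER_MOL_K = 0.0019872041


def restraint_free_energy_kT(fraction):
    """Free-energy cost, in units of kT, of restricting an ensemble to a
    sub-population that makes up `fraction` of its conformations."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return -math.log(fraction)


fraction_two_fold = 0.01   # the reviewer's illustrative 1%
temperature_K = 310.0      # assumed temperature, not taken from the source

cost_kT = restraint_free_energy_kT(fraction_two_fold)
cost_kcal = cost_kT * R_KCAL_PER_MOL_K * temperature_K

print(f"{cost_kT:.2f} kT, or {cost_kcal:.2f} kcal/mol at {temperature_K:.0f} K")
```

      This covers only the entropic term in the reviewer's flat-landscape example; as the response notes, the actual free energy difference between the symmetric and symmetry-broken states is not known.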

      Reviewer #3 (Public Review):

      Helsell et al. use atomistic molecular dynamics simulations to characterize the structural dynamics of the M2 protein together with continuum elastic models to evaluate the energetic cost of the protein-induced bilayer deformations. Using unbiased simulations (without constraints on the protein) they show that the M2 structure is dynamic and that the AH helices are mobile (though they tend to retain their secondary structure), in agreement with experimental observations. Then, using simulations in which the peptide backbone was restrained to the starting structure, they were able to quantitatively characterize the protein-induced bilayer deformations as well as the acyl chain dynamics.

      Both the atomistic simulations and the continuum-based determinations of the bilayer deformation energies are of high quality. The authors are careful to note that their unbiased simulations do not reach equilibrium, and the authors' conclusions are well supported by their results, though some issues need to be clarified.

      1) P. 7: Choice of lipid composition: POPC:POPG:Cholesterol 0.56:0.14:0.3. This lipid composition (or POPC:POPG 0.8:0.2) has been used in a number of experimental studies that the authors use as reference. It differs, however, substantially from the lipid composition of the influenza membrane (Gerl et al., J Cell Biol, 2012; Ivanova et al., ACS Infect Dis, 2015), which is enriched in cholesterol, has a 2:1 ratio of phosphatidylethanolamine to phosphatidylcholine, and almost no PG. The choice of lipid composition is unlikely to impact the authors' major conclusions, but it should be discussed briefly. As noted by Ivanova et al., the lipids of the influenza membrane are enriched in fusogenic lipids. How will that impact the authors' results?

      As noted by the Reviewer, the lipid composition we explored was based on DEER studies from Kathleen Howard. While there is a lot of cholesterol in our simulations, it is lower than the lipidomics papers suggest for the viral membrane (Gerl et al., 2012; Ivanova et al., 2015). We hypothesize that further increasing cholesterol would stiffen the membrane even more and cause the energy differences we report here to become even larger, accentuating our finding. We employ 14% POPG and the Simons lab finds about 14% PS. Chemically these headgroups are similar, but the size and spontaneous curvature difference could be a concern. The same is true of the different intrinsic curvatures of PE versus PC. However, we have not considered spontaneous curvature in our continuum calculations, so we cannot predict how this will influence our results.

      See Appendix - figure 6. We added a new panel to this figure with continuum parameters intended to mimic the high-cholesterol (50%) membrane reported for viral coats, and we show that the curvature sensing of symmetry-broken states increases as the cholesterol content increases.

      See Page 25. We added text in the Discussion concerning the difference in lipids found in the virus versus those compositions employed in experiment and here.

      2) The definition of the lipid tilt needs to be revisited. On P. 13 (in the PDF received for review, the authors do not provide page numbers), the tilt is defined/approximated as "the angle between the presumed membrane normal (aligned with the Z axis of the box) and the vector pointing from each phospholipid's phosphate to the midpoint between the last carbon atoms of the lipid tails." This (equating the normal to the interface with the Z axis of the simulation box) may be an acceptable approximation for the lower leaflet, which is approximately flat, but probably not for the upper leaflet where the interface is curved in the vicinity of the protein. The authors should, at least, discuss the implications of their approximation in terms of their conclusion that there is little lipid tilt in the upper leaflet.

      We agree that our lipid tilt calculations are approximate since we assume the membrane normal points along the z direction. We have now restated this assumption in the Results when we start to discuss tilt. Different models define lipid tilt in different ways, but the work of Deserno defines it with respect to the bilayer mid-plane which is a shared surface for the upper and lower leaflets. Thus, tilt would be moderately impacted in both leaflets. Examining the snapshots at the top of Figure 7, we surmise that the calculated tilts in both leaflets adjacent to the protein would be slightly reduced, leaving the values at the boundary unaffected. Thus, the upper leaflet likely experiences even less tilt than calculated.

      See Page 16. We have added the discussion above to the section on lipid tilt. Also, we have added page numbers to the resubmission.
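
      The tilt definition quoted by the reviewer can be written down compactly. The following Python sketch is a minimal illustration, not the authors' analysis code: the function name, the coordinate inputs, and the sign convention (the reference normal points along -z for the upper leaflet and +z for the lower leaflet, so that an upright lipid has zero tilt) are assumptions made here. It shares the approximation discussed above, namely that the membrane normal is taken as the box z axis everywhere.

```python
import math


def tilt_angle_deg(phosphate, tail_end_1, tail_end_2, leaflet="upper"):
    """Angle (degrees) between the box z axis, taken as the membrane normal,
    and the vector from the phosphate to the midpoint of the two terminal
    tail carbons. All positions are (x, y, z) tuples in the same units."""
    midpoint = [(a + b) / 2.0 for a, b in zip(tail_end_1, tail_end_2)]
    director = [m - p for m, p in zip(midpoint, phosphate)]
    norm = math.sqrt(sum(c * c for c in director))
    if norm == 0.0:
        raise ValueError("phosphate and tail midpoint coincide")
    # Assumed convention: tails point toward the bilayer center, i.e. along
    # -z in the upper leaflet and +z in the lower leaflet.
    normal_z = -1.0 if leaflet == "upper" else 1.0
    cos_theta = max(-1.0, min(1.0, normal_z * director[2] / norm))
    return math.degrees(math.acos(cos_theta))


# Hypothetical coordinates (in angstroms) for two upper-leaflet lipids.
upright = tilt_angle_deg((0.0, 0.0, 20.0), (0.0, 0.0, 2.0), (0.0, 0.0, 2.0))
splayed = tilt_angle_deg((0.0, 0.0, 20.0), (10.0, 0.0, 10.0), (10.0, 0.0, 10.0))
print(f"upright lipid: {upright:.1f} deg, splayed lipid: {splayed:.1f} deg")
```

      Replacing the fixed z axis with a local surface normal would address the reviewer's concern for curved regions, at the cost of first having to fit that surface.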

      3) P. 14, last paragraph, Figure 5 and 6: The snapshots in Figure 5 are too small to see what the authors refer to when they write "tilt their lipid tails to wrap around the helices." The authors should consider citing the work of H. W. Huang, e.g., Huang et al. (PRL, 2004), who introduced the notion of curvature stress induced by antimicrobial peptides, a concept similar to what the present authors propose.

      See Page 17. We have now drawn the connection between what our simulations are showing and the earlier work by Huey Huang on antimicrobial peptides.

      See Figure 7. To make the lipid deformations easier to see, we are attaching the full-size versions of each snapshot to the figure as supplemental data.

      4) P. 17-18, Figure 7: The authors introduce the bilayer midplane, which becomes important for the determination of the deformation energy in the (unnumbered) equation on P. 17, but do not specify how it is determined. This is a non-trivial undertaking, but critical for the evaluation of the deformation energy; please add the necessary details.

      See Pages 15 and 20. In the continuum model, we define CM (the compression surface) following the work of May and colleagues (and other groups) as the areal compression weighted mean of the upper and lower surface. In the MD simulation results in Figure 6, we define leaflet thickness as the absolute difference between the interpolated leaflet hydrophobic surface (calculated using the first carbon atoms of each POPC and POPG lipid tail) and the interpolated bilayer midplane surface (calculated as the average of the upper and lower leaflet tail surfaces, each interpolated based on the last carbon atoms of each POPC and POPG lipid tail for each leaflet, respectively). These two leaflet-based definitions are different, and a more sophisticated continuum model of the upper and lower leaflet coupling would require the incorporation of lipid tilt, which we do not currently have.
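
      The MD-based thickness definition in this response can be sketched as follows. This is a minimal illustration that assumes the three surfaces have already been interpolated onto a common x-y grid; the grid values, function names, and units (angstroms) are hypothetical and are not taken from the manuscript. The sketch covers only the MD definition, not the compression-weighted surface used in the continuum model.

```python
def midplane_surface(upper_tail_z, lower_tail_z):
    """Bilayer midplane: point-by-point average of the upper- and
    lower-leaflet tail surfaces (built from the last tail carbons)."""
    return [
        [(u + l) / 2.0 for u, l in zip(row_u, row_l)]
        for row_u, row_l in zip(upper_tail_z, lower_tail_z)
    ]


def leaflet_thickness(hydrophobic_z, midplane_z):
    """Leaflet thickness: absolute difference between the leaflet
    hydrophobic surface (first tail carbons) and the midplane."""
    return [
        [abs(h - m) for h, m in zip(row_h, row_m)]
        for row_h, row_m in zip(hydrophobic_z, midplane_z)
    ]


# Hypothetical 2 x 2 grids of surface heights, in angstroms.
upper_tail = [[1.0, 0.5], [0.0, -0.5]]
lower_tail = [[-1.0, -0.5], [0.0, -1.5]]
upper_hydrophobic = [[14.0, 13.0], [12.0, 10.0]]
lower_hydrophobic = [[-14.0, -14.0], [-13.0, -14.0]]

mid = midplane_surface(upper_tail, lower_tail)
upper_thickness = leaflet_thickness(upper_hydrophobic, mid)
lower_thickness = leaflet_thickness(lower_hydrophobic, mid)
print(upper_thickness)
print(lower_thickness)
```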

      5) P. 18-19, Figure 8: The comparison of the MD and continuum membrane deformations is very informative, but the authors should discuss the implications of the increased symmetry further in terms of the estimated deformation energies. (I do not believe the authors really mean that they predicted the energies, they estimated/approximated them.)

      The Reviewer is correct: we are not predicting the energies of the actual MD-generated bilayers, but rather estimating the energies of these shapes using a continuum-based approximation. The good agreement between the MD-generated surfaces and the continuum-predicted surfaces suggested that the model is capturing the underlying physics. We argued that the increased symmetry of the continuum surfaces compared to the MD surfaces was due to incomplete sampling in the MD. We were right about that. Please see revised Figure 10 with new data and some longer simulations, where the symmetry in the MD is now apparent and the match between continuum and MD is even better. Frankly, we are very pleased with these new results.

      See Page 18 and Figure 10. We have changed the language throughout, moving away from “predicting” to “estimating”. The new MD-generated data show much greater symmetry reflected in the starting structures, and better agreement with model predictions.

      References

      Argudo, D., Bethel, N. P., Marcoline, F. V., Wolgemuth, C. W., & Grabe, M. (2017). New Continuum Approaches for Determining Protein-Induced Membrane Deformations. Biophys J, 112(10), 2159-2172. https://doi.org/10.1016/j.bpj.2017.03.040

      Bethel, N. P., & Grabe, M. (2016). Atomistic insight into lipid translocation by a TMEM16 scramblase. Proc Natl Acad Sci U S A, 113(49), 14049-14054. https://doi.org/10.1073/pnas.1607574113

      Drabik, D., Chodaczek, G., Kraszewski, S., & Langner, M. (2020). Mechanical Properties Determination of DMPC, DPPC, DSPC, and HSPC Solid-Ordered Bilayers. Langmuir, 36(14), 3826-3835. https://doi.org/10.1021/acs.langmuir.0c00475

      Ferreira, T. M., Coreta-Gomes, F., Ollila, O. H., Moreno, M. J., Vaz, W. L., & Topgaard, D. (2013). Cholesterol and POPC segmental order parameters in lipid membranes: solid state 1H-13C NMR and MD simulation studies. Phys Chem Chem Phys, 15(6), 1976-1989. https://doi.org/10.1039/c2cp42738a

      Gerl, M. J., Sampaio, J. L., Urban, S., Kalvodova, L., Verbavatz, J. M., Binnington, B., Lindemann, D., Lingwood, C. A., Shevchenko, A., Schroeder, C., & Simons, K. (2012). Quantitative analysis of the lipidomes of the influenza virus envelope and MDCK cell apical membrane. J Cell Biol, 196(2), 213-221. https://doi.org/10.1083/jcb.201108175

      Henriksen, J., Rowat, A. C., Brief, E., Hsueh, Y. W., Thewalt, J. L., Zuckermann, M. J., & Ipsen, J. H. (2006). Universal behavior of membranes with sterols. Biophys J, 90(5), 1639-1649. https://doi.org/10.1529/biophysj.105.067652

      Hossein, A., & Sodt, A. J. (2023). MembraneAnalysis.jl: A Julia package for analyzing molecular dynamics simulations of lipid membranes. Journal of Open Source Software, 8(87), 5380.

      Hu, M., Briguglio, J. J., & Deserno, M. (2012). Determining the Gaussian curvature modulus of lipid membranes in simulations. Biophys J, 102(6), 1403-1410. https://doi.org/10.1016/j.bpj.2012.02.013

      Ivanova, P. T., Myers, D. S., Milne, S. B., McClaren, J. L., Thomas, P. G., & Brown, H. A. (2015). Lipid composition of viral envelope of three strains of influenza virus - not all viruses are created equal. ACS Infect Dis, 1(9), 399-452. https://doi.org/10.1021/acsinfecdis.5b00040

      Kim, S. S., Upshur, M. A., Saotome, K., Sahu, I. D., McCarrick, R. M., Feix, J. B., Lorigan, G. A., & Howard, K. P. (2015). Cholesterol-Dependent Conformational Exchange of the C-Terminal Domain of the Influenza A M2 Protein. Biochemistry, 54(49), 7157-7167. https://doi.org/10.1021/acs.biochem.5b01065

      Kučerka, N., Tristram-Nagle, S., & Nagle, J. F. (2006). Structure of fully hydrated fluid phase lipid bilayers with monounsaturated chains. J Membr Biol, 208(3), 193-202.

      Latorraca, N. R., Callenberg, K. M., Boyle, J. P., & Grabe, M. (2014). Continuum approaches to understanding ion and peptide interactions with the membrane. J Membr Biol, 247(5), 395-408. https://doi.org/10.1007/s00232-014-9646-z

      Liu, J., Kaksonen, M., Drubin, D. G., & Oster, G. (2006). Endocytic vesicle scission by lipid phase boundary forces. Proc Natl Acad Sci U S A, 103(27), 10277-10282. https://doi.org/10.1073/pnas.0601045103

      Pan, J., Tristram-Nagle, S., & Nagle, J. F. (2009). Effect of cholesterol on structural and mechanical properties of membranes depends on lipid chain saturation. Phys Rev E Stat Nonlin Soft Matter Phys, 80(2 Pt 1), 021931. https://doi.org/10.1103/PhysRevE.80.021931

      Rawicz, W., Olbrich, K. C., McIntosh, T., Needham, D., & Evans, E. (2000). Effect of chain length and unsaturation on elasticity of lipid bilayers. Biophys J, 79(1), 328-339. https://doi.org/10.1016/S0006-3495(00)76295-3

      Sun, D., Peyear, T. A., Bennett, W. F. D., Andersen, O. S., Lightstone, F. C., & Ingolfsson, H. I. (2019). Molecular Mechanism for Gramicidin Dimerization and Dissociation in Bilayers of Different Thickness. Biophys J, 117(10), 1831-1844. https://doi.org/10.1016/j.bpj.2019.09.044

      Tzlil, S., Deserno, M., Gelbart, W. M., & Ben-Shaul, A. (2004). A statistical-thermodynamic model of viral budding. Biophys J, 86(4), 2037-2048. https://doi.org/10.1016/S0006-3495(04)74265-4

      Ursell, T. S., Klug, W. S., & Phillips, R. (2009). Morphology and interaction between lipid domains. Proc Natl Acad Sci U S A, 106(32), 13301-13306. https://doi.org/10.1073/pnas.0903825106

      Veatch, S. L., & Keller, S. L. (2003). Separation of liquid phases in giant vesicles of ternary mixtures of phospholipids and cholesterol. Biophys J, 85(5), 3074-3083. https://doi.org/10.1016/S0006-3495(03)74726-2

      Venable, R. M., Brown, F. L. H., & Pastor, R. W. (2015). Mechanical properties of lipid bilayers from molecular dynamics simulation. Chem Phys Lipids, 192, 60-74. https://doi.org/10.1016/j.chemphyslip.2015.07.014

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors generated a novel transgenic mouse line OpalinP2A-Flpo-T2A-tTA2 to specifically label mature oligodendrocytes, and at the same time their embryonic origins by crossing with a progenitor cre mouse line. With this clever approach, they found that LGE/CGE-derived OLs make minimum contributions to the neocortex, whereas MGE/POA-derived OLs make a small but lasting contribution to the cortex. These findings are contradictory to the current belief that LGE/CGE-derived OPCs make a sustained contribution to cortical OLs, whereas MGE/POA-derived OPCs are completely eliminated. Thus, this study provides a revised and more comprehensive view on the embryonic origins of cortical oligodendrocytes.

      Strengths:

      The authors have generated a novel transgenic mouse line to specifically label mature differentiated oligodendrocytes, which is very useful for tracing the final destiny of mature myelinating oligodendrocytes. Also, the authors carefully compared the distribution of three progenitor cre mouse lines and suggested that Gsh-cre also labeled dorsal OLs, contrary to the previous suggestion that it only marks LGE-derived OPCs. In addition, the authors also analyzed the relative contributions of OLs derived from three distinct progenitor domains in other forebrain regions (e.g. Pir, ac). Finally, the new transgenic mouse lines and established multiple combinatorial genetic models will facilitate future investigations of the developmental origins of distinct OL populations and their functional and molecular heterogeneity.

      Weaknesses:

      Since OpalinP2A-Flpo-T2A-tTA2 only labels mature oligodendrocytes but not OPCs, the authors cannot suggest that the lack of LGE/CGE-derived-OLs in the neocortex is less likely caused by competitive postnatal elimination, but more likely due to limited production and/or allocation (line 118-9). It remains possible that LGE/CGE-derived OPCs migrate into the cortex but are later eliminated.

      We are glad that the reviewer appreciates our work and are grateful for the positive comments and the constructive suggestion. We agree with the reviewer that our methodology by itself cannot suggest whether the lack of LGE/CGE-derived-OLs in the neocortex is caused by competitive postnatal elimination or not. That is why we cited a parallel work by Li et al. (ref [17] in the original manuscript; ref [19] in the revised manuscript), in which in utero electroporation (IUE) failed to label LGE-derived OL lineage cells in both embryonic and early postnatal brains. Although they did not directly explore CGE using IUE, their fate mapping results using Emx1-Cre; Nkx2.1-Cre; H2B-GFP at P0 and P10 revealed a very low percentage of LGE/CGE-derived OL lineage cells. The lack of adult labeling in our study together with the lack of developmental labeling in the other study prompted us to hypothesize that the lack of LGE/CGE-derived-OLs in the neocortex is less likely caused by competitive postnatal elimination, but more likely due to limited production and/or allocation. In the revised manuscript, we have expanded the discussion to explain this point more clearly.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Cai et al use a combination of mouse transgenic lines to re-examine the question of the embryonic origin of telencephalic oligodendrocytes (OLs). Their tools include a novel Flp mouse for labelling mature oligodendrocytes and a number of pre-existing lines (some previously generated by the last author in Josh Huang's lab) that allowed combinatorial or subtractive labelling of oligodendrocytes with different origins. The conclusion is that cortically-derived OLs are the predominant OL population in the motor and somatosensory cortex and underlying corpus callosum, while the LGE/CGE generates OLs for the piriform cortex and anterior commissure rather than the cerebral cortex. Small numbers of MGE-derived OLs persist long-term in the motor, somatosensory and piriform cortex.

      Strengths:

      The strength and novelty of the manuscript lie in the elegant tools generated and used, which have the potential to accurately resolve the issue of the contribution of different progenitor zones to telencephalic regions.

      We are glad that the reviewer appreciates our work and are grateful for the overall positive comments.

      Weaknesses:

      (1) Throughout the manuscript (with one exception, lines 76-78), the authors quantified OL densities instead of contributions to the total OL population (as a % of ASPA for example). This means that the reader is left with only a rough estimation of the different contributions.

      We thank the reviewer for this constructive suggestion. We have replaced the density quantification (Figure 2F and 3D in the original manuscript) with contributions to the total OL population (% of ASPA) (Figure 2J and 2N in the revised manuscript).

      (2) All images and quantifications have been confined to one level of the cortex and the potential of the MGE and the LGE/CGE to produce oligodendrocytes for more anterior and more posterior cortical regions remains unexplored.

      The quantifications were not confined to one level of the cortex but were performed in brain sections ranging from Bregma +1.94 to -2.80 mm, as shown in Supplementary Figure 2A-B in the original manuscript. We apologize for not having stated and presented this information clearly enough, and for the confusion it may have caused. In the revised manuscript, we have added relevant descriptions in the “Material and Methods” section (line 199-200*) and schematics along with representative images of more anterior and more posterior cortical regions (Supplementary Figure 2A-D).

      (3) Hence, the statement that "In summary, our findings significantly revised the canonical model of forebrain OL origins (Figure 4A) and provided a new and more comprehensive view (Figure 4B)." (lines 111, 112) is not really accurate as the findings are neither new nor comprehensive. Published manuscripts have already shown that (a) cortical OLs are mostly generated from the cortex [Tripathi et al 2011 (https://doi.org/10.1523/JNEUROSCI.6474-10.2011), Winkler et al 2018 (https://doi.org/10.1523/JNEUROSCI.3392-17.2018) and Li et al (https://doi.org/10.1101/2023.12.01.569674)] and (b) MGE-derived OLs persist in the cortex [Orduz et al 2019 (https://doi.org/10.1038/s41467-019-11904-4) and Li et al 2024 (https://doi.org/10.1101/2023.12.01.569674)]. Extending the current study to different rostro-caudal regions of the cortex would greatly improve the manuscript.

      As explained in the response to comment (2), our original quantifications included different rostro-caudal regions of the cortex. In the revised manuscript, we have added more schematics and representative images in the Supplementary Figure 2 for better illustration to resolve the concern of comprehensiveness.

      We thank the reviewer for listing and summarizing highly relevant published research along with the parallel study by Li et al. submitted to eLife. We apologize for the omission of the first two references in our original manuscript and have cited them in appropriate places (ref [10] and ref [11] in the revised manuscript). However, we believe these works do not compromise the novelty and significance of our work for the following reasons:

      (1) Tripathi et al. 2011 (ref [10] in the revised manuscript) analyzed OL lineage cells in the corpus callosum and the spinal cord, but not in the cortex and anterior commissure. Their analysis was performed in juvenile mice (P12/13), not in adulthood. Most importantly, their analysis of ventrally derived OL lineage cells relied on lineage tracing using Gsh2Cre, which in fact also labels OLs derived from Gsh2+ dorsal progenitors. In contrast, we analyzed mature OLs in the cortex, corpus callosum and anterior commissure in 2-month-old adult mice. We used intersectional and subtractive strategies to label OLs derived from dorsal, LGE/CGE and MGE/POA origins. Our strategy differentiated the two different ventral lineages (LGE/CGE vs. MGE/POA) and avoided mixed labeling of OLs from ventral and dorsal Gsh2+ progenitors.

      (2) Winkler et al. 2018 (ref [11] in the revised manuscript) analyzed OLs derived from dorsal progenitors but only quantified those in the gray matter and the white matter of somatosensory cortex. Their quantification relied on co-staining with Olig2/Sox10, and thereby included both oligodendrocyte precursors (OPCs) and OLs. In contrast, we analyzed mature OLs from three origins and quantified not only neocortical regions (Mo and SS) but also an archicortical region (Pir). Our analysis revealed that although dorsally derived OLs dominate neocortex, ventrally derived OLs, especially the LGE/CGE-derived ones, dominate piriform cortex.

      (3) Orduz et al. 2019 (ref [7] in the original manuscript and the revised manuscript) mainly focused on POA-derived OLs in the somatosensory cortex. Although they performed limited analysis on MGE/POA-derived OPCs at postnatal day 10 and 19, no quantification of MGE/POA-derived OLs was performed in terms of their density, contribution to the total OL population and spatial distribution in the cortex. In contrast, we performed systematic quantification on these aspects to demonstrate that MGE/POA-derived OLs make a small but sustained contribution to the cortex, with a distribution pattern distinct from that of OLs derived from the dorsal origin.

      (4) Li et al. 2024 (ref [17] in the original manuscript and [19] in the revised manuscript) is a parallel study submitted to eLife. Their independent discoveries and ours nicely complement each other. Using different sets of techniques and experiments but some shared genetic mouse models, we both found that the LGE/CGE makes a minimal contribution to neocortical OLs. Their analysis in the prenatal and early postnatal stages together with our analysis in the adult brain paints a more comprehensive picture of cortical oligodendrogenesis. The uniqueness of our work is that we performed systematic quantification of all three origins and uncovered the differential contributions to neocortex, piriform cortex, corpus callosum and anterior commissure.

      In summary, our work developed novel strategies to faithfully trace OLs from the three different origins and performed systematic analysis in the adult brain. Our data uncovered their differential contributions to neocortex, piriform cortex and the two commissural white matter tracts, which significantly differ not only from the canonical view but also from other previous studies in aspects discussed above. We believe our discoveries did significantly revise the canonical model of forebrain OL origins and provided a new and more comprehensive view.

      Reviewer #3 (Public Review):

      In the manuscript entitled "Embryonic Origins of Forebrain Oligodendrocytes Revisited by Combinatorial Genetic Fate Mapping," Cai et al. used an intersectional/subtractional strategy to genetically fate-map the oligodendrocyte populations (OLs) generated from medial ganglionic eminence (NKX2.1+), lateral ganglionic eminences, and dorsal progenitor cells (EMX1+). Specifically, they generated an OL-expressing reporter mouse line OpalinP2A-Flpo-T2A-tTA2 and bred with region-specific neural progenitor-expressing Cre lines EMX1-Cre for dOL and NKX2.1-Cre for MPOL. They used a subtractional strategy in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line to predict the origins of OLs from lateral/caudal ganglionic eminences (LC). With their genetic tools, the authors concluded that neocortical OLs primarily consist of dOLs. Although the populations of OLs (dOLs or MP-OLs) from Emx1+ or Nkx2.1+ progenitors are largely consistent with previous findings, they observed that MP-OLs contribute minimally but persist into adulthood without elimination as in the previous report (PMID: 16388308).

      Intriguingly, by using an indirect subtraction approach, they hypothesize that both Emx1-negative and Nkx2.1-negative cells represent the progenitors from lateral/caudal ganglionic eminences (LC), and conclude that neocortical OLs are not derived from the LC region. The authors claim that Gsh2 is not exclusive to progenitor cells in the LC region (PMID: 32234482). However, Gsh2 exhibits high enrichment in the LC during early embryonic development. The presence of a small population of Gsh2-positive cells in the late embryonic cortex could originate/migrate from Gsh2-positive cells in the LC at earlier stages (PMID: 32234482). Consequently, the possibility that cortical OLs derived from Gsh2+ progenitors in LC could not be conclusively ruled out. Notably, a population of OLs migrating from the ventral to the dorsal cortical region was detected after eliminating dorsal progenitor-derived OLs (PMID: 16436615).

      The indirect subtraction data for LC progenitors drawn from the OpalinFlp-tdTOM reporter in Emx1-negative and Nkx2.1-negative cells in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line present some caveats that could influence their conclusion. The extent of activity from the two Cre lines in the OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mice remains uncertain. The OpalinFlp-tdTOM expression could occur in the presence of either Emx1Cre or Nkx2.1Cre, raising questions about the contribution of the individual Cre lines. To clarify, the authors should compare the tdTOM expression from each individual Cre line, OpalinFlp::Emx1Cre::RC::FLTG or OpalinFlp::Nkx2.1Cre::RC::FLTG, with the combined OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG mouse line. This comparison is crucial as the results from the combined Cre lines could appear similar to only one Cre line active.

      Overall, the authors provided intriguing findings regarding the origin and fate of oligodendrocytes from different progenitor cells in embryonic brain regions. However, further analysis is necessary to substantiate their conclusion about the fate of LC-derived OLs convincingly.

      We thank the reviewer for these thoughtful comments. We agree with the reviewer that the presence of Gsh2-positive cells in the late embryonic cortex by itself could not rule out the possibility that they originate/migrate from Gsh2-positive cells in the LC at earlier stages. Staining dorsal-lineage intermediate progenitors with Gsh2, or performing intersectional lineage tracing using Gsh2Cre along with a dorsal-specific Flp driver, would provide more direct evidence on this issue. Nonetheless, as our lineage tracing of LGE/CGE-derived OLs did not employ Gsh2Cre, the doubt on the identity of Gsh2+ cortical progenitors should not affect the interpretation of our data.

      Regarding the subtractional LCOL labeling strategy used in our study, we wonder whether the reviewer may have misunderstood it. As stated in our manuscript (line 59-61) and reiterated by the reviewer, OpalinFlp::Emx1Cre::Nkx2.1Cre::RC::FLTG labels OLs derived from progenitors that express neither Emx1Cre nor Nkx2.1Cre. As these two progenitor pools do not overlap with each other, the effects of the two Cre lines are purely additive. If there is any concern about efficiency and specificity, it would be that inadequate Cre-mediated recombination leads to mislabeling of dOLs or MPOLs as LCOLs (i.e., OLs derived from Emx1- or Nkx2.1-expressing progenitors were not successfully “subtracted” and thereby “wrongly” retained RFP expression). Therefore, the bona fide LGE/CGE-derived OLs could only be fewer, not more, than the RFP+ LCOLs labeled by our subtractional strategy, even if either of the Cre lines did not work efficiently enough. In any case, this would not affect our conclusion that LGE/CGE-derived OLs make a minimal contribution to the neocortex, as the “ground truth” contribution by LGE/CGE could only be less, not more, than what we have observed using the current strategy.
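
      The bounding argument in this response can be made concrete with a small sketch. It assumes the reporter logic described in this exchange (Flp alone leaves RFP on; Flp together with either Cre switches the cell to GFP), and it assumes that each Cre recombines in a fixed fraction of cells from its own lineage. The cell counts, the efficiency value, and the function names are hypothetical and serve only to show that incomplete recombination can inflate, but never deflate, the RFP+ count.

```python
def reporter_color(flp, emx1_cre, nkx21_cre):
    """Assumed readout of the subtractive reporter: no Flp gives no label,
    Flp alone gives RFP, and Flp plus either Cre gives GFP."""
    if not flp:
        return None
    return "GFP" if (emx1_cre or nkx21_cre) else "RFP"


def expected_rfp(true_counts, cre_efficiency):
    """Expected number of RFP+ mature OLs when each Cre recombines in a
    fraction `cre_efficiency` of the cells from its own lineage."""
    escaped = (1.0 - cre_efficiency) * (
        true_counts["dorsal"] + true_counts["MGE/POA"]
    )
    return true_counts["LGE/CGE"] + escaped


# Hypothetical true lineage composition of 1000 mature OLs.
true_counts = {"dorsal": 900, "MGE/POA": 60, "LGE/CGE": 40}

for efficiency in (1.0, 0.9):
    observed = expected_rfp(true_counts, efficiency)
    print(f"Cre efficiency {efficiency:.0%}: expected RFP+ cells = {observed:.0f}")
```

      Because the escaped term is never negative, the observed RFP+ population is an upper bound on the true LGE/CGE-derived population under these assumptions.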

      In support of our conclusion, a parallel study by Li et al. 2024 (ref [17] in the original manuscript; ref [19] in the revised manuscript) also provided independent experimental evidence that “any contribution of oligodendrocyte precursors to the developing cortex from the lateral ganglionic eminence is minimal in scope (quoted from its eLife assessment).” In addition, in their revision, they performed Gsh2 immunostaining in P0 Emx1Cre::HG-loxP mouse and found nearly all Gsh2+ cells in the cortical SVZ were derived from the Emx1+ lineage. We are glad that this additional piece of evidence further clarified the case, but still want to emphasize that the subtractional strategy we took was designed purposefully to avoid the potential uncertainty of Gsh2Cre and to more faithfully label LGE/CGE-derived OLs. Therefore, the validity of our conclusion about the fate of LC-derived OLs should be independent from the question on the identity of Gsh2+ cortical progenitors and stands well by itself.

      We hope that these explanations have adequately addressed the reviewer’s concerns. 

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      In Figures 2C, 2D, 2E and 3D, the authors should provide counts of labelled cells as a % of ASPA+ cells. This will give an accurate picture of the contribution of the different progenitor regions to OLs.

      The graphs in Figure 2F are unnecessary since they are simply repeats of C-E but re-arranged.

      We thank the reviewer for the valuable suggestions. These two recommendations are related, so we made the following changes. We replaced the density quantification in Figure 2F and 3D with % of ASPA (Figure 2J and 2N in the revised manuscript) to give an accurate picture of the contribution of the different progenitor regions to OLs, as suggested by the reviewer. We still retained the density counts in Figure 2C-E (Figure 2G-I in the revised manuscript). Together with quantifications of rostral-caudal and laminar distributions presented in Supplementary Figure 2, these data demonstrate that OLs from different origins display distinct spatial distribution patterns.

      At what ages were the quantifications performed in all the figures?

      We apologize for the omission of this information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information in the “Material and Methods” section of the revised manuscript.

      In 2D, and 3B the GFP should have been activated but the authors do not show it or quantify it presumably because GFP would flood the sections in the presence of Emx1Cre. Nevertheless, since eGFP is shown in the diagram in 2B, the authors should mention why they chose not to show it.

      We thank the reviewer for the helpful comment and the suggestion. We have modified the schematic in Figure 2B and added an explanation in the figure legend (lines 308-313). We also added a schematic in Supplementary Figure 1A along with images of the GFP channel in Supplementary Figure 1D (lines 338-350).

      All the main figures and supplementary figures are too small to see properly.

      We are sorry that there was severe compression of images in the combined manuscript file at the conversion step during the initial submission. We apologize for the compromised image quality and re-uploaded full-size figures as individual files on BioRxiv soon after receiving the reviews. For the revised manuscript, we have also taken care to upload full-size figures at high resolution as individual files to ensure their quality of presentation.

      Supplementary Figure 2E is unnecessary and perhaps misleading the reader that cortical-derived OLs have a preference for the lower layers whereas the distribution may simply reflect the distribution of OLs in the cortex.

      We thank the reviewer for the helpful comment and the suggestion. We have removed this panel and replaced it with quantifications of relative laminar distributions of the total (ASPA+) OLs along with those from the three different origins (Supplementary Figure 2G in the revised manuscript). Indeed, the preference for the lower layers of dorsally-derived OLs mirrored the distribution of total OLs in the cortex, while the MGE/POA-derived OLs deviate significantly from others and exhibit higher preference towards layer 4.

      Quantification of labelled cells as a % of ASPA should also be performed in Supplementary Figure 3.

      We thank the reviewer for this suggestion. In the revised manuscript, we have included quantifications of labelled cells as % of ASPA for both OpalinFlp::Emx1Cre::Ai65 and OpalinFlp::Nkx2.1Cre::Ai65 (Figure 2J and N). The sum of these two data sets will be equivalent to those of OpalinFlp::Emx1Cre::Nkx2.1Cre::Ai65 shown in Supplementary Figure 3, and therefore we did not perform additional quantifications to avoid redundant efforts.

      Imaging and quantification should be extended to more posterior regions of the cortex to find out whether the contribution is different from the areas already examined.

      We thank the reviewer for the suggestion on imaging and apologize for the confusion about the range of quantification. As explained in the response to comment (2) of weakness, the quantifications were not confined to one level of the cortex but were performed in brain sections ranging from Bregma +1.94 to -2.80 mm, as shown in Supplementary Figure 2A-B in the original manuscript. In the revised manuscript, we have added relevant descriptions in the “Material and Methods” section (line 199-200) and schematics along with representative images of more anterior and more posterior cortical regions (Supplementary Figure 2A-D).

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors should provide Opalin reporter expression data across various brain regions at different developmental stages to clarify the expression pattern of the reporter.

      We appreciate the reviewer’s comment. We chose to perform all quantifications in adult mice as Opalin is a well-established marker for differentiated OLs and the recombinase-dependent reporter expression is cumulative and irreversible. If there were any non-specific labeling at any earlier developmental stage, it would be retained and manifested at the timepoint we examined as well. In other words, the fact that we did not detect any non-specific labeling in the current dataset but only confined labeling in mature OLs ensured that no non-OL labeling was present at earlier timepoints. As shown in Figure 1D-F, reporter expression activated by the Opalin driver shows high OL specificity in all analyzed brain regions. This is further corroborated by results from combinatorially labeled samples (Figure 2 and Supplementary Figure 2), in which only OLs but not any other cell types were labeled in all analyzed brain regions too. Following the reviewers’ suggestions, we have added representative images of more rostral and more caudal cortical regions (Supplementary Figure 2B-D), which also showed highly specific OL labeling.

      (2) In Figure 1D, please specify the developmental stage of the mice used for staining.

      We apologize for the omission of this information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information in the “Material and Methods” section (line 199-200) of the revised manuscript.

      (3) The authors should clarify if the Opalin reporter expressed in OPCs and astrocytes at developmental stages of mice, such as P0, P7, and P30.

      We appreciate the reviewer’s comment, but as explained in response to comment (1), Opalin is a well-established marker for differentiated OLs which is not expressed in OPCs or astrocytes. As shown in Figure 1D-E, reporter expression is confined to CC1+ differentiated OLs with no colocalization with Sox9 (astrocyte marker). In support of this observation, only ASPA+ differentiated OLs but no OPCs or astrocytes were labeled in any of the combinatorial lineage tracing samples generated using this line combined with progenitor-Cre lines. In addition to marker staining, we also did not observe any RFP+ cells with OPC or astrocyte morphology. As the recombinase-dependent reporter expression is cumulative and irreversible, the fact that no non-specific labeling was observed in the adult brain retrospectively proved the specificity of Opalin-Flp at earlier developmental stages.

      (4) In Figure 1E, authors should address why the efficiency of the tdTomato line is notably lower compared to that of H2B-GFP and whether the stability of reporters could impact the conclusions drawn.

      The difference in reporting efficiency is mainly caused by differences inherent to the two reporting systems. The TRE-RFP reporter is derived from Ai62, composed of a Tet response element and tdTomato inserted into the T1 TIGRE locus. The tdTomato expression is driven by tTA-TRE transcriptional activation. The HG-loxP reporter is derived from HG-Dual, composed of a CAG promoter, a frt-flanked STOP cassette, and H2B-GFP inserted into the Rosa26 locus. The H2B-GFP expression is driven by CAG promoter after Flp-mediated removal of the STOP cassette. A Flp-dependent tdTomato reporter designed in the same way as the HG-FRT reporter would have similar efficiency. In fact, the RC::FLTG reporter can be viewed as such a reporter in the absence of Cre, which did show similarly high efficiency as HG-FRT and supported efficient subtractive labeling of LGE/CGE-derived OLs. We apologize for a typo in the title of the Y-axis of the right panel in the original Figure 1F which may have caused potential misunderstanding. The “RFP+CC1+/CC1” should be “XFP+CC1/CC1”. We have corrected this mistake and revised the figure legend for clearer description of the data (Line 293-302 in the revised manuscript).

      (5) In Figure 2, please clarify the developmental stage of the mice used for staining. Authors should present the eGFP image in addition to tdTOM.

      We apologize for the omission of the age information in the original manuscript. All quantifications were performed in 2-month-old adult mice. We have added this information in the “Material and Methods” section (line 199-200) of the revised manuscript. We thank the reviewer for the suggestion on eGFP image and have presented it in supplementary Figure 1 in the revised manuscript.

      (6) in Figure 2D, authors should display the eGFP image alongside the tdTomato image. It is difficult to assess the efficiency of Emx-Cre and Nkx2.1-Cre.

      We thank the reviewer for the suggestion on the eGFP image and have presented the eGFP image in Supplementary Figure 1D in the revised manuscript. There are two reasons why we chose to present it in the supplementary figure instead of the main figure. First, we added ASPA staining in the green channel along with quantifications of RFP cells as % of ASPA in Figure 2 in the revised manuscript, following reviewer #2’s suggestion. Second, as pointed out by reviewer #2, GFP would flood the sections in the presence of Emx1Cre and could be quite distracting if it was shown together with RFP.

      We were not entirely sure what exactly the reviewer means by “assess the efficiency of Emx-Cre and Nkx2.1-Cre”, but we believe that the quantifications of RFP cells as % of ASPA clarified the contribution of each origin to the total OLs (Figure 2J and 2N in the revised manuscript).

      (7) Figure 3 depicts the entire brain, replicating the image presented in Figure 2. It would be beneficial to consolidate Figures 2 and 3, as they showcase identical brain scans of different regions.

      We thank the reviewer for the constructive suggestion and have consolidated Figures 2 and 3 in the original manuscript into Figure 2 in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors): 

      The author has addressed all the concerns I have raised.

      I have only one minor suggestion. 

      We would argue both a gray screen and a grating are visual stimuli. ... We concur, our data only address one of many possible transitions, but it is a switch between distinct visual stimuli that is sped up by ACh. 

      Thank you for clarifying this. 

      Following my comment in the previous review, the author has revised the abstract as follows:  (Before) "Our results suggest that acetylcholine augments the responsiveness of layer 5 neurons to inputs from outside of the local network, enabling faster switching between internal representations during locomotion." 

      (After) "Based on this we speculate that acetylcholine augments the responsiveness of layer 5 neurons to inputs from outside of the local network, possibly enabling faster switching between internal representations during locomotion." 

      My previous comment concerned specifically the latter part, "enabling faster switching between internal representations during locomotion", and, in fact, their data fully support the first part, "acetylcholine augments the responsiveness of layer 5 neurons to inputs from outside of the local network". Thus, I suggest the following sentence: 

      "Our results suggest that acetylcholine augments the responsiveness of layer 5 neurons to inputs from outside of the local network, possibly enabling faster switching between internal representations during locomotion." 

      Thank you for clarifying. We have changed as suggested.

      Reviewer #2 (Recommendations For The Authors): 

      I thank the authors for the clarification regarding the distribution of running speeds in the study. I do agree that 30 cm/s is indeed fast for head-fixed locomotion. My concern is that while all mice contribute to the low locomotion velocity bin, the high locomotion velocity bin is dominated by a subset of animals, since not all mice reached high locomotion speeds. Therefore, the comparison between low, intermediate and high locomotion velocities includes data from different cohorts of animals and variability across animals may confound the analysis of cholinergic axon activity. However, the manuscript is carefully worded to emphasize lack of evidence (e.g. "we found no evidence of an increase in calcium activity between low and high locomotion velocities") and I have revised my summary in the public review to reflect this. 

      I thank the authors for including the scatterplots of single neuron responses to locomotion and optogenetic stimulation, which illustrate their heterogeneity. I am surprised that the axes are limited to 20% deltaF/F as visual responses recorded using GCaMP6f often exceed 100% deltaF/F.

      There are definitely neurons with responses larger than 20% dF/F0, but it is a small fraction. There are two considerations relevant to assessing dF/F0 amplitudes. First, in our hands trial averaged dF/F0 responses tend to be below 30% even for the most responsive neurons (trial averaging convolves response amplitude and response reliability). The reviewer is probably thinking of single trial responses often shown as raw data that can exceed 100s of %. Second, different published variants for calculating dF/F0 can result in a spectrum of values that varies by up to a factor of 10. This is largely a consequence of the choice of F0 and preprocessing related to correcting slow drifts in signal strength (originally motivated by photobleaching). Attempting to compare dF/F0 across labs is unfortunately a futile effort in the absence of a standardized way of calculating it.
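
      The two considerations above can be illustrated with a minimal Python sketch on synthetic numbers. Everything here is an assumption chosen for illustration (the baseline of 100, the constant offset of 80, the transient size, the 3-in-10 response reliability, and the 10th-percentile baseline convention); none of it is taken from the study or from any specific published pipeline.

```python
import numpy as np

def dff(trace, f0):
    """Return (F - F0) / F0 for a fluorescence trace and a scalar baseline F0."""
    return (trace - f0) / f0

# Synthetic fluorescence trace: baseline 100 with one transient reaching 150.
signal = np.full(200, 100.0)
signal[50:60] = 150.0

# Hypothetical constant offset (e.g. background) that a preprocessing
# step may or may not subtract before F0 is estimated.
offset = 80.0
raw = signal + offset

# Same transient, two baseline conventions.
f0_raw = np.percentile(raw, 10)            # F0 estimated on the raw trace
f0_sub = np.percentile(raw - offset, 10)   # F0 estimated after offset subtraction
peak_raw = dff(raw, f0_raw).max()
peak_sub = dff(raw - offset, f0_sub).max()

# Trial averaging mixes amplitude and reliability: a cell that responds
# with dF/F0 = 1.0 on 3 of 10 trials has a smaller trial-averaged peak.
trials = np.zeros((10, 200))
trials[:3, 50:60] = 1.0
trial_avg_peak = trials.mean(axis=0).max()

print(peak_raw, peak_sub, trial_avg_peak)
```

      In this toy case the same transient yields a peak of about 28% or 50% dF/F0 depending only on whether the offset is removed before estimating F0, and a cell with 100% single-trial responses on 3 of 10 trials shows a trial-averaged peak of 30%.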

      Allow me to clarify how evaluating the effects of optogenetic stimulation and locomotion without analyzing them at the level of individual neurons could result in misleading conclusions. I will use the effects of cholinergic responses on grating responses as an example but this concern applies equally to the other analyses. The manuscript reports that "in layer 2/3, optogenetic activation of cholinergic axons did not result in a detectable increase in grating onset responses (Figure 4C), while the responses of layer 5 neurons to the same stimulus increased with concurrent optogenetic activation of cholinergic axons." As the Figure R2C-D illustrates, only a minority of L2/3 neurons are excited by the grating in baseline conditions, while the vast majority are either suppressed or non-responsive. This is expected, as it is well established that visual responses in layer 2/3 are sparse. If responses of the small subset of L2/3 neurons that are activated by the grating were enhanced, it may not be apparent in the population average presented in the manuscript. In contrast, since a larger fraction of L5 neurons is excited by the grating, enhancement of grating responses may be easier to detect. In other words, the effects of optogenetic stimulation may be to boost the responses of those neurons that are activated by the grating and the difference between L2/3 and L5 lies simply in the proportion of activated neurons. I do not mean to argue in favour of this specific scenario but simply present it so as to illustrate the way in which considering population averages alone may be misleading. 
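
      The scenario described above can be made concrete with a small numerical sketch. The fractions of excited cells, the baseline response and the gain are hypothetical values chosen only to illustrate the arithmetic; they are not estimates from the manuscript, and suppressed cells are ignored for simplicity.

```python
import numpy as np

def population_mean_change(n_cells, frac_excited, base_resp, gain):
    """Change in the population-average response when only the excited
    cells have their response multiplied by `gain`."""
    n_excited = int(round(n_cells * frac_excited))
    baseline = np.zeros(n_cells)
    baseline[:n_excited] = base_resp   # excited cells; the rest stay at 0
    boosted = baseline.copy()
    boosted[:n_excited] *= gain        # identical per-cell enhancement
    return boosted.mean() - baseline.mean()

# Same per-cell effect (a 50% boost of a 0.2 dF/F0 response), different sparsity.
sparse_change = population_mean_change(1000, 0.10, 0.20, 1.5)  # "L2/3-like"
dense_change = population_mean_change(1000, 0.50, 0.20, 1.5)   # "L5-like"

print(sparse_change, dense_change)
```

      With these assumed numbers an identical per-cell enhancement shifts the population mean by 0.01 in the sparse population and by 0.05 in the dense one, which is why an analysis restricted to responsive neurons is informative.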

      While the authors state in their response that "all relevant and clear conclusions are already captured by the mean differences shown in Figure 4", the evidence supporting this statement is not presented in the manuscript. Most importantly, it is essential to determine whether the neurons that show significant activation in response to gratings (Figure 4C-D), mismatch (Figure 4E-F) or locomotion (Figure 4G-H), are affected by optogenetic stimulation in the same way as the population average. 

      We have added the analysis suggested as Figure S6. Consistent with the population averages, even within the subset of layer 2/3 neurons most responsive to specific inputs, we found no detectable increase in responsiveness upon optogenetic stimulation of cholinergic axons.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Overall, the manuscript is very well written, the approaches used are clever, and the data were thoroughly analyzed. The study conveyed important information for understanding the circuit mechanism that shapes grid cell activity. It is important not only for the field of MEC and grid cells, but also for broader fields of continuous attractor networks and neural circuits.

      We appreciate the positive comments.

      (1) The study largely relies on the fact that ramp-like wide-field optogenetic stimulation and focal optogenetic activation both drove asynchronous action potentials in SCs, and therefore, if a pair of PV+ INs exhibited correlated activity, they should receive common inputs. However, it is unclear what criteria/thresholds were used to determine the level of activity asynchronization, and under these criteria, what percentage of cells actually showed synchronized or less asynchronized activity. A notable percentage of synchronized or less asynchronized SCs could complicate the results, i.e., PV+ INs with correlated activity could receive inputs from different SCs (different inputs), which had synchronized activity. More detailed information/statistics about the asynchronization of SC activity is necessary for interpreting the results.

      The short answer here is that spiking responses from the pairs of SCs that we sampled appear asynchronous. We now show this in the form of cross-correlograms for all recorded pairs of SCs (Figure 2, Figure Supplement 1). The correlograms lack peaks that would indicate synchronous activation. Thus, while our dataset is not large enough to rule out occasional direct synchronisation of SCs, this appears unlikely to account for synchronised input to PV+INs.
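
      For readers unfamiliar with the analysis, a spike-train cross-correlogram can be sketched as follows. This is a minimal illustration on simulated spike times, not the analysis code used for the figure supplement; the bin width, lag range, spike counts and jitter are all assumed values.

```python
import numpy as np

def cross_correlogram(spikes_a, spikes_b, bin_width, n_half):
    """Histogram of spike-time differences (t_b - t_a), in bins centred on
    lags k * bin_width for k = -n_half .. n_half."""
    a = np.asarray(spikes_a, dtype=float)
    b = np.asarray(spikes_b, dtype=float)
    diffs = (b[None, :] - a[:, None]).ravel()      # all pairwise differences
    edges = (np.arange(-n_half, n_half + 2) - 0.5) * bin_width
    counts, _ = np.histogram(diffs, bins=edges)    # out-of-range lags are dropped
    lags = np.arange(-n_half, n_half + 1) * bin_width
    return lags, counts

rng = np.random.default_rng(0)
duration = 10.0  # seconds (assumed)

# Cell A: 100 spikes at random times.
a = np.sort(rng.uniform(0.0, duration, 100))
# Synchronous partner: the same spikes with 0.5 ms jitter.
b_sync = a + rng.normal(0.0, 0.0005, a.size)
# Asynchronous partner: independent random spike times.
b_async = np.sort(rng.uniform(0.0, duration, 100))

lags, counts_sync = cross_correlogram(a, b_sync, bin_width=0.002, n_half=25)
_, counts_async = cross_correlogram(a, b_async, bin_width=0.002, n_half=25)

print("sync peak lag (s):", lags[counts_sync.argmax()])
print("async max count:", counts_async.max())
```

      A synchronised pair produces a sharp peak in the bin centred on zero lag, whereas an independent pair gives a flat correlogram whose bins fluctuate around the chance level.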

      This conclusion is consistent with consideration of mechanisms that could in principle synchronise SCs:

      First, if responses to ramping light inputs were fully deterministic, then this could lead to fixed relative timing of spikes fired by different SCs. This is unlikely given the influence of stochastic channel gating on SC spiking (Dudman and Nolan 2009) and is inconsistent with trial to trial variability in spike timing (Figure 2, Figure Supplement 2).

      Second, as SCs are glutamatergic they could excite one another. However, excitatory connections between stellate cells are rare (Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016) and when detected they have low amplitude (mean < 0.25 mV; (Winterer et al. 2017)). Our finding that spiking by pairs of SCs is not correlated is consistent with this.

      Third, strong interaction between stellate cells mediated by local inhibitory pathways (Pastoll et al. 2013; Couey et al. 2013) could coordinate their activity. The lack of correlation between spiking of pairs of SCs suggests that such coordination is rarely recruited by our ramping protocols. Nevertheless, recruitment of inhibition may happen to some extent as experiments in Figure 4 show that correlated input from SCs to more distant, but not nearby PV+INs, is reduced by blocking inhibitory synapses. Given that we don't find evidence for synchronised spiking of SCs, this additional common input to widely separated PV+INs is instead best explained by recruitment of interneurons that act directly on the target SCs. We have modified Figure 8 to make this clear.

      Thus, for experiments with ramping light stimuli, synchronous activation of SCs is unlikely to explain common input to PV+INs. Input from the same SC best explains correlated responses of nearby PV+IN inhibitory populations, while recruitment of an additional inhibitory pathway may contribute to correlated responses of more distant PV+INs.

      For experiments using focal stimulation, substantial trial-to-trial variation in SC spike timing argues strongly against deterministic coordination. Indirect coordination of presynaptic neurons is also extremely unlikely given that focal activation is sparse and brief, while inputs from many presynaptic SCs are required to drive a postsynaptic interneuron to spike (e.g. (Pastoll et al. 2013; Couey et al. 2013)). Results from these experiments thus corroborate results from experiments using ramping light stimulation.

      In revising the manuscript we have tried to ensure these arguments are clear (e.g. p 5, para 3; p 6, para 2; p 10, para 1).

      (2) The hypothesis about the "direct excitatory-inhibitory" synaptic interactions is made based on the GABAzine experiments in Figure 4. In the Figure 8 diagram, the direct interaction is illustrated between PV+ INs and SCs. However, the evidence supporting this "direct interaction" between these two cell types is missing. Is it possible that pyramidal cells are also involved in this interaction? Some pieces of evidence or discussions are necessary to further support the "direct interaction".

      Indirect connections between stellate cells mediated via fast spiking inhibitory interneurons are well established by previous studies (e.g. Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016), and so were not addressed here. Previous work also establishes that connections from stellate cells to pyramidal cells are extremely rare (Winterer et al. 2017). Because the Sim1:Cre mouse line is specific to stellate cells and does not drive transgene expression in pyramidal cells (Sürmeli et al. 2015), it's therefore unlikely that pyramidal cells play a role.

      To make these points clearer we have modified the text in the discussion (p 5, para 3; p 10, paras 1 & 2). We have also modified Figure 8 to highlight that the indirect interaction may be best accounted for by inhibitory pathways onto PV+INs rather than via SCs (which our new cross-correlation analyses indicate is unlikely).

      Reviewer #2 (Public Review):

      In this study, Huang et al. employed optogenetic stimulation alongside paired whole-cell recordings in genetically defined neuron populations of the medial entorhinal cortex to examine the spatial distribution of synaptic inputs and the functional-anatomical structure of the MEC. They specifically studied the spatial distribution of synaptic inputs from parvalbumin-expressing interneurons to pairs of excitatory stellate cells. Additionally, they explored the spatial distribution of synaptic inputs to pairs of PV INs. Their results indicate that both pairs of SCs and PV INs generally receive common input when their relative somata are within 200-300 µm of each other. The research is intriguing, with controlled and systematic methodologies. There are interesting takeaways based on the implications of this work to grid cell network organization in MEC.

      We appreciate the positive comments.

      (1) Results indicate that in brain slices, nearby cells typically share a higher degree of common input. However, some proximate cells lack this shared input. The authors interpret these findings as: "Many cells in close proximity don't seem to share common input, as illustrated in Figures 3, 5, and 7. This implies that these cells might belong to separate networks or exist in distinct regions of the connectivity space within the same network.". Every slice orientation could have potentially shared inputs from an orthogonal direction that are unavoidably eliminated. For instance, in a horizontal section, shared inputs to two SCs might be situated either dorsally or ventrally from the horizontal cut, and thus removed during slicing. Given the synaptic connection distributions observed within each intact orientation, and considering these distributions appear symmetrically in both horizontal and sagittal sections, the authors should be equipped to estimate the potential number of inputs absent due to sectioning in the orthogonal direction. How might this estimate influence the findings, especially those indicating that many close neurons don't have shared inputs?

      Given we find high probabilities of correlated inputs to nearby cells in both planes, our conclusion that nearby cells are likely to receive common inputs appears to be independent of the slice plane. For cells further apart, where the degree of correlated input becomes more variable, it is possible that cell pairs that have low input correlations measured in one slice plane would have high input correlations if measured in a different plane. An argument against this is that as the cell pairs are further apart, it is less likely that an orthogonal axon would intersect dendritic trees of both cells. Nevertheless, we can't rule this out given the data here. We have amended the discussion to highlight this possibility (p 10, para 1). We agree it would be interesting to address this point further with quantitative analyses but this will be difficult without detailed reconstructions of the circuit.

      (2) The study examines correlations during various light-intensity phases of the ramp stimuli. One wonders if the spatial distribution of shared (or correlated) versus independent inputs differs when juxtaposing the initial light stimulation phase, which begins to trigger spiking, against subsequent phases. This differentiation might be particularly pertinent to the PV to SC measurements. Here, the initial phase of stimulation, as depicted in Figure 7, reveals a relatively sparse temporal frequency of IPSCs. This might not represent the physiological conditions under which high-firing INs function. While the authors seem to have addressed parts of this concern in their focal stim experiments by examining correlations during both high and low light intensities, they could potentially extract this metric from data acquired in their ramp conditions. This would be especially valuable for PV to SC measurements, given the absence of corresponding focal stimulation experiments.

      We understand the gist of the question to be whether differences in correlation scores between initial and later phases of responses to ramping light inputs can be used to infer spatial organisation. These differences are likely to reflect heterogeneity in the spiking of the input neurons, for example through differences in spike threshold, spike frequency adaptation and saturation of spiking (e.g. Figure 2, Figure Supplement 1A, and also see (Pastoll et al. 2020)). We don't expect these differences to have any spatial organisation along the mediolateral axis, and while spike threshold follows a dorsoventral organisation there is nevertheless substantial local variation between neurons (Pastoll et al. 2020). It's therefore unlikely we can use differences in early versus late correlations to make the inferences proposed by the reviewer.

      With respect to PV to SC measurements, similar heterogeneity is likely. We note that we were unable to carry out focal stimulation experiments for PV to SC connections as PV neurons did not spike in response to focal optogenetic stimulation.

      With respect to physiological conditions, our aim here is simply to assess connectivity in well controlled conditions, e.g. voltage-clamp, minimal spontaneous activity, known neuronal locations, etc. It's not clear that physiological activation patterns would improve on these tests and quite likely data would be noisier and harder to interpret.

      (3) Re results from Figure 2: Please fully describe the model in the methods section. Generally, I like using a modeling approach to explore the impact of convergent synaptic input to PVs from SCs that could effectively validate the experimental approach and enhance the interpretability of the experimental stim/recording outcomes. However, as currently detailed in the manuscript, the model description is inadequate for assessing the robustness of the simulation outcomes. If the IN model is simply integrate-and-fire with minimal biophysical attributes, then the results shown in Fig 2F might be trivial. Conversely, if the model offers a more biophysically accurate representation (e.g., with conductance-based synaptic inputs, synapses appropriately dispersed across the model IN dendritic tree, and standard PV IN voltage-gated membrane conductances), then the model's results could serve as a meaningful method to both validate and interpret the experiments.

      We appreciate the simulation descriptions were insufficient and have modified the manuscript to include additional details and clarification (p 14, paras 1-3).

      We're not sure we follow the logic here with respect to model types. The experiments were carried out in the voltage-clamp recording configuration with the goal of identifying correlated inputs independently from how they are integrated by the postsynaptic neuron. Given that membrane potential doesn't change (and so the CdVm/dt term of the membrane equation = 0), integrate and fire and point conductance-based models both simplify down to summing of input currents. We achieve this by convolving spike times with experimentally measured synaptic current waveforms. An assumption of our approach is that we achieve a reasonable space clamp. We believe this is justified given that stellate cells and PV interneurons are reasonably electrotonically compact, and that our analysis relies on consistent correlations rather than absolute amplitudes or time constants of the postsynaptic response and so should tolerate moderate space clamp errors.
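
      The approach described above (with the membrane potential clamped, the recorded current reduces to a sum of synaptic current waveforms placed at presynaptic spike times) can be sketched in Python as follows. This is an illustrative toy under stated assumptions, not the simulation from the manuscript: the bi-exponential kernel stands in for the experimentally measured waveform, and the time constants, spike counts, duration and random spike times are all hypothetical.

```python
import numpy as np

def synaptic_current(spike_times, kernel, dt, duration):
    """Voltage-clamp current as the sum of `kernel` placed at each spike time
    (equivalent to convolving a binned spike train with the kernel)."""
    n = int(round(duration / dt))
    train = np.zeros(n)
    idx = np.floor(np.asarray(spike_times) / dt).astype(int)
    idx = idx[(idx >= 0) & (idx < n)]
    np.add.at(train, idx, 1.0)          # allows several spikes per bin
    return np.convolve(train, kernel)[:n]

dt, duration = 1e-4, 10.0               # 0.1 ms steps, 10 s (assumed)

# Assumed bi-exponential waveform (rise 0.5 ms, decay 5 ms), peak-normalised.
t_k = np.arange(0.0, 0.03, dt)
kernel = np.exp(-t_k / 0.005) - np.exp(-t_k / 0.0005)
kernel /= kernel.max()

rng = np.random.default_rng(1)
shared = rng.uniform(0.0, duration, 200)     # inputs common to cells A and B
private_a = rng.uniform(0.0, duration, 200)  # inputs only to cell A
private_b = rng.uniform(0.0, duration, 200)  # inputs only to cell B
unrelated = rng.uniform(0.0, duration, 400)  # inputs to an unrelated cell C

i_a = synaptic_current(np.concatenate([shared, private_a]), kernel, dt, duration)
i_b = synaptic_current(np.concatenate([shared, private_b]), kernel, dt, duration)
i_c = synaptic_current(unrelated, kernel, dt, duration)

r_shared = np.corrcoef(i_a, i_b)[0, 1]   # pair with common input
r_indep = np.corrcoef(i_a, i_c)[0, 1]    # pair without common input

print(r_shared, r_indep)
```

      Because half of the inputs to cells A and B are shared in this toy example, their currents are clearly correlated, while the pair with no common input shows a correlation near zero. The comparison depends on the consistency of the correlation rather than on absolute current amplitudes.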

      Reviewer #3 (Public Review):

      This paper presents convincing data from technically demanding dual whole-cell patch recordings of stellate cells in medial entorhinal cortex slice preparations during optogenetic stimulation of PV+ interneurons. The authors show that the patterns of postsynaptic activation are consistent with dual recorded cells close to each other receiving shared inhibitory input and sending excitatory connections back to the same PV neurons, supporting a circuitry in which clusters of stellate cells and PV+IN interact with each other with much weaker interactions between clusters. These data are important to our understanding of the dynamics of functional cell responses in the entorhinal cortex. The experiments and analysis are quite complex and would benefit from some revisions to enhance clarity.

      These are technically demanding experiments, but the authors show quite convincing differences in the correlated response of cell pairs that are close to each other in contrast to an absence of correlation in other cell pairs at a range of relative distances. This supports their main point of demonstrating anatomical clusters of cells receiving shared inhibitory input.

      We appreciate the positive comments.

      The overall technique is complex and the presentation could be more clear about the techniques and analysis. In addition, due to this being a slice preparation they cannot directly relate the inhibitory interactions to the functional properties of grid cells which was possible in the 2-photon in vivo imaging experiment by Heys and Dombeck, 2014.

      We have modified the manuscript to try to improve the presentation (specific changes are detailed below). We agree that an important future challenge is to relate our findings to in vivo observations (p 11, para 2).

      Reviewer #1 (Recommendations For The Authors):

      Major points

      (1) The study largely relies on the fact that ramp-like wide-field optogenetic stimulation and focal optogenetic activation both drove asynchronous action potentials in SCs, and therefore, if a pair of PV+ INs exhibited correlated activity, they should receive common inputs. In Figure 2 and its supplementary figures, the authors also showed examples of asynchronized activity. However, it is unclear to me what criteria/thresholds were used to determine the level of activity asynchronization, and under these criteria, what percentage of cells actually showed synchronized or less asynchronized activity. A notable percentage of synchronized or less asynchronized SCs could complicate the results, i.e., PV+ INs with correlated activity could receive inputs from different SCs (different inputs), which had synchronized activity. Related to this concern, it would also be important to simulate what level of activity asynchronization in SCs could still lead to correlated PV+ IN activity above shuffle, and among the recorded SCs, what percentage of cells belong to this synchronized/less asynchronized category.

      We address this point in our response to the public review. In brief, we have added additional cross-correlograms showing that ramp activation of SC pairs does not cause detectable synchronous activation. We also clarify that sensitivity of correlations of some widely separated pairs to GABA-blockers is suggestive of SCs activating common inhibitory inputs to cell pairs.

      (2) The above concern is more relevant to the focal stimulation experiments, in which the authors tried to claim that a pair of PV+ INs with correlated activity could receive inputs from the same SCs neurons. The authors also showed that the stimulation patterns leading to the activation of PV+ INs were more similar if PV+ INs had correlated activity (Figure 5D). However, if nearby SCs were more synchronized than distal SCs within this stimulation scale, even though a pair of PV+ INs showed correlated activity, they could still receive inputs from different but nearby SCs. In this case, it would be helpful to quantify the relationship between the level of activity synchronization of SCs and their distances. In Figure 5 Supplementary Figure 1, the data were only provided for 8 cells. If feasible, collecting data from more cells would be needed for the proposed analysis.

      We explain in our responses to point 1 above and in the public review that direct synchronisation of SCs is unlikely. This is particularly unlikely for focal stimulation experiments as the timing of responses of individual SCs is extremely variable between trials. Thus, even if there were strong synaptic connections between SCs, which the evidence suggests there are not (Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016), this would be unlikely to result in reliably timed coordinated firing.

      (3) It is unclear what the definition of "common inputs" is. Do they refer to inputs from the same group of cells? If different groups of cells provide synchronized inputs, will the inputs be considered "common inputs" or "different inputs"?

      We used "common" in an attempt to be consistent with classic work by Yoshimura et al. and in an attempt to be succinct. Thus, by common input we are referring to cell pairs for which a proportion of their input is from the same presynaptic neuron(s), as opposed to cell pairs for which their input is from different neurons and therefore have no common input. We have attempted to make sure this is clear in the revised manuscript (e.g. description of simulations on p 4, para 2).

      (4) In the introduction and abstract, it was mentioned that "dense, but specific, direct excitatory-inhibitory synaptic interactions may operate at the scale of grid cell clusters". It is unclear to me how "dense" was demonstrated in the data. Can the authors clarify?

      Thanks for flagging this, we were insufficiently clear. We have revised the text to refer to cell pairs for which a proportion of their input is from the same presynaptic neurons (e.g. p 3, para 1), and separately about indirect coordination, by which we mean inputs to cell pairs that appear correlated because of coordination between upstream neurons.

      (5) The hypothesis about the "direct excitatory-inhibitory" synaptic interactions is made based on the GABAzine experiments in Figure 4. In the Figure 8 diagram, the direct interaction is illustrated between PV+ INs and SCs. Is there any evidence supporting this "direct interaction"?

      The direct interactions from SCs to PV+INs and from PV+INs to SCs were previously demonstrated by experiments with recordings from pairs of neurons (e.g. Pastoll et al. 2013; Couey et al. 2013; Fuchs et al. 2016; Winterer et al. 2017). Our results in Figures 3-5, which show that exciting SCs by light activation of ChR2 leads to excitation of PV+INs, and in Figure 7, which show that light activation of PV+INs expressing ChR2 leads to inhibition of SCs, are consistent with these previous conclusions. We have modified the manuscript to make sure this is clear (p 2, para 3).

      Is it possible that pyramidal cells are also involved in this interaction? If this is unlikely, the author may provide some pieces of evidence (e.g., timing of responses after optogenetic stimulation) or some discussions.

      This is unlikely given that previous studies indicate that connections from stellate to pyramidal cells are weak or absent (Winterer et al. 2017). We now clarify this in the Discussion (p 10, para 1).

      Minor points

      (1) Page 4: the last paragraph: the author claimed that CCpeakmean was reduced and CClagvar increased with cell separation. Although the trends are visible in the figures, the author may provide appropriate statistics to support this statement, such as a correlation between cell separation and CCpeakmean and CClagvar.

      We have inserted summaries of linear model fits into the legends for Figure 3E-F, Figure 5F-H and Figure 7D.

      (2)  If I understood correctly, in the second last paragraph on page 6, "pairs of SCs" should be changed to "pairs of PV+ INs".

      Thanks. Corrected.

      (3)  Page 9: the 7th line to the end: where is Figure S4?

      Corrected to 'Figure 3, Figure Supplement 2'.

      (4)  Page 27: at the end of figure caption B: two ".

      Corrected.

      (5)  Figures 3A and B: what are the red vertical rectangles?

      These are the regions shown on an expanded time base in C and D. This is now clarified in the legend.

      (6)  Page 28 Figure caption of D and E: (C) and (D) should be (D) and (E).

      Corrected.

      (7)  The first sentence of the third paragraph in INTRODUCTION: 'later' should be 'layer'.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      - Some related work has been done by Beed et al. 2013 to map the spatial distribution of inputs to neurons in MEC. Certainly, there are differences in the approaches and the key questions, but the contribution of this study would benefit from a more detailed comparison of the results from Beed vs the current study and should be included in the discussion.

      It's hard to include a detailed comparison of results, at least without losing focus, as the two studies address different questions with different approaches. We already noted that 'Local optical activation of unidentified neurons has also been used to infer connectivity principles but with a focus on responses of single postsynaptic neurons (Beed et al., 2013, 2010)'. In addition, we now note that 'Our focal optogenetic stimulation approach also offers insight into the spatial organization of presynaptic neuronal populations, with the advantage, compared to focal glutamate uncaging previously used to investigate connectivity in the MEC (Beed et al., 2013, 2010), that the identity of the presynaptic cell population is genetically defined'.

      - There are a few places where the language is ambiguous or needs a more detailed description for clarity.

      • 3rd paragraph under "Focal activation of SCs generates common input to nearby PV+INs". The correlation probability description in this paragraph and a similar sentence in the methods are very hard to understand. I had to look up the analysis in Yoshimura et al. 2005 to understand what was done here. It's a nice analysis, but the manuscript could benefit from a more detailed description of this measure in the methods.

      We agree, it is a somewhat complex metric and is challenging to explain. In the interests of keeping the main text succinct, we have left the bare bones explanation as it was in the Results, but have expanded the explanation in the Methods. We hope this is now clear.

      - "Alternatively, if there is no clear spatial organization of SC to PV+INs connections, then the similarity between stimulus locations for pairs of SCs should have a random distribution." This sentence is hard to understand. I think the use of the phrase "similarity of stimulus location" is a strange phrasing and is driving the confusion in this sentence.

      We have replaced this with 'correspondence between active stimulus locations'.

      - In the discussion under "Spatial extent and functional organization of L2 circuits" there is a grammatical mistake (seems to be 2x phrasing of "leads to common synaptic input").

      Corrected.

      - Citation in the introduction/discussion. Introduction: in addition to Gu et al. 2018, Heys et al 2014 also showed there are non-random correlations among putative grid cells as a function of their somatic distance. In the discussion section, in addition to Gu et al. 2018, Heys et al. 2014 showed there is anatomical clustering of grid cells in MEC. This earlier work investigating functional correlations among neurons in the superficial aspect of MEC in vivo should be cited and is particularly relevant in these two sections of the manuscript.

      Thanks, we apologise for the oversight. We're well aware of this important study and have now cited it.

      -Typo - Paragraph 3 of the intro; "later" should be layer.

      Corrected.

      -Figure 5 (D-E) there is a typo high correlation probability is D and low correlation is E (text says C/D).

      Corrected.

      Reviewer #3 (Recommendations For The Authors):

      The paper is missing the bibliography section. This makes the review somewhat difficult as some cited papers are not immediately familiar based on the citation.

      Thanks, and our apologies for making extra work by omitting this. It is now included.

      Page 2 - "cell clusters" - they should also cite the paper by Heys and Dombeck, 2014 that shows a spatial scale of inhibitory interactions computed based on correlations of grid cells recorded using 2-photon calcium imaging.

      Added (see above).

      Page 2 - "later 2 of the MEC" - layer.

      Corrected.

      Page 2 - "synaptic interactions" - again they should mention the work by Heys and Dombeck, 2014 that indirectly measured the spatial scale of inhibition.

      Now cited in this paragraph.

      Page 4 "we simulated responses" and Figure 2E - in each simulation - did they fit the magnitude and time constant of the simulated EPSCs to individual EPSCs in the data? Or did they randomly vary these to find the best fit?

      The parameters for the simulations are given in the Methods and were chosen to correspond to the experimental values. We have rewritten this section to make the simulation methods clearer. Simulations using different time constants within a physiological range support similar conclusions.

      Page 4 - "we identified 35/71" - Are these the cells that appear in yellow as correlated in Figures 3E-F? If so, the text should indicate that these cells are shown in yellow.

      We have added this and have also updated the legends for additional clarification.

      Figure 2, Figure Supplement 1 - B,C - the following phrase is not clear: "when the 4 / 8 of each neurons inputs from SCs also project to the other neuron (B)," Should the "the" be removed? Also, by 4/8 do they mean 50%, or do they mean 4 to 8?

      Thanks, we've reworded to improve the clarity.

      E - "receiving presynaptic inputs consisted of 4 overlapping SCs" - should it say "consisting"?

      Corrected.

      Figure 3, Figure Supplement 1 part E - "the same data as (C )" - should this be the same data as (D)?? I do not see how doing clustering on the shuffled data in (C ) would give two groups, but it makes sense if it is from (D).

      That's right, now corrected.

      Page 5 - "used action potentials" - this is confusing. Is the word "used" supposed to be there?

      Corrected.

      Page 5 - "widefield activation experiments" - they should cite the experiments that they are referring to here.

      Added.

      Page 5 - "effect of blocking" - "Figure 4" - I find it very odd that the agent GABAzine in Figure 4 is not explicitly mentioned in the main text (though it is mentioned in the methods). The main text should indicate that blocking was performed using GABAzine.

      Added.

      Page and page 14 and Figure 5 - "shifted" - do they mean shuffled?

      We do. The classic papers by Yoshimura et al. used shifted so we keep this here so it's clear we've used their approach. We've added additional explanation to try to make sure the meaning is clear.
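      To make the idea of a shifted control concrete, the following is a minimal sketch of a generic trial-shift comparison; it is not necessarily the exact procedure of Yoshimura et al. or of the manuscript. Responses of two cells are correlated either within the same trial or after pairing each trial of one cell with the next trial of the other. Correlation that survives the shift reflects stimulus-locked structure, whereas the excess seen only in same-trial pairs reflects input shared on individual trials. The simulated traces, noise levels, and function names are all assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pair_correlation(trials_a, trials_b, shift=0):
    """Mean correlation between trial i of cell A and trial i + shift of cell B.

    shift=0 gives the same-trial correlation; shift=1 gives the shifted control.
    Trials wrap around, so every trial of A is paired exactly once.
    """
    n = len(trials_a)
    total = sum(pearson(trials_a[i], trials_b[(i + shift) % n]) for i in range(n))
    return total / n


def simulate_pair(n_trials=20, n_samples=200, private_sd=0.5, seed=1):
    """Toy data: a stimulus-locked component repeated on every trial, a shared
    component drawn afresh on each trial, and private noise for each cell."""
    rng = random.Random(seed)
    locked = [math.sin(2.0 * math.pi * i / 50.0) for i in range(n_samples)]
    trials_a, trials_b = [], []
    for _ in range(n_trials):
        shared = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        trials_a.append([l + s + rng.gauss(0.0, private_sd) for l, s in zip(locked, shared)])
        trials_b.append([l + s + rng.gauss(0.0, private_sd) for l, s in zip(locked, shared)])
    return trials_a, trials_b


trials_a, trials_b = simulate_pair()
same_trial = mean_pair_correlation(trials_a, trials_b, shift=0)
shifted = mean_pair_correlation(trials_a, trials_b, shift=1)
```

      In this toy case the same-trial value exceeds the shifted value because only the same-trial pairing preserves the trial-specific shared component.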

      Figure 5 A, B, D, and E would benefit from a more detailed description. They should state whether the labels "1a" and "1b" and "2a" and "2b" refer to different recorded neurons in each pair. They should indicate that 2a and 2b are a different pair? Are the x, y axes of the images corresponding to anatomical position? Does "B" indicate the location of recordings shown in Figure 5B? The authors probably think this is all obvious, but it is not immediately obvious to the reader.

      We have added additional clarification.

      Page 8 - "Beed et al." - These papers by Beed ought to be cited in the introduction as well as they are highly relevant.

      We now cite Beed et al. 2013 in the Introduction when we discuss local inhibitory input to SCs. While the Beed et al. 2010 paper is an important contribution to understanding about pathways from deep to superficial layers, the introduction focuses on communication between identified pre- and postsynaptic populations within layer 2 and therefore we haven't found a way to cite it without losing focus. We do cite this paper multiple times elsewhere.

      Page 10 - "Excitatory-inhibitory interactions" - this summary of attractor models ought to cite the paper by Burak and Fiete as well.

      The discussion focuses on models with excitatory-inhibitory connectivity and cites an important paper from the Fiete group. The model by Burak and Fiete, while also important, is purely inhibitory and so is not well constrained by the known circuitry, and therefore could not be correctly cited here.

      Page 10 - "be consistent with models…or that focus on pyramidal neurons have also been proposed" - this seems ungrammatical as if two different sentences were merged.

      Corrected.

      References

      Couey, Jonathan J, Aree Witoelar, Sheng-Jia Zhang, Kang Zheng, Jing Ye, Benjamin Dunn, Rafal Czajkowski, et al. 2013. “Recurrent Inhibitory Circuitry as a Mechanism for Grid Formation.” Nat. Neurosci. 16 (3): 318–24. https://doi.org/10.1038/nn.3310.

      Dudman, Joshua T, and Matthew F Nolan. 2009. “Stochastically Gating Ion Channels Enable Patterned Spike Firing through Activity-Dependent Modulation of Spike Probability.” PLoS Comput. Biol. 5 (2): e1000290. https://doi.org/10.1371/journal.pcbi.1000290.

      Fuchs, Elke C, Angela Neitz, Roberta Pinna, Sarah Melzer, Antonio Caputi, and Hannah Monyer. 2016. “Local and Distant Input Controlling Excitation in Layer II of the Medial Entorhinal Cortex.” Neuron 89 (1): 194–208. https://doi.org/10.1016/j.neuron.2015.11.029.

      Pastoll, Hugh, Derek L Garden, Ioannis Papastathopoulos, Gülşen Sürmeli, and Matthew F Nolan. 2020. “Inter- and Intra-Animal Variation in the Integrative Properties of Stellate Cells in the Medial Entorhinal Cortex.” eLife 9 (February). https://doi.org/10.7554/eLife.52258.

      Pastoll, Hugh, Lukas Solanka, Mark C W van Rossum, and Matthew F Nolan. 2013. “Feedback Inhibition Enables Theta-Nested Gamma Oscillations and Grid Firing Fields.” Neuron 77 (1): 141–54. https://doi.org/10.1016/j.neuron.2012.11.032.

      Sürmeli, Gülşen, Daniel Cosmin Marcu, Christina McClure, Derek L F Garden, Hugh Pastoll, and Matthew F Nolan. 2015. “Molecularly Defined Circuitry Reveals Input-Output Segregation in Deep Layers of the Medial Entorhinal Cortex.” Neuron 88 (5): 1040–53. https://doi.org/10.1016/j.neuron.2015.10.041.

      Winterer, Jochen, Nikolaus Maier, Christian Wozny, Prateep Beed, Jörg Breustedt, Roberta Evangelista, Yangfan Peng, Tiziano D’Albis, Richard Kempter, and Dietmar Schmitz. 2017. “Excitatory Microcircuits within Superficial Layers of the Medial Entorhinal Cortex.” Cell Rep. 19 (6): 1110–16. https://doi.org/10.1016/j.celrep.2017.04.041.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      The manuscript by Agha et al. provides a fundamental understanding regarding the participation of V2a interneurons in generating and patterning the locomotor rhythm. The authors provide convincing and solid evidence regarding the heterogeneity of V2a neurons in their intrinsic and synaptic properties and how these shape their outputs. The manuscript could be much improved by the inclusion of statistical analysis of some of the key data currently presented qualitatively. 

      We are extremely grateful for the positive and thorough comments provided by the three reviewers and have now had the opportunity to address all their concerns, as detailed below in our point-by-point response. Specifically, we have provided statistical analysis and major revisions to the text to help with rigor, clarity and interpretation, and we have also included new perturbation experiments that provide a more definitive test of one of our predictions – namely that reciprocal inhibition plays speed-specific roles in rhythm generation and pattern formation. The revisions greatly improve the manuscript and help bolster our conclusions.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary:

      In this very interesting study, Agha and colleagues show that two types of Chx10-positive neurons (V2a neurons) have different anatomical and electrophysiological properties and receive distinct patterns of excitatory and inhibitory inputs as a function of speed during fictive swimming in the larval zebrafish. Using single-cell fills they show that one cell type has a descending axon ("descending V2as"), while the other cell type has both a descending axon and an ascending axon ("bifurcating V2as"). In the Chx10:GFP line, descending V2as display strong GFP labeling, while bifurcating V2as display weak GFP labeling. The bifurcating V2as are located more laterally in the spinal cord. These two cell types have different electrophysiological properties as revealed by patch-clamp recordings. Positive current steps indicated that descending V2as comprise tonic spiking or bursting neurons. Bifurcating V2as comprise chattering or bursting neurons. The two types of V2a neurons display different recruitment patterns as a function of speed. Descending tonic and bifurcating chattering neurons are recruited at the beginning of the swimming bout, at fast speeds (swimming frequency above 30 Hz). Descending bursting neurons were preferentially recruited at the end of swimming bouts, at low speeds (swimming frequency below 30 Hz), while bifurcating bursting neurons were recruited for a broader swimming frequency range. The two types of V2a neurons receive distinct patterns of excitatory and inhibitory inputs during fictive locomotion. In descending V2as, when speed increases: i) excitatory conductances increase in fast neurons and decrease in slow neurons; ii) inhibitory conductances increase in fast neurons and increase in slow neurons. In bifurcating V2as, when speed increases: i) excitatory conductances increase in fast neurons but do not change in slow neurons; ii) inhibitory conductances increase in fast neurons and do not change in slow neurons. 
      The timing of excitatory and inhibitory inputs was then studied. In descending V2as, fast neurons receive excitatory and inhibitory inputs that are in anti-phase with low contrast in amplitude and are both broadly distributed over the phase. The slow neurons receive two peaks of inhibition, one in anti-phase with the excitatory inputs and another just after the excitation. In bifurcating V2as, fast neurons receive two peaks of inhibition, while slow ones receive anti-phase inhibition. 

      Strengths: 

      This study focuses on the diversity of V2a neurons in zebrafish, an interesting cell population playing important roles in locomotor control and beyond, from fish to mammals. The authors provide compelling evidence that two subtypes of V2as show distinct anatomical, electrophysiological, and speed-dependent spiking activity, and receive distinct synaptic inputs as a function of speed. This opens the door to future investigation of the inputs and outputs of these neurons. Finding ways to activate or inhibit specifically these cells would be very helpful in the years to come. 

      Weaknesses: 

      No major weakness was detected. The experiments were carefully done, and the data were of high quality. 

      We really appreciate the positive assessment and have addressed minor issues below.

      Reviewer #2 (Public Review): 

      Summary: 

      Animals exhibit different speeds of locomotion. In vertebrates, this is thought to be implemented by different groups of spinal interneurons and motor neurons. A fundamental assumption in the field has been that neural mechanisms that generate and sustain the rhythm at different locomotor speeds are the same. In this study, the authors challenge this view. Using rigorous in vivo electrophysiology during fictive locomotion combined with genetics, the authors provide a detailed analysis of cellular and synaptic properties of different subtypes of spinal V2a neurons that play a crucial role in rhythm generation. Importantly, they are able to show that speed-related subsets of V2a neurons have distinct cellular and synaptic properties and may utilize different mechanisms to implement different locomotor speeds. 

      Strengths: 

      The authors fully utilize the zebrafish model system and solid electrophysiological analyses to study the active and passive properties of speed-related V2a subsets. Identification of the V2a subtype is based directly on their recruitment at different locomotor speeds and not on indirect markers like soma size, D-V position etc. Throughout the article, the authors have cleverly used standard electrophysiological tests and analysis to tease out different neuronal properties and link it to natural activity. For example, in Figures 2 and 4, the authors make comparisons of V2a spiking with current steps and during fictive swims showing spike rates measured with current steps are physiologically relevant and observed during natural recruitment. The experiments done are rigorous and well-controlled.

      Weaknesses: 

      The authors claim that a primary result of their study is that reciprocal inhibition is important for rhythmogenesis at fast speeds while recurrent inhibition is key at slow speeds. This is shown in Figure 6, however, the authors do not show any statistical tests for this claim. The authors also do not show any conclusive evidence that reciprocal inhibition is required for rhythmogenesis at fast speeds and vice versa for slow speeds. Additional experiments or modeling studies that conclusively show the necessity of these different inhibitory sources to the generation of different rhythms would be needed to strengthen this claim. 

      We have added new loss-of-function experiments as requested to strengthen the claim that reciprocal inhibition is critical for rhythmogenesis at fast speeds, but dispensable at slow. Specifically, we use botulinum toxin selectively expressed in Dmrt3-labeled dI6 interneurons, which play a role in reciprocal inhibition at a variety of speeds (new Figure 7). These experiments demonstrate a selective impact on rhythmic burst generation and alternation during periods of swimming where the highest frequency motor activity occurs. During lower frequency activity, rhythm generation is preserved, however motor output is selectively altered, consistent with the idea that reciprocal inhibition plays an important role in patterning at slow speeds.

      The authors do a great job of teasing out cellular and synaptic properties in the different V2a subsets, however, it is not clear if or how these match the final output. For example, V2aD neurons are tonic or bursting for fast and slow speeds respectively but it is not intuitive how these cellular properties would influence phasic excitation and inhibition these neurons receive. 

      This question gets at the heart of what we are trying to illustrate in Figure 6. Specifically, in the new Figure 6E,F we have aligned the cumulative distribution of spikes recorded in cell-attached mode with phasic excitatory and inhibitory currents to reveal how well cellular properties versus patterns of synaptic drive match the final output (spikes). Our expectation was that if intrinsic cellular properties were ultimately generating phasic spiking patterns, then patterns of excitatory and inhibitory drive need not be phasic. Instead, we see that synaptic drive is phasic, with spiking occurring between peaks in excitation and troughs in inhibition. Since post-synaptic cellular properties should not impact the pre-synaptic excitation they receive, this suggests that phasic spiking in all V2a neurons, regardless of the capacity for cellular rhythmogenesis, is a result of phasic input. In response to this concern, we have elaborated our discussion of what cellular properties may contribute and the impact on output in the Discussion (L502-511). 

      It is not clear from the discussion why having different mechanisms of rhythm generation at different speeds could be an important circuit design. The authors use anguilliform and carangiform modes of swimming to denote fast and slow speeds but there are differences in these movements other than speed, like rostrocaudal coordination. The frequency and pattern of these movements are linked and warrant more discussion. 

      We appreciate the opportunity to elaborate on this point more in the Discussion. In particular, we have added more text to clarify differences in movement related to both pattern-formation and rhythm-generation (L373-398) and to also suggest potential reasons for differences in mechanisms of rhythm generation (L478-488).  

      Reviewer #3 (Public Review):

      The manuscript by Agha et al. explores mechanisms of rhythmicity in V2a neurons in larval zebrafish. Two subpopulations of V2a neurons are distinguishable by anatomy, connectivity, level of GFP, and speed-dependent recruitment properties consistent with V2a neurons involved in rhythm generation and pattern formation. The descending neurons proposed to be consistent with rhythm-generating neurons are active during either slow or fast locomotion, and their firing frequencies during current steps are well matched with the swim frequency at which they fire. The bifurcating (patterning neurons) are active during a broader swim frequency range unrelated to their firing during current steps. All of the V2a neurons receive strong inhibitory input but the phasing of this input is based on neuronal type and swim speed when the neuron is active, with prominent in-phase inhibition in slow descending V2a neurons and bifurcating V2a neurons active during fast swimming. Antiphase inhibition is observed in all V2a neurons but it is the main source of rhythmic inhibition in fast descending V2a neurons and bifurcating neurons active during slow swimming. The authors suggest that properties supporting rhythmic bursting are not directly related to locomotor speed but rather to functional neuronal subtypes. 

      This is a well-written paper with many strengths including the rigorous approach. Many parameters, including projection pattern, intracellular properties, inhibition received, and activity during slow/fast swimming were obtained from the same neuron. This links up very well with prior data from the lab on cell position, birth order, morphology/projections, and control of MN recruitment to provide a comprehensive overview of the functioning of V2a interneuronal populations in the larval zebrafish. The overall conclusions are well supported by the data. Weaknesses are relatively minor and were largely related to terminology for some of the secondary conclusions. 

      (1) The assumption is made that all in-phase inhibition is recurrent and out-of-phase inhibition is reciprocal. The latter is likely true, but the definition of recurrent may be a bit loose, as there could be multisegmental feed-forward inhibition as well. 

      This is an excellent point, which was also raised by Reviewer 1. We have now added references that justify this assertion (L281-283). We also add a new figure with schematics (Figure 8) to make it clearer how we are defining sources of recurrent versus reciprocal inhibition, as based on the anatomical constraints of the circuit. We agree that multi-segmental inputs could contribute to inhibition, but they will likely be more broadly distributed based on rostro-caudal location and contribute to tonic sources of drive.  We now clarify this (L285-286).

      (2) In a few places, it is mentioned that the properties of the V2a-D neurons are consistent with pacemakers. This could be true of both the V2a-D and -B neurons that burst in response to depolarizing steps but the properties of the remaining (fast) V2a-D neurons do not seem to be consistent with pacemakers, based on the properties shown. Tonic firing at a frequency related to the locomotor speed the neuron is active during and strong antiphase inhibition may instead suggest a stronger network component driving the rhythmicity. 

      We have been purposefully agnostic regarding the relative contribution of pacemaking to rhythm generation in the paper. Our measurements of bursting overlap with swim frequencies only in the V2a-D subtype. Similarly, the spike rates of V2a-D neurons alone overlap with their swim frequencies (Fig 2D,G,I). Since both respond to tonic input (current injection) by spiking in a pattern that resembles their natural spiking behavior, we have treated both of these cellular properties as pacemaking. Although the bursting behavior is more consistent with what is normally considered pacemaking in rhythmic motor circuits, in the basal ganglia field tonic firing of dopaminergic neurons in the substantia nigra is referred to as pacemaking. Since the tonic firing pattern overlaps with swimming frequency in the same way the bursting pattern does, we are less inclined to discount its possible contribution to rhythmogenesis on the grounds that these neurons do not burst. We have made modifications to the document to make this point clearer (L409-416). Regardless, our data argue that pacemaking is unlikely to be a major contributor to phasic firing in V2a neurons, at least at midbody, so we agree with you on this last point.

      Reviewer #1 (Recommendations For The Authors): 

      I only have very minor suggestions. 

      (1) It would be useful to add a table or a figure summarizing the main results (integration of anatomy, electrophysiological properties, synaptic inputs, firing, swimming speed). 

      We agree and have added a figure panel summarizing the main results (new Figure 8).

      (2) Some statistics to possibly add (only suggestions): Do bifurcating V2as display significantly weaker GFP labeling than descending V2as? Do descending V2as have a significantly smaller soma size? Do descending V2as have a significantly lower rheobase and significantly higher resistance? Are tonic descending neurons and chattering bifurcating neurons located significantly more dorsally than the bursting descending and bifurcating neurons? Is there a way to show that bifurcating bursting neurons are recruited statistically on a broader swimming frequency range than other cell types (e.g. SD, coefficient of variation, cumulative distribution function with Kolmogorov-Smirnov test)? 

      For the first question, in all cases when we targeted more dimly labeled neurons they were bifurcating. We now clarify this in the text (L119, L129-132). However, this is difficult to quantify, since absolute levels of fluorescence will vary from preparation to preparation based on the dissection and intensity of epifluorescence illumination. In addition, we did not always take images prior to recording and levels of GFP after recording will vary depending on relative state of dialysis. So, unfortunately, we cannot provide a rigorous statistical analysis beyond the qualitative statement we provide.

      For the remainder of the questions, we now provide statistical analysis for soma size, position, rheobase, and resistance for the data in Figure 2.  Please note, we have reported all our statistical analyses in the figure legends. We also provide analysis of the density distributions of swimming frequencies for slow bursting bifurcating neurons and slow bursting descending neurons as requested, which are significantly different following a K-S test (L162).
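      As an illustration of the kind of comparison described above, the following is a minimal sketch of a two-sample Kolmogorov-Smirnov statistic with a large-sample critical value. It is not the analysis code used for the manuscript: the function names and the swim-frequency values are hypothetical, and the critical value is only the standard asymptotic approximation.

```python
import math
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d_max = 0.0
    # The gap can only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max


def ks_critical_value(n, m, alpha=0.05):
    """Asymptotic critical value; reject equality if the statistic exceeds it."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))


# Hypothetical recruitment frequencies in Hz (made up for illustration only).
slow_bifurcating_hz = [18, 22, 25, 27, 31, 34, 38, 42, 45, 50]
slow_descending_hz = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]

d_stat = ks_two_sample_statistic(slow_bifurcating_hz, slow_descending_hz)
d_crit = ks_critical_value(len(slow_bifurcating_hz), len(slow_descending_hz))
print(f"D = {d_stat:.2f}, critical value at alpha = 0.05: {d_crit:.2f}")
```

      With small samples such as these, an exact or permutation-based p-value would be preferable to the asymptotic threshold.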

      (3) Some details to possibly add (only suggestions): proportion of neurons in which single cell fills were done/checked anatomically? Proportions of bursting/chattering/tonic/bursting neurons? In Figure 1, maybe define visually bifurcating vs descending neurons. In Figure 2I, the recruitment of bifurcating chattering neurons is not plotted. Is that normal? Figures 6D, E, maybe specify more clearly which neurons are the fast and slow ones. In Figure 3C, the X-axis name is missing. 

      For the first question, the proportion is 100%, since the morphology of all neurons was confirmed post recording, which we now clarify in the Methods section (L573). For the second question, the numbers of bursting/chattering/tonic/bursting neurons are now reported in the legend of Figure 2, in addition to the total number of V2a-D and V2a-B types, so it is clear what proportion of the recording population this represents. For the third question, in Figure 1 we cannot define V2a neurons as bifurcating or descending yet; this was only possible to confirm after the recording (Figure 2), and was done for every neuron (as mentioned above). For the fourth question, for Figure 2I the chattering response was too variable to be meaningful in terms of averaging and plotting, which we now mention in the text (L169-171). The standard deviations are ridiculous. For the fifth question, we have modified Figures 6D, E to more clearly label fast and slow V2a neurons. Finally, we have included the X-axis label in Figure 3C, thank you!

      (4) Some text to possibly modulate (only suggestions): 

      A possible role for these V2a subtypes in the rhythm generation and pattern formation layer is an interesting idea but this may not be completely solved by the present experiments. Maybe the authors could suggest future experiments in the discussion that would establish how to tackle this important question (double bursts, deletions, etc...)? 

      We appreciate the opportunity to raise future experiments that could help further tease apart their contribution to rhythm and pattern and have now added potential experiments to the Discussion (L498-501; L527-529), which include more precise molecular identification, spatial perturbation, and computational modeling.

      It would be nice to cite the references in which the rhythm/pattern CPG concept was proposed initially (lines 49-50 and elsewhere, Cf. Perret and Cabelguen 1980 Brain Res; Perret et al. 1989 Stance and Motion, Plenum Press; McCrea et al. 2006 J Physiol). 

      Apologies for our poor scholarship here; we now credit the appropriate primary research articles (L50-51).

      In the abstract, it would be useful to say clearly which cells are descending vs. bifurcating ones. Same thing in the result section, maybe it would be nice to identify the two populations long before line 127. 

      We have modified the abstract and introduction sections accordingly. We also note that the two populations are defined in the first paragraph of the results (L90).

      About the possible mechanism of rhythm generation, it is mentioned in line 54 that a single mechanism was proposed to exist, but the authors also mention in lines 122-123 that several mechanisms were proposed for rhythm generation... Maybe adjust the introduction? 

      As requested, we have clarified our meaning in the introduction (L55-58). Several mechanisms exist, but the likelihood that different mechanisms operate at different speeds has not been considered.  Either cellular properties are tuned to different speeds (i.e., bursting is faster in neurons recruited at faster speeds) or network properties can explain different speeds (i.e., different frequencies and patterns emerge from the connectivity).

      About the convention that in fish in-phase currents originate from the ipsilateral and out-of-phase currents originate from the contralateral side (lines 271-275), is there any reference for this assumption? 

      Yes, we now provide references (L281-283).

      Lines 338-345 stating that reciprocal inhibition is important for rhythm generation as predicted by the half-center model can sound surprising to some authors considering that many studies showed that inhibition is not needed for rhythm generation, including lamprey hemicords stimulated electrically (Cangiano and Grillner 2003 J Neurophysiol; 2005 J Neurosci, Cangiano et al. 2012 Neuroscience), salamander hemicords or hemisegments stimulated chemically (Ryczko et al. 2010, 2015 J Neurophysiol), or rhythm activity evoked on each side of the cord using optogenetic stimulation of glutamatergic neurons (Hägglund et al. 2013 PNAS) etc. To demonstrate the importance of inhibition in rhythmogenesis, one would need to activate and/or deactivate the ipsilateral versus contralateral inhibitory neurons. It would be nice to maybe add citations to such studies if available in the zebrafish literature. Overall I would simply suggest modulating this section to be a bit more balanced conceptually. 

      We have included the above referenced studies for lampreys and added ones for tadpoles (L464-468), to stick with undulatory swimmers. We had focused on experiments with the most selective perturbations in the interests of space, but appreciate the opportunity to present both arguments. We also include new loss-of-function experiments that impact one spinal population linked to reciprocal inhibition (Dmrt3-labeled dI6 interneurons), which demonstrate a speed-specific impact on rhythmogenesis (L323-371; new Figure 7) and compare our findings to a recent study in the zebrafish literature examining the impact of spinal Dmrt3-ablations on axial rhythmogenesis (L426-433).

      Line 676 "episodies". 

      Thanks, corrected.

      Reviewer #2 (Recommendations For The Authors): 

      The authors make a claim that recurrent and reciprocal inhibition play key roles in rhythmogenesis at different speeds. This is not conclusively shown. Rayleigh's z-test can be used to test the significance of the directionality of circular data. Including more data from experiments or computational models to show the necessity of reciprocal or recurrent inhibition for timed spiking of V2a neurons would address this. 

      We have now modified Figure 6 so we can directly compare differences in reciprocal and recurrent inhibition between V2a types. We now report statistical analysis in the figure legends using Watson’s two-sample test of homogeneity to test differences in the circular data. As mentioned above, we have also added new loss-of-function experiments as requested to strengthen the claim that reciprocal inhibition is critical for rhythmogenesis at fast speeds, but dispensable at slow speeds. Specifically, we use botulinum toxin selectively expressed in Dmrt3-labeled dI6 interneurons, which play a role in reciprocal inhibition at a variety of speeds (new Figure 7). These experiments demonstrate a selective impact on rhythmic burst generation and alternation during periods of swimming where the highest frequency motor activity occurs. During lower frequency activity, rhythm generation is preserved; however, motor output is selectively altered, consistent with the idea that reciprocal inhibition plays an important role in patterning at slow speeds.
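      For readers unfamiliar with the Rayleigh test that the reviewer mentions, the following is a minimal sketch of how the directionality of circular data (e.g., spike or current phases) can be tested. It is not the analysis reported in the response, which uses a Watson two-sample test; the function name and phase values are hypothetical, and the p-value uses only the first-order large-sample approximation exp(-z).

```python
import math


def rayleigh_test(phases_rad):
    """Return (mean resultant length R, statistic z = n * R**2, approximate p)."""
    n = len(phases_rad)
    mean_cos = sum(math.cos(p) for p in phases_rad) / n
    mean_sin = sum(math.sin(p) for p in phases_rad) / n
    resultant_length = math.hypot(mean_cos, mean_sin)
    z = n * resultant_length ** 2
    # First-order approximation; small samples need a correction term or tables.
    p_value = math.exp(-z)
    return resultant_length, z, p_value


# Hypothetical phases in radians (made up for illustration only).
clustered_phases = [0.1, 0.2, 0.15, 0.05, 0.25, 0.1, 0.2, 0.3]
evenly_spread_phases = [k * math.pi / 4 for k in range(8)]

for label, phases in (("clustered", clustered_phases),
                      ("evenly spread", evenly_spread_phases)):
    r, z, p = rayleigh_test(phases)
    print(f"{label}: R = {r:.3f}, z = {z:.2f}, approx. p = {p:.4f}")
```

      A large mean resultant length indicates that the phases cluster around a preferred direction, whereas a value near zero is consistent with no single preferred phase.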

      In Figure 4D, the authors show that V2a neurons, both subtypes, spike in advance of the center of the motor burst. Recent studies (Jay et al., 2023) have shown differences in the timing of V2aD and V2aB neurons. Are there differences in the methods or selection of cells that would reflect differences in results? 

      This is a great point and we appreciate the opportunity to reconcile our observations here with those in Jay et al., 2023. In the Jay et al. paper, we used drifting visual stimuli to evoke fictive swimming. These experiments allow you to uncouple rhythm generation (forward propulsion) and pattern formation (lateral direction). Notably, fictive swim frequencies during so-called optomotor responses are below 35 Hz, meaning that we are sampling exclusively from V2a neurons recruited during carangiform swim mode. In these experiments, slow V2a-D neurons fire well in advance of slow V2a-B neurons, whereas what we see here is relatively synchronous firing. Critically, however, the phase-advanced firing pattern revealed in the Jay et al. paper for V2a-D neurons aligns with the phase-advanced excitatory input reported here. In addition, the recruitment probabilities of slow V2a-D neurons are higher in the Jay et al. paper than what we report here. Collectively, these observations suggest either more effective excitation during optomotor responses (Jay et al.) or more potent inhibition during escape responses (Agha et al.). Ultimately, differences in the relative synchrony of firing among slow V2a-D and slow V2a-B neurons appear to depend on the nature of the stimulus and range of swim frequencies, where in one case frequency and amplitude modulation are coupled over a broad range of frequencies (somatosensory stimuli delivered here), while in the other case frequency and amplitude modulation are uncoupled over a narrow range of frequencies (visual stimuli in Jay et al.). We now elaborate on this point in the Discussion (L485-498).

      Given the conserved nature of spinal circuits across vertebrates, it is also important to discuss these findings in the context of limbed animals. In tetrapods, changes in locomotor speed also involve pattern/gait changes, however, it is not known if or how these changes in frequency and pattern are linked. This study, by suggesting that different speeds are implemented not only by different neurons but possibly by different neuronal mechanisms, provides important cues for the missing link and would strengthen the discussion. 

      We agree and have made substantial edits to the beginning of the Discussion to provide better context for the impact of our work (L373-398).

      Minor points: 

      Line 122: of needs to be replaced by or. 

      Corrected, thanks!

      Figure 3B Top panel: What is the grey bar? 

      This has been removed for clarity.

      Figure 3B bottom panel is not referenced in the main text at all. 

      Now referenced (L187, L189)

      Line 260: 2nd inhibition needs to be replaced with excitation. 

      Done, thanks!

      Reviewer #3 (Recommendations For The Authors): 

      Minor comments: 

      - Figure 2 panel ordering is visually appealing but tough to follow. 

      We apologize and tried reconfigurations, but they just looked too kludgy.  Hoping for a pass on this one.

      - Lines 164-166 and 319-327 (related to comment 2 above): For the fast/tonic V2a-Ds, it is not clear that this is intrinsic and it is not consistent with pacemaker properties. This could also be (and likely is) synaptically/network-driven rhythmicity, although the firing frequencies match up well with the swim frequencies. 

      Fast/tonic V2a-Ds were tested with somatic current injection as with all other neurons, which we assume primarily reflects intrinsic cellular properties. The spike rates we observe in fast/tonic V2a-Ds overlap with spike rates observed during fictive swimming, so they are positioned as well as bursting neurons to contribute to pacemaking. We also elaborate on this point in response to Major Comment #2.

      - Lines 189-192: The patterning neurons receive excitatory drive before rhythm-generating neurons. The time constant explanation makes sense for why two neurons with a common drive would fire at different times but this does not support the proposed hierarchical arrangement or being consistent with V2a-Bs being downstream as mentioned in lines 49-56 and 218-219. 

      In response to this point, we have modified Figure 6 so we can directly compare the timing of presynaptic excitatory inputs between the types. Here it can be seen clearly that phasic excitatory inputs to both fast and slow V2a-Ds are phase-advanced relative to fast and slow V2a-Bs (Figure 6B,C). As the reviewer mentions, it is likely a combination of time constants and the relative balance of excitation and inhibition that ultimately leads to synchronous spiking despite differences in the timing of inputs.

      - Lines 338-339: It is not shown that the rhythm relies on inhibition during slow. 

      This line has been removed in the revision process.

      - Consistent with the importance of reciprocal (contralateral) inhibition in fast locomotion here, rodent fictive locomotion is slower in hemisect than in the full cord. However, the Rybak and O'Donovan groups suggest that this is due to loss of drive to ipsilateral inhibitory neurons by excitatory contralateral projections, rather than contralateral inhibitory interneurons (see Falgairolle and O'Donovan 2019, 2021, and Shevtsova et al 2022). 

      This is an interesting point that highlights how we are defining reciprocal versus recurrent inhibition. In this example, although ipsilaterally-projecting interneurons are responsible for inhibition, since they are excited by commissurally-projecting excitatory interneurons, we would classify this as feedforward (reciprocal) not feedback (recurrent) inhibition. So reciprocal (feedforward) inhibition is still important to get higher frequency rhythms; it is simply di-synaptic in this case. We have added a new figure (Figure 8) to clarify what we mean by reciprocal (feedforward) and recurrent (feedback) based on the ipsilateral projection patterns of V2a neurons, and point out in the Discussion that the definitions would be flipped for excitatory interneurons (L452-455).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      While there are many models for sequence retrieval, it has been difficult to find models that vary the speed of sequence retrieval dynamically via simple external inputs. While recent works [1,2] have proposed some mechanisms, the authors here propose a different one based on heterogeneous plasticity rules. Temporally symmetric plasticity kernels (that do not distinguish between the order of pre and post spikes, but only their time difference) are expected to give rise to attractor states, asymmetric ones to sequence transitions. The authors incorporate a rate-based, discrete-time analog of these spike-based plasticity rules to learn the connections between neurons (leading to connections similar to Hopfield networks for attractors and sequences). They use either a parametric combination of symmetric and asymmetric learning rules for connections into each neuron, or separate subpopulations having only symmetric or asymmetric learning rules on incoming connections. They find that the latter is conducive to enabling external inputs to control the speed of sequence retrieval.
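      The distinction drawn above between temporally symmetric rules (attractors) and asymmetric rules (sequence transitions) can be illustrated with a minimal Hopfield-style sketch. This is a toy under strong assumptions, not the authors' model: it uses binary ±1 units, a sign update, and dense all-to-all weights, whereas the paper studies rate units; all function and variable names are hypothetical.

```python
import random


def make_patterns(num_patterns, num_units, rng):
    """Random binary (+1/-1) patterns, one list per pattern."""
    return [[rng.choice((-1, 1)) for _ in range(num_units)]
            for _ in range(num_patterns)]


def build_weights(patterns, shift):
    """Hebbian weights: shift=0 pairs each pattern with itself (symmetric),
    shift=1 pairs pattern mu (pre) with pattern mu+1 (post) (asymmetric)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for mu in range(len(patterns) - shift):
        pre, post = patterns[mu], patterns[mu + shift]
        for i in range(n):
            row, post_i = weights[i], post[i]
            for j in range(n):
                row[j] += post_i * pre[j] / n
    return weights


def step(weights, state):
    """One synchronous discrete-time update with a sign nonlinearity."""
    return [1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1
            for row in weights]


def overlap(a, b):
    """Normalized overlap in [-1, 1]; 1 means identical patterns."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
patterns = make_patterns(num_patterns=5, num_units=200, rng=rng)
w_sym = build_weights(patterns, shift=0)
w_asym = build_weights(patterns, shift=1)

after_sym = step(w_sym, patterns[0])    # expected to stay near pattern 0
after_asym = step(w_asym, patterns[0])  # expected to move toward pattern 1
print("symmetric: overlap with pattern 0 =", overlap(after_sym, patterns[0]))
print("asymmetric: overlap with pattern 1 =", overlap(after_asym, patterns[1]))
```

      In this toy, symmetric weights hold the network on the current pattern while asymmetric weights push it to the next one, which conveys the intuition for why weighting the two components differently could change retrieval speed.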

      Strengths:

      The authors have expertly characterised the system dynamics using both simulations and theory. How the speed and quality of retrieval vary across phase space has been well-studied. The authors are also able to vary the external inputs to reproduce a preparatory followed by an execution phase of sequence retrieval as seen experimentally in motor control. They also propose a simple reinforcement learning scheme for learning to map the two external inputs to the desired retrieval speed.

      Weaknesses:

      (1) The authors translate spike-based synaptic plasticity rules to a way to learn/set connections for rate units operating in discrete time, similar to their earlier work in [5]. The bio-plausibility issues of learning in [5] carry over here, e.g., the authors ignore any input due to the recurrent connectivity during learning and effectively fix the pre and post rates to the desired ones. While the learning itself is not fully bio-plausible, it does lend itself to writing the final connectivity matrix in a manner that is easier to analyze theoretically.

      We agree with the reviewer that learning is not ‘fully bio-plausible’. However, we believe that extending the results to a model in which synaptic plasticity depends on recurrent inputs is beyond the scope of this work. We have added a mention of this issue in the Discussion in the revised manuscript.

      (2) While the authors learn to map the set of two external input strengths to speed of retrieval, they still hand-wire one external input to the subpopulation of neurons with temporally symmetric plasticity and the other external input to the other subpopulation with temporally asymmetric plasticity. The authors suggest that these subpopulations might arise due to differences in the parameters of Ca dynamics as in their earlier work [29]. How these two external inputs would connect to neurons differentially based on the plasticity kernel / Ca dynamics parameters of the recurrent connections is still an open question which the authors have not touched upon.

      The issue of how external inputs could self-organize to drive the network to retrieve sequences at appropriate speeds is addressed in the Results section, paragraph ‘Reward-driven learning’. These inputs are not ‘hand-wired’: they are initially random and then, thanks to a simple reinforcement learning scheme, acquire the strengths needed for the network to retrieve the sequences at different speeds. We have rewritten this section to clarify this issue.

      (3) The authors require that temporally symmetric and asymmetric learning rules be present in the recurrent connections between subpopulations of neurons in the same brain region, i.e. some neurons in the same brain region should have temporally symmetric kernels, while others should have temporally asymmetric ones. The evidence for this seems thin. Though, in the discussion, the authors clarify 'While this heterogeneity has been found so far across structures or across different regions in the same structure, this heterogeneity could also be present within local networks, as current experimental methods for probing plasticity only have access to a single delay between pre and post-synaptic spikes in each recorded neuron, and would therefore miss this heterogeneity'.

      We agree with the reviewer that this is currently an open question. We describe this issue in more detail in the Discussion of the revised manuscript.

      (4) An aspect which the authors have not connected to is one of the authors' earlier works:

      Brunel, N. (2016). Is cortical connectivity optimized for storing information? Nature Neuroscience, 19(5), 749-755. https://doi.org/10.1038/nn.4286, which argues that the experimentally observed over-representation of symmetric synapses indicates that cortical networks are optimized for attractors rather than sequences.

      We thank the reviewer for this suggestion. We have added a paragraph in the discussion that discusses work on statistics of synaptic connectivity in optimal networks. We expect that in networks that contain two subpopulations of neurons, the degree of symmetry should be intermediate between a network storing fixed point attractors exclusively, and a network storing sequences exclusively.

      Despite the above weaknesses, the work is a solid advance in proposing an alternate model for modulating speed of sequence retrieval and extends the use of well-established theoretical tools. This work is expected to spawn further works like extending to a spiking neural network with Dale's law, more realistic learning taking into account recurrent connections during learning, and experimental follow-ups. Thus, I expect this to be an important contribution to the field.

      We thank the reviewer for the insightful comments.

      Reviewer #2 (Public Review):

      Sequences of neural activity underlie most of our behavior. And as experience suggests we are (in most cases) able to flexibly change the speed for our learned behavior which essentially means that brains are able to change the speed at which the sequence is retrieved from the memory. The authors here propose a mechanism by which networks in the brain can learn a sequence of spike patterns and retrieve them at variable speed. At a conceptual level I think the authors have a very nice idea: use of symmetric and asymmetric learning rules to learn the sequences and then use different inputs to neurons with symmetric or asymmetric plasticity to control the retrieval speed. The authors have demonstrated the feasibility of the idea in a rather idealized network model. I think it is important that the idea is demonstrated in more biologically plausible settings (e.g. spiking neurons, a network with exc. and inh. neurons with ongoing activity).

      Summary

      In this manuscript authors have addressed the problem of learning and retrieving sequential activity in neuronal networks. In particular, they have focussed on the problem of how sequence retrieval speed can be controlled.

      They have considered a model with excitatory rate-based neurons. Authors show that when sequences are learned with both temporally symmetric and asymmetric Hebbian plasticity, by modulating the external inputs to the network the sequence retrieval speed can be modulated. With the two types of Hebbian plasticity in the network, sequence learning essentially means that the network has both feedforward and recurrent connections related to the sequence. By giving different amounts of input to the feed-forward and recurrent components of the sequence, authors are able to adjust the speed.

      Strengths

      - Authors solve the problem of sequence retrieval speed control by learning the sequence in both feedforward and recurrent connectivity within a network. It is a very interesting idea for two main reasons: 1. It does not rely on delays or short-term dynamics in neurons/synapses 2. It does not require that the animal is presented with the same sequences multiple times at different speeds. Different inputs to the feedforward and recurrent populations are sufficient to alter the speed. However, the work leaves several issues unaddressed as explained below.

      Weaknesses

      - The main weakness of the paper is that it is mostly driven by a motivation to find a computational solution to the problem of sequence retrieval speed. In most cases they have not provided any arguments about the biological plausibility of the solution they have proposed e.g.:

      - Is there any experimental evidence that some neurons in the network have symmetric Hebbian plasticity and some temporally asymmetric? In the references authors have cited some references to support this. But usually the switch between temporally symmetric and asymmetric rules is dependent on spike patterns used for pairing (e.g. bursts vs single spikes). In the context of this manuscript, it would mean that in the same pattern, some neurons burst and some don't and this is the same for all the patterns in the sequence. As far as I see here authors have assumed a binary pattern of activity which is the same for all neurons that participate in the pattern.

      There is currently only weak evidence for heterogeneity of synaptic plasticity rules within a single network, though there is plenty of evidence for such heterogeneity across networks or across locations within a particular structure (see references in our Discussion). The reviewer suggests another interesting possibility, that the temporal asymmetry could depend on the firing pattern of the post-synaptic neuron. An example of such behavior can be found in a paper by Wittenberg and Wang in 2006, where they show that pairing single spikes of pre and post-synaptic neurons leads to LTD at all time differences in a symmetric fashion, while pairing a pre-synaptic spike with a burst of post-synaptic spikes leads to temporally asymmetric plasticity, with an LTP window at short positive time differences. We now mention this possibility in the Discussion, but we believe fully exploring this scenario is beyond the scope of the paper.

      - How would external inputs know that they are impinging on a symmetric or asymmetric neuron? Authors have proposed a mechanism to learn these inputs. But that makes the sequence learning problem a two stage problem -- first an animal has to learn the sequence and then it has to learn to modulate the speed of retrieval. It should be possible to find experimental evidence to support this?

      Our model does not assume that the two processes necessarily occur one after the other. Importantly, once the correct external inputs that can modulate sequence retrieval are learned, sequence retrieval modulation will automatically generalize to arbitrary new sequences that are learned by the network.

      - Authors have only considered homogeneous DC input for sequence retrieval. This kind of input is highly unnatural. It would be more plausible if the authors considered fluctuating input which is different for each neuron.

      We have modified Figure 1e and Figure 2c to show the effects of fluctuating inputs on pattern correlations and single unit activity. We find that these inputs do not qualitatively affect our results.

      - All the work is demonstrated using a firing rate based model of only excitatory neurons. I think it is important that some of the key results are demonstrated in a network of both excitatory and inhibitory spiking neurons. As the authors very well know it is not always trivial to extend rate-based models to spiking neurons.

      I think at a conceptual level authors have a very nice idea but it needs to be demonstrated in a more biologically plausible setting (and by that I do not mean biophysical neurons etc.).

      We have included a new section in the discussion with an associated figure (Figure 7) demonstrating that flexible speed control can be achieved in an excitatory-inhibitory (E-I) spiking network containing two excitatory populations with distinct plasticity mechanisms.

      Reviewer #1 (Recommendations For The Authors):

      In the introduction, the authors state: 'symmetric kernels, in which coincident activity leads to strengthening regardless of the order of pre and post-synaptic spikes, have also been observed in multiple contexts with high frequency plasticity induction protocols in cortex [21]'. To my understanding, [21]'s final model 3, ignores LTD if the post-spike also participates in LTP, and only considers nearest-neighbour interactions. Thus, the kernel would not be symmetric. Can the authors clarify what they mean and how their conclusion follows, as [21] does not show any kernels either.

      In this statement, we were not referring to the model in [21], but rather the experimentally observed plasticity kernels at different frequencies. In particular, we were referring to the symmetric kernel that appears in the bottom panel of Figure 7c in that paper.

      The authors should also address the weaknesses mentioned above. They don't need to solve the issues but expand (and maybe indicate resolutions) on these issues in the Discussion.

      For ease of reproducibility, the authors should make their code available as well.

      We intend to publish the code required to reproduce all figures on GitHub.

      Reviewer #2 (Recommendations For The Authors):

      -  Show the ground state of the network before and after learning.

      We have decided not to include such a figure, as we have not analyzed the learning process, but instead a network with a fixed connectivity matrix which is assumed to be the end result of a learning process.

      -  Authors have only considered a network of excitatory neurons. This does not make sense. I think they should demonstrate a network of both exc. and inh. neurons (spiking neurons) exhibiting ongoing activity.

      See our comment to Reviewer #2 in the previous section.

      -  Show how the sequence dynamics unfolds when we assume a non-zero ongoing activity.

      We are not sure what the reviewer means by ‘non-zero ongoing activity’. We now show the dynamics of the network in the presence of noisy inputs, which can represent ongoing activity from other structures (see Fig 1e and 2c).

      -  From the correlation (==quality) alone it is difficult to judge how well the sequence has been recovered. Authors should consider showing some examples so that the reader can get a visual estimate of what 0.6 quality may mean. High speed is not really associated with high quality (Fig 2b). So it is important to show how the sequence retrieval quality is for non-linear and heterogeneous learning rules.

      We believe that some insight into the relationship between speed and quality for the case of non-linear and heterogeneous learning rules is provided by the correlation plots for chosen input configurations (see Fig. 3a and 5b). We leave a full characterization for future work.

      -  Authors should show how the retrieval and quality of sequences change when they are recovered with positive input, or positive input to one population and negative to another. In the current version sequence retrieval is shown only with negative inputs. This is a somewhat non-biological setting. The inhibitory gating argument (L367-389) is really weak.

      We would like to clarify that with the parameters chosen in this paper, the transfer function has half its maximal rate at zero input. This is due to the fact we chose the threshold to be zero, using the fact that any threshold can be absorbed in the external inputs. Thus, negative inputs really mean sub-threshold inputs, and they are consistent with sub-threshold external excitatory inputs. We have clarified this issue in the revised manuscript.
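      The argument that a threshold can be absorbed into the external input can be checked with a short sketch. The logistic form below is an assumption chosen for illustration and is not necessarily the transfer function used in the manuscript; the function name, gain, and input values are hypothetical.

```python
import math


def transfer(total_input, r_max=1.0, theta=0.0, gain=1.0):
    """Assumed logistic transfer function with threshold theta."""
    return r_max / (1.0 + math.exp(-gain * (total_input - theta)))


# With theta = 0, zero input gives exactly half the maximal rate.
half_max_rate = transfer(0.0, r_max=1.0)

# A unit with threshold theta and external input I_ext behaves like a unit
# with zero threshold and shifted external input I_ext - theta.
recurrent_input = 0.3   # hypothetical recurrent drive
external_input = 0.8    # hypothetical (positive) external excitatory input
theta = 1.5             # hypothetical threshold

rate_with_threshold = transfer(recurrent_input + external_input, theta=theta)
rate_shifted_input = transfer(recurrent_input + external_input - theta, theta=0.0)

print("rate at zero input:", half_max_rate)
print("effective external input:", external_input - theta)
print("rates match:", math.isclose(rate_with_threshold, rate_shifted_input))
```

      In this example the effective input after absorbing the threshold is negative even though the underlying external input is excitatory, which is the sense in which negative inputs correspond to sub-threshold excitatory drive.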

      -  Authors should demonstrate how the sequence retrieval dynamics is altered when they assume a fluctuating input current for sequence retrieval instead of a homogeneous DC input.

      See our comment to Reviewer #2 in the previous section.

      -  Authors should show what are the differences in synaptic weight distribution for the two types of learning (bi-linear and non-linear). I am curious to know if the difference in the speed in the two cases is related to the weight distribution. In general I think it is a good idea to show the synaptic weight distribution before and after learning.

      As mentioned above, we do not study any learning process, but rather a network with a fixed connectivity matrix, assumed to represent the end result of learning. In this network, the distribution of synaptic weights converges to a Gaussian in the large p and cN limits, independently of the functions f and g, because of the central limit theorem, if there are no sign constraints on weights. In the presence of sign constraints, the distribution is a truncated Gaussian.
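
      The central limit theorem argument can be checked numerically with a rough sketch. It assumes that each weight is a sum over p stored patterns of f applied to a post-synaptic pattern value times g applied to a pre-synaptic pattern value, with independent standard Gaussian patterns; the step-function choices for f and g and all names below are hypothetical, and the normalisation used in the manuscript is omitted. Excess kurtosis (zero for a Gaussian) serves as a crude measure of non-Gaussianity.

```python
import random

def f(x):
    """Post-synaptic nonlinearity (hypothetical step function)."""
    return 1.0 if x > 0.5 else 0.0

def g(x):
    """Pre-synaptic nonlinearity (hypothetical step function)."""
    return 1.0 if x > 0.5 else 0.0

def sample_weight(p, rng):
    """Sum over p patterns of f(post-synaptic value) * g(pre-synaptic value)."""
    return sum(f(rng.gauss(0.0, 1.0)) * g(rng.gauss(0.0, 1.0)) for _ in range(p))

def excess_kurtosis(xs):
    """Fourth standardised moment minus 3; close to 0 for Gaussian samples."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return sum((x - mean) ** 4 for x in xs) / (n * var ** 2) - 3.0

rng = random.Random(1)
n_weights = 2000

# With a single pattern the weight distribution is strongly non-Gaussian.
kurt_single = excess_kurtosis([sample_weight(1, rng) for _ in range(n_weights)])

# With many patterns the distribution approaches a Gaussian.
kurt_many = excess_kurtosis([sample_weight(100, rng) for _ in range(n_weights)])
```

      In this toy setting the excess kurtosis shrinks toward zero as p grows, whatever bounded f and g are chosen. Imposing a sign constraint would correspond to truncating the resulting distribution.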

      -  I suggest the use of a monochromatic color scale for figure 2b and 3b.

      Figure 3: The sentence describing panel 2 seems incomplete.

      Also explain why there is a non-monotonic relationship between I_s and speed for some values of I_a in 3b

      There is a non-monotonic relationship for retrieval quality, not speed. We have clarified this in the manuscript text, but don’t currently have an explanation for why this phenomenon occurs for these specific values of I_a.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Additional Discussion Points

      (1) There is not much exploration of potential mechanisms, i.e., the impact of PV neuron activity on the broader circuit. Additionally, the study exclusively focuses on PV cells and does not explore the role of other prefrontal populations, particularly those known to respond to cue-evoked fear states. The discussion should consider how PV activity might impact the broader circuit and whether the present findings are specific to PV cells or applicable to other interneuron subtypes.

      We have added an extensive discussion of potential mechanisms and the potential contributions of other interneuron subtypes:

      “For example, PV neurons aid in improving visual discrimination through sharpening response selectivity in visual cortex (Lee et al., 2012). In prefrontal cortex, PV neurons are critical for task performance, particularly during performance of tasks that require flexible behavior such as rule shift learning (Cho et al., 2020) and reward extinction (Sparta et al., 2014). Further, PV neurons play an essential role in the generation of cortical gamma rhythms, which contribute to synchronization of selective populations of pyramidal neurons (Sohal et al., 2009; Cardin et al., 2009). Courtin et al (2014) showed that brief suppression of dorsomedial prefrontal (dmPFC) PV neural activity enhanced fear expression, one of the main functions of the dmPFC, by synchronizing the spiking activity of dmPFC pyramidal neurons (Courtin et al., 2014). This result is potentially relevant to our findings, but likely involves different circuit mechanisms because of the difference in timescale, targeted area, and downstream projection targets (Vertes, 2004). These and other studies support the idea that PV neural activity supports the execution of a behavior by shaping rather than suppressing cortical activity, potentially by selecting among conflicting behaviors by the synchronization of different pyramidal populations (Warden et al., 2012; Lee et al., 2014).

      The roles of other inhibitory neural subtypes (such as somatostatin (SOM)-expressing and vasoactive intestinal peptide (VIP)-expressing IL GABA neurons) in avoidance behavior are currently unknown, but are likely important given the role of SOM neurons in gamma-band synchronization (Veit et al., 2017), and the role of VIP neurons in regulating PV and SOM neural activity (Cardin, 2018).” 

      (2) There is some discordance between changes in neural activity and behavior. For example, in Figure 4C, the relationship between PV neuron activity and movement emerges almost immediately during learning, but successful active avoidance emerges much more gradually. Why is this?

      We have added extensive text to the discussion that addresses this issue:

      “Interestingly, the rise in IL PV neural activity during movement does not require avoidance learning. IL PV neurons begin to respond during movement immediately after the animal has received a single shock in an environment, but learning to cross the chamber to avoid the signaled shock takes tens of trials. Why is there a discordance between the emergence of the IL PV signal during movement and avoidance learning?

      The components underlying active avoidance have been debated over the years, but are thought to involve at least two essential behaviors – suppressing freezing, and moving to safety (LeDoux et al., 2017). Freezing is the default response of mice upon hearing a shock-predicting tone, and can be learned in a single trial (Ledoux, 1996; Fanselow, 2010; Zambetti et al., 2022). When a predator is in the distance, freezing can increase the chance of survival by reducing the chances of detection. However, a strategic avoidance behavior may prevent a future encounter with the predator altogether. The importance of IL PV neural activity in defensive behavior may be to suppress reactive defensive behaviors such as freezing in order to permit a flexible goal-directed response to threat.

      The freezing suppression and avoidance movement components of the avoidance response are dissociable, both because freezing precedes avoidance learning, and because animals intermittently move prior to avoidance learning. Our finding that the rise in PV activity during movement emerges immediately after receiving a single shock, tens of trials before animals have learned the avoidance behavior, suggests that the IL PV signal is associated with the suppression of freezing. Further, IL PV neurons do not respond during movement toward cued rewards because in reward-based tasks there is no freezing response in conflict with reward approach behavior.” 

      (3) vmPFC was defined here as including the infralimbic (IL) and dorsal peduncular (DP) regions. While the role of IL has been frequently characterized for motivated behavior, relatively few studies have examined DP. Perhaps the authors are just being cautious, given the challenges involved in the viral targeting of the IL region without leakage to nearby regions such as DP. But since the optical fibers were positioned above the IL region, it is possible that DP did not contribute much to either the fiber photometry signals or the effects of the optogenetic manipulations. Perhaps DP should be completely omitted, which is more consistent with the definitions of vmPFC in the field.

      Yes, we included DP to be cautious as our viral expression sometimes leaks into DP, though the optic fiber targets IL. We have replaced vmPFC with IL throughout the manuscript. 

      (4) In the Discussion, the authors should consider why PV cells exhibit increased activity during both movement initiation and successful chamber crossing during avoidance. While the functional contribution of the PV signal during movement initiation was tested with optogenetic inhibition, some discussion on the possible role of the additional PV signal during chamber crossing is of interest to readers who are intrigued by the signaling of two events. Is the chamber crossing signal related to successful avoidance or learned safety (e.g., see Sangha, Diehl, Bergstrom, Drew 2020)?

      IL PV neural activity starts to increase at movement initiation, peaks at chamber crossing (when movement speed is highest), and decreases after chamber crossing (Figure 1E). Thus, the increase in PV neural activity at movement initiation and at chamber crossing are different phases of the same event. 

      We think this signal is unlikely to be a safety signal, and have added text to the discussion to clarify this issue:

      “We think the IL PV signal is unlikely to be a safety signal (Sangha et al., 2020). First, the PV signal rises during movement not only in the avoidance context, but during any movement in a “threatening” context (i.e. a context where the animal has been shocked). For example, PV neural activity rises during movement during the intertrial interval in the avoidance task. Further, the emergence of the PV signal during movement happens quickly – after the first shock – and significantly before the animal has learned to move to the safe zone. This suggests a close association with enabling movement in a threatening environment, when animals must suppress a freezing response in order to move. Additionally, the rise in PV activity was specifically associated with movement and not with tone offset, the indicator of safety in this task. Finally, if IL PV neural activity reflects safety signals one would expect the response to be enhanced by learning, but the amplitude of the IL PV response was unaffected by learning after the first shock.”

      (5) The primary conclusion here that PV cells control the fear response should be considered within the context of prior findings by the Herry laboratory. Courtin et al (2014) demonstrated a select role of prefrontal PV cells in the regulation of fear states, accomplished through their control over prefrontal output to the basolateral amygdala. The observations in this paper, which used both ChR2 and Arch-T to address the impact of vmPFC PV activity on reactive behavior, are highly relevant to issues raised both in the Introduction and Discussion.

      Courtin et al (2014)’s finding is very important. We did not discuss this paper originally because Courtin et al. is about dmPFC, which has a different role in fear processing than IL/vmPFC. We have added text about this finding to the discussion:

      “Courtin et al (2014) showed that brief suppression of dorsomedial prefrontal (dmPFC) PV neural activity enhanced fear expression, one of the main functions of the dmPFC, by synchronizing the spiking activity of dmPFC pyramidal neurons (Courtin et al., 2014). This result is potentially relevant to our findings, but likely involves different circuit mechanisms because of the difference in timescale, targeted area, and downstream projection targets (Vertes, 2004).”

      Additional analyses

      (1) As avoidance trials progress (particularly on days 2 and 3), do PFC PV responses attenuate? That is, do continued unreinforced tone presentations lead to reduced reliance on PV cell-mediated suppression in order for successful avoidance to occur?

      We added Figure 1—Figure supplement 1M and 1N and a sentence on page 5: “IL PV neural activity during the avoidance movement was not attenuated by learning or repeated reinforcement (Figure 1—Figure supplement 1M and N, N = 8 mice, p = 0.8886, 1-way ANOVA).” We only included data from days 1 and 2, since we started to introduce short and long tone trials on day 3 which might interfere. 

      (2) In Figure 3D, it would be very informative and further support the claim of "no role for movement during reward" if the response of these cells during the "initiation of movement during reward-approach" was shown (similar to Figure 1F for threat avoidance).

      Thank you for the question. We added Figure 3—Figure supplement 1B and C to show IL PV neural activity aligned to initiation of movement during reward-approach. IL PV activity decreased after movement initiation for reward approach (N = 6 mice, p=0.0382, paired t-test). This further solidifies our claim that IL PV neuron activity only increases for threat avoidance.   

      Reviewer 1 (Recommendations For The Authors):

      (1) Fig1G shows the average response of PV cells during chamber crossing on an animal-to-animal basis. It would be informative to also see a similar plot for movement initiation.

      We have added the suggested figure in Figure 1—Figure supplement 1B.  

      (2) In the Results section (Page 5), there is a small issue with the logic. It says: "As vmPFC inactivation impairs avoidance behavior, the activity of inhibitory vmPFC PV neurons might be predicted to be low during successful avoidance trials." As opposed to "low", it should say "high", right? If inhibition impairs avoidance, then high responding by these cells would be presumed to drive the avoidance response, as supported by your findings.

      We have re-worded the text in this section. Based on prior findings that IL inactivation impairs avoidance (Moscarello et al., 2013), we predicted that inhibitory PV neurons would be less active during avoidance, because activating these neurons could suppress IL. However, we found that they were selectively active during avoidance.

      (3) In the caption/legend for Fig1E, it says that the "black ticks" indicate "tone onset". But it should say "movement initiation".

      We thank the reviewer for pointing out this error. The ticks do indicate tone onset, and we have corrected the figure to reflect this. 

      Reviewer 2 (Recommendations For The Authors):

      (4) Perhaps replace the term 'good outcomes' with 'reinforcing outcomes' or simply 'reinforcement'.

      Thank you for the suggestion. We have replaced ‘good outcomes’ with ‘reinforcing outcomes’.

      Reviewer 3 (Recommendations For The Authors):

      (5) It would be useful to provide some (perhaps speculative) explanation for the discordance between the PV activity-movement relationship and success of active avoidance in Fig. 4C

      We have added text to the discussion that addresses this issue:

      “Interestingly, the rise in IL PV neural activity during movement does not require avoidance learning. IL PV neurons begin to respond during movement immediately after the animal has received a single shock in an environment, but learning to cross the chamber to avoid the signaled shock takes tens of trials. Why is there a discordance between the emergence of the IL PV signal during movement and avoidance learning?

      The components underlying active avoidance have been debated over the years, but are thought to involve at least two essential behaviors – suppressing freezing, and moving to safety (LeDoux et al., 2017). Freezing is the default response of mice upon hearing a shock-predicting tone, and can be learned in a single trial (Ledoux, 1996; Fanselow, 2010; Zambetti et al., 2022). When a predator is in the distance, freezing can increase the chance of survival by reducing the chances of detection. However, a strategic avoidance behavior may prevent a future encounter with the predator altogether. The importance of IL PV neural activity in defensive behavior may be to suppress reactive defensive behaviors such as freezing in order to permit a flexible goal-directed response to threat.

      The freezing suppression and avoidance movement components of the avoidance response are dissociable, both because freezing precedes avoidance learning, and because animals intermittently move prior to avoidance learning. Our finding that the rise in PV activity during movement emerges immediately after receiving a single shock, tens of trials before animals have learned the avoidance behavior, suggests that the IL PV signal is associated with the suppression of freezing. Further, IL PV neurons do not respond during movement toward cued rewards because in reward-based tasks there is no freezing response in conflict with reward approach behavior.” 

      (6) I don't really understand what is shown in Figure 4D -- exactly what time points does this represent? Was habituation performed everyday?

      Figure 4D shows data from the approach task, not the avoidance task. These data are from well-trained mice, not from the first day of training on this task. There was a pre-task recording period every day.

      (7) Why was optogenetic inhibition only delivered from 0.5-2.5 sec after the tone cue?

      We wanted to avoid any possibility that perception of the tone would be disrupted, so we delayed the onset of optogenetic inhibition. We chose 0.5 sec onset because animals typically begin to move ~1 second after tone onset.

      (8) The regression analysis with shuffled time points is not well explained -- some additional methodological details are needed (Fig. 2H).

      We added the following to the methods section to provide a clearer explanation: 

      “DF/F(t) was modeled as the linear combination of all event kernels. Given the occurrence time points of all event types, linear regression can be used to decompose a characteristic kernel for each event type. Kernel coefficients of the model were solved by minimizing the mean squared error between the model and the actual recorded signals. To test whether kernel ki is an essential component of the raw calcium dynamics, we compared the explanatory power of the full model to that of a reduced model in which the time points of occurrence of event ki were randomly assigned. Thus, the kernel coefficients should not reflect the response to the event in the reduced model.”
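
      The comparison between the full and reduced models can be sketched on synthetic data. Everything below is an illustrative assumption rather than the authors' analysis code: the signal is simulated, the two event types, kernel shapes, noise level, and function names are hypothetical, and explanatory power is measured here with R squared. Each column of the design matrix marks one lag after one event type, so the fitted coefficients form one kernel per event type.

```python
import random

def design_matrix(n_samples, events_by_type, kernel_len):
    """Build the regression matrix: one column per (event type, lag) plus an intercept."""
    n_cols = len(events_by_type) * kernel_len + 1
    X = [[0.0] * n_cols for _ in range(n_samples)]
    for row in X:
        row[-1] = 1.0  # intercept column
    for k, times in enumerate(events_by_type):
        for t0 in times:
            for lag in range(kernel_len):
                if t0 + lag < n_samples:
                    X[t0 + lag][k * kernel_len + lag] += 1.0
    return X

def least_squares(X, y, ridge=1e-9):
    """Minimise the mean squared error by solving the normal equations (Gauss-Jordan)."""
    n = len(X[0])
    # Augmented matrix [X^T X + ridge*I | X^T y]; the tiny ridge guards against singularity.
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0) for j in range(n)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def r_squared(X, y, coef):
    """Fraction of signal variance explained by the fitted model."""
    pred = [sum(x * c for x, c in zip(row, coef)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

rng = random.Random(0)
n_samples, kernel_len = 1500, 8

# Synthetic event times and ground-truth kernels for two hypothetical event types.
times_a = rng.sample(range(n_samples - kernel_len), 40)
times_b = rng.sample(range(n_samples - kernel_len), 40)
kernel_a = [1.0 * 0.7 ** lag for lag in range(kernel_len)]
kernel_b = [-0.5 * 0.7 ** lag for lag in range(kernel_len)]

# Simulated signal: noise plus the sum of all event-triggered kernels.
y = [rng.gauss(0.0, 0.2) for _ in range(n_samples)]
for times, kernel in ((times_a, kernel_a), (times_b, kernel_b)):
    for t0 in times:
        for lag in range(kernel_len):
            y[t0 + lag] += kernel[lag]

def fit_r2(events_by_type):
    X = design_matrix(n_samples, events_by_type, kernel_len)
    return r_squared(X, y, least_squares(X, y))

# Full model: true event times. Reduced model: event A times randomly reassigned.
r2_full = fit_r2([times_a, times_b])
shuffled_a = rng.sample(range(n_samples - kernel_len), len(times_a))
r2_reduced = fit_r2([shuffled_a, times_b])
```

      In this toy example, a clear drop in explained variance from r2_full to r2_reduced indicates that the kernel for event A captures a genuine event-locked component of the signal.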

      Editor's notes:

      -  Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      Thank you for pointing this out. We have included all the test statistics and exact p values as suggested.

      -  Please note the sex of the mice and distribution of sexes in each group for each experiment.

      We have added the sex of mice for all experiments in the methods section.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 2 (Public Review):

      Stress response in males versus females: The authors argue that the contextual control over behaviour was more robust in female rats as females show less within session variability and greater resistance to stress. What evidence is there that the restraint stress procedure caused a similar stress response in both sexes? That is, was the stress induction equally effective in males and females?

      The restraint protocol used in this study is a well-established stressor in rodents, known to produce robust behavioral and physiological effects (HPA axis activation), in both sexes. Although not measured in this study, the ACTH and cortisol responses are actually greater in females during restraint. To the extent that “stress induction” is interpreted as “HPA axis activation”, this strongly suggests that the stress induction in males and females was at least comparable, if not greater in females.

      We have added a few sentences (in the Results and Methods sections) to highlight this important point. We thank the reviewer for bringing this up.

      Minor corrections:

      (1) Please verify that the in-text reference to the figures is correct. I noticed a few mistakes, for example:

      - Line 120 (pdf) refers to Fig. 1 C-D but should refer to D only.

      - Line 312 (pdf) refers to Fig 1D for discrimination ratios but these are shown in Fig 1E

      - No reference in text to 2A

      Thank you for bringing this to our attention. We have fixed the in-text references to the figures.

      (2) In the results it states that the homecage c-Fos+ counts are shown in Figure 5 but I couldn't see these?

      The homecage c-Fos+ counts were initially shown as a pale gray band in the background of the main histograms. Because those counts are very low, it was hard to dissociate this gray band from the black horizontal axis. We have replaced the gray band with a more vivid blue line that is now in the foreground of the histograms. Moreover, we added a note in the figure legend to bring readers’ attention to this homecage count line, close to floor level. 

      (3) Line 306: It is stated that "the use of differential outcomes presumably allows animals to solve the task via simple (nonhierarchical) summation processes". I don't understand the use of "summation" here, isn't it simply that the rats are relying on direct context-outcome and/or cue-outcome associations?

      That’s right. These rats might be relying on direct context-outcome and cue-outcome associations and adding (or summing up) the converging expectations. We have added a few words in the text to clarify what we mean by summation (i.e. the addition of converging cue-evoked + context-evoked predictions).

    1. Author response:

      We thank the reviewers for their kind comments and advice. Like Reviewer 1, we acknowledge that while the exact involvement of Ih in allowing smooth transitions is likely not universal across all systems, our demonstration of the ways in which such currents can affect the dynamics of the response of complex rhythmic motor networks provides valuable insight. To address the concerns of Reviewer 2, we intend to include a sentence in the discussion to highlight the fact that cesium neither increased the pyloric frequency nor caused consistent depolarization in intracellular recordings. We will also highlight that these observations suggest that cesium is not indirectly raising [K+]outside, and that they support the conclusion that the effects of cesium are primarily through blockade of Ih rather than other potassium channels.

      Reviewer 3 raised some important points about modeling. While the lab has models that explore the effects of temperature on artificial triphasic rhythms, these models do not account for all the biophysical nuances of the full biological system. We have limited data about the exact nature of temperature-induced parameter changes and the extent to which these changes are mediated by intrinsic effects of temperature on protein structure versus protein interactions or modifications such as phosphorylation. With respect to the A current, we have seen in Tang et al., 2010 that the activation and inactivation rates are differentially temperature sensitive, but we do not have the data to suggest whether the time courses of such sensitivities differ as well. We intend to mention these facts in the paper, but plan to leave more comprehensive modeling to future work.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work successfully identified and validated TRLs in hepatic metastatic uveal melanoma, providing new horizons for enhanced immunotherapy. Uveal melanoma is a highly metastatic cancer that, unlike cutaneous melanoma, has a limited effect on immune checkpoint responses, and thus there is a lack of formal clinical treatment for metastatic UM. In this manuscript, the authors described the immune microenvironmental profile of hepatic metastatic uveal melanoma by sc-RNAseq, TCR-seq, and PDX models. Firstly, they identified and defined the phenotypes of tumor-reactive T lymphocytes (TRLs). Moreover, they validated the activity of TILs by in vivo PDX modelling as well as in vitro coculture of 3D tumorsphere cultures and autologous TILs. Additionally, the authors found that TRLs are mainly derived from depleted and late activated T cells, which recognize melanoma antigens and tumor-specific antigens. Most importantly, they identified TRLs associated phenotypes, which provide new avenues for targeting expanded T cells to improve cellular and immune checkpoint immunotherapy.

      Strengths:

      Jonas A. Nilsson et al. have been working on new therapies for melanoma. The team has also previously performed the most comprehensive genome-wide analysis of uveal melanoma available, presenting the latest insights into metastatic disease. In this work, the authors performed paired sc-RNAseq and TCR-seq on 14 patients with metastatic UM, which is the largest single-cell map of metastatic UM available. This provides huge data support for other studies of metastatic UM.

      We thank the reviewer for these kind words about our work.

      Weaknesses:

      Although the paper does have strengths in principle, the weaknesses of the paper are that these strengths are not directly demonstrated. That is, insufficient analyses are performed to fully support the key claims in the manuscript by the data presented. In particular:

      The author's description of the overall results of the article should be logical, not just a description of the observed phenomena. For example, the presentation related to the results of TRLs lacked logic. In addition, the title of the article emphasizes the three subtypes of hepatic metastatic UM TRLs, but these three subtypes are not specifically discussed in the results as well as the discussion section. The title of the article is not a very comprehensive generalization and should be carefully considered by the authors.

      We thank the reviewer for the critical reading of our work. We have added more data and more discussion.

      The authors' claim that they are the first to use autologous TILs and sc-RNAseq to study immunotherapy needs to be supported by the corresponding literature to be more convincing. This can help the reader to understand the innovation and importance of the methodology.

      We have gone through the manuscript and found that we only refer to being first in using PDX models and autologous TILs to study immunotherapy responses by single-cell sequencing. While there are data to be deduced from other studies, we still believe this to be an accurate statement.

      In addition, the authors argue that TILs from metastatic UM can kill tumor cells. This is the key and bridging point to the main conclusion of the article. Therefore, the credibility of this conclusion should be considered. Metastatic UM1 and UM9 remain responsive to autologous tumors under in vitro conditions with their autologous TILs.

      UM1 responds also in vivo in the subcutaneous model in the paper. We have also finished an experiment where we show that this model also responds in a liver metastasis model. These data have been added in this revised version of the paper. We add two main figures and one supplementary figure where we characterize the response in vivo and also by single-cell sequencing of TILs.

      In contrast, UM22, also as a metastatic UM, did not respond to TIL treatment. In particular, the presence of MART1-responsive TILs. The reliability of the results obtained by the authors in the model of only one case of UM22 liver metastasis should be considered. The authors should likewise consider whether such a specific cellular taxon might also exist in other patients with metastatic UM, producing an immune response to tumor cells. The results would be more comprehensive if supported by relevant data.

      The reviewer has interpreted the results correctly: the allogenic and autologous MART1-specific TILs, while reactive in vitro against UM22, cannot kill this tumor in either a subcutaneous or a liver metastasis model. We hypothesize that this is due to an immune exclusion phenotype, and we show weak immunohistochemistry evidence that suggests this. We hope the addition of more UM1 data can be viewed as supportive of tumor-reactivity also in vivo.

      In addition, the authors in that study used previously frozen biopsy samples for TCR-seq, which may be associated with low-quality sequencing data, high risk of outcome indicators, and unfriendly access to immune cell information. The existence of these problems and the reliability of the results should be considered. If special processing of TCR-seq data from frozen samples was performed, this should also be accounted for.  

      We agree with the reviewers and acknowledge that we never anticipated the development of single-cell sequencing techniques when we started the biobank in 2013. We performed dead cell removal before the 10x Genomics experiment. We have also done extensive quality controls and believe that the data from the biopsies should be viewed as a whole and that quantitative intra-patient comparisons cannot be done.

      Reviewer #2 (Public Review):  

      Summary:  

      The study's goal is to characterize and validate tumor-reactive T cells in liver metastases of uveal melanoma (UM), which could contribute to enhancing immunotherapy for these patients. The authors used single-cell RNA and TCR sequencing to find potential tumor-reactive T cells and then used patient-derived xenograft (PDX) models and tumor sphere cultures for functional analysis. They discovered that tumor-reactive T cells exist in activated/exhausted T cell subsets and in cytotoxic effector cells. Functional experiments with isolated TILs show that they are capable of killing UM cells in vivo and ex vivo.

      Strengths:  

      The study highlights the potential of using single-cell sequencing and functional analysis to identify T cells that can be useful for cell therapy and marker selection in UM treatment. This is important and novel as conventional immune checkpoint therapies are not highly effective in treating UM. Additionally, the study's strength lies in its validation of findings through functional assays, which underscores the clinical relevance of the research. 

      We thank the reviewer for these kind words about our work.

      Weaknesses:  

      The manuscript may pose challenges for individuals with limited knowledge of single-cell analysis and immunology markers, making it less accessible to a broader audience.

      The first draft of the manuscript (excluding methods) was written by a person (J.A.N) who is not a bioinformatician. It has been corrected to include the correct nomenclature where applicable but overall it is written with the aim to be understandable. We have made an additional effort in this version. 

      Reviewer #1 (Recommendations For The Authors):  

      (1) Firstly, the authors should provide high-resolution pictures to ensure readability for readers. 

      We have converted the figures to pdf ourselves, which improved the resolution. We are happy to provide high-resolution images to the editorial office if needed for printing.

      (2) Furthermore, some parts of the article are more colloquial, and the authors should consider the logic and academic nature of the overall writing of the article. For example, authors should double-check whether the relevant expressions in the results are correct. For example, 'TCR' in the fourth part of the results should be 'TRLs'.

      We thank the reviewer for the recommendations and have gone through the manuscript.

      (3) Moreover, UM22 is described several times in the results as a metastatic UM and should be clearly defined in the methodology.

      The UM22 and UM1 samples are described in-depth in Karlsson et al., Nature Communications, 2020, a paper that is cited in the beginning of Results as part of the narrative. The current work can be viewed as an extension of that work.

      (4) Finally, it is recommended that authors describe a part of the results in full before citing the corresponding picture, otherwise, it will lead to confusion among readers.

      We have made an effort in the revised version to describe the new data in more detail.

      Reviewer #2 (Recommendations For The Authors):  

      The manuscript is very interesting and important to understanding key aspects of uveal melanoma immune profile and functionality. However, in my opinion, there are a few aspects that could be addressed.  

      - The manuscript lacks comprehensive details about the samples used, such as their disease progression, response to treatment, or any relevant information that could shed light on potential differences between samples. It would be valuable to know whether these samples were collected before any systemic treatment or if any of the patients underwent immunotherapy post-sample collection, along with the outcomes of such treatments. Providing this information would enrich the manuscript and provide a more holistic view of the research.

      We thank the reviewer for the recommendation and have included a new Supplementary Table 7 with information about the samples. We have also added each individual sample’s contribution to the UMAP to give a more holistic view.

      - The results presented and discussed in the manuscript seem to indicate that there were no significant differences across the various samples, including comparisons between lymph-node and liver metastases. However, this lack of variation or the reasons for not discussing any observed differences should be clarified. If there are distinctions between the samples, it would be beneficial to discuss these findings in the manuscript.

      We thank the reviewer for the recommendation. Although 14 samples is a large number for a uveal melanoma study, the study is not really powered for intra-patient comparisons.

      - The manuscript may pose difficulties for individuals with limited knowledge of single-cell analysis and immunology markers, potentially limiting its accessibility. To make the research more inclusive, the authors might consider presenting the technical aspects of their work in a less descriptive manner and providing explanations for those less familiar with the technology. This would help a broader audience grasp the significance of the study's findings. 

      The manuscript is from a multidisciplinary team where all have read and commented. The draft was written by a tumor biologist and edited by a bioinformatician for accuracy. We honestly think it is more understandable than most studies in this bioinformatics era. But we have tried to describe the new data in an easier way.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 56: replace "pyomastitis" with "pyogenic skin infections".

      Corrected.

      (2) Line 58: replace "basal strains" with "ancestral strains".

      Corrected.

      (3) Line 62: population structure impacts gene acquisition too, however, gene acquisitions can be easier to connect with a phenotype. For example, acquisition of mecA is thought to be adaptive rather than just linked to a successful lineage. This same reasoning applies to resistance-associated mutations such as gyrA mutations in ST22 emergence.

      We completely agree with the reviewer that population structure also impacts gene acquisition. We wanted to convey that connecting the gain or loss of genes to a change in a particular phenotype is much easier than doing the same for a mutation, especially in the presence of strong linkage, and therefore gene-level analysis is the focus of many previous studies. We have rewritten the sentence to better convey this idea:

      “Due to this limitation, studies of emerging strains often focus on gene level analysis such as acquisition of mobile genetic elements or loss of gene function as their effect on phenotype is easier to determine than that of point mutations.”

      (4) Line 112 this might be simply due to the smaller size of the intergenic regions chosen. I suggest to correct for the size of the genome segment considered.

      We thank the reviewer for pointing this out. The size of the intergenic regions was indeed the simple explanation for this observation. We have added the following sentence to the manuscript:

      “This is reflective of the fact that most of S. aureus genome sequence comprises of ORFs e.g. ~84% of TCH1516 genome is part of an ORF.”

      (5) Line 189: please add p values to supp table 2.

      We have added the p and q values from DBGWAS into Supp table 2. It is under the ‘DBGWAS Result’ sheet.

      (6) Line 227: high entropy indicates that this site is polymorphic, not necessarily that there is selective pressure. In the extreme, this might actually point to a neutral position, since any amino acid could be equally present (see for example https://www.nature.com/articles/s41467-022-31643-3#Sec10 ).

      We agree that high entropy by itself may point to a position under neutral selection, leading to some false positives. However, we focused on positions that were mostly biallelic in CC8 and had differential prevalence in USA300 vs non-USA300 strains (albeit in the presence of strong linkage disequilibrium), in addition to having high entropy in non-CC8 strains. This helps us filter out positions that were mostly monoallelic or carried only rare mutations while preserving other sites of interest. The approach was able to find the cap5E mutation, which has been associated with disruption of capsule production.
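
      The filtering logic described in this reply can be illustrated with a minimal sketch. It computes Shannon entropy for one alignment column and checks whether the column is mostly biallelic. The function names, the 0.95 threshold and the example columns are hypothetical and are not taken from the authors' code.

```python
import math
from collections import Counter


def site_entropy(residues):
    """Shannon entropy (in bits) of the residues observed at one alignment column."""
    counts = Counter(residues)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_mostly_biallelic(residues, min_fraction=0.95):
    """True if at least two alleles exist and the two most common ones
    cover at least min_fraction of the strains (threshold is an assumption)."""
    counts = Counter(residues)
    top_two = sum(n for _, n in counts.most_common(2))
    return len(counts) >= 2 and top_two / len(residues) >= min_fraction


# Hypothetical columns, one residue per strain.
monoallelic = "AAAAAAAA"
biallelic = "AAAATTTT"
four_way = "AACCGGTT"

for name, column in [("monoallelic", monoallelic),
                     ("biallelic", biallelic),
                     ("four_way", four_way)]:
    print(name, site_entropy(column), is_mostly_biallelic(column))
```

      A column with four equally frequent residues has the highest entropy of the three but fails the biallelic check, which is the reviewer's point that entropy alone does not separate neutral from selected sites.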

      (7) Line 271: show USA500 on the tree.

      Our current study is mostly focused on differences between USA300 and non-USA300 strains and we want to highlight those differences in the tree.

      (8) Line 327: still not possible to infer causality.

      We have changed the language to remove mentions of causality and instead talk about the association of GWAS enriched genes with measured transcriptional changes. The revised sentence now reads:

      “Here, we demonstrated how a model of transcriptional regulation with iModulons can be used to make a headway through the impasse created by the high linkage disequilibrium and identify GWAS-enriched mutations that are also associated with measurable phenotypic changes in the TRN.”

      (9) Line 324: subclades reference.

      We are unsure what this means.

      (10) Line 366: the authors seem to have used a bespoke pan-genome analysis approach. Would they be able to validate it using established tools such as Roary, Pirate or Panaroo? Panaroo in particular appears to have superior accuracy thanks to its pan-genome graph approach (https://github.com/gtonkinhill/panaroo). 

      We have added the results of Roary to our analysis (Figure S1b). The Roary results largely agree with our biggest takeaway from pangenomics, which is that our collection of genomes has good coverage of the CC8 clade at the gene level.

      (11) Line 397: what was the size of the core genome?

      There were 24881 core sites. We have added the number to the manuscript.

      (12) Line 407: please add citation or website for SCCmecFinder.

      The citation of SCCmecFinder (45) is at the end of the sentence.

      (13) Line 421: I was not able to find the code used for this analysis in the github repository provided.

      The code can be found in “notebook/02_Preprocess_DBGWAS.ipynb” within the repo.

      (14) Line 427: this is a very complex analysis for a simple univariate comparison between USA300-vs-non USA300 strains with no correction for population structure. The authors should compare their results with a more established pipeline like Pyseer or Gemma that can handle kmers and show the added value of their approach.

      We wanted to take advantage of DBGWAS’s ability to collapse kmers into unitigs and further collapse significant unitigs within a genetic neighborhood into components. Unfortunately, we found that in many cases, it became difficult to determine the exact mutation that was being enriched e.g. (T234G) without doing lots of manual work. Our network analysis simply parses the DBGWAS graph to automatically extract these mutations, making the results more interpretable. It does not do any additional hypothesis testing.

      We also attempted to pass kmer data into GEMMA but without the compaction provided by DBGWAS the memory required (>168 GB) exceeded what we had available.

      (15) DBGWAS: please indicate DBGWAS version and the options used for kmer size and number of neighbour nodes retained in the subgraph. Also, I assume that no correction for population structure was applied.

      We have added the version and parameters for DBGWAS. The method section now reads:

      “DBGWAS (v0.5.4) was used to enrich mutations unique to USA300 strains using default kmer size of 31 (-k 31) and neighborhood size of 5 (-nh 5). Alleles with frequency less than 0.1 were filtered  (-maf 0.1) and all components enriched with q-values less than 0.05 were documented (-SFF q0.05).”

      (16) Could the authors provide the DBGWAS output for the most significant unitigs in graph format? This would help readers understand the findings.

      The outputs are available in the github repo. The link to this specific data is (https://github.com/sapoudel/USA300GWASPUB/tree/master/data/dbgwas/dbgwas_output/visualisations)

      The text format of the output is part of Supplementary Table 2 under “DBGWAS Result” sheet.

      (17) Line 469: please provide more details on iModulons, it is not enough to simply reference the paper: specific QC criteria, mapping algorithm and parameters, ICA algorithm.

      We have now added a new Supplementary Note 2 section with more details about building iModulons.

      (18) Line 474: what is log-TPM?

      Log-Transcripts per Million. We have added the description in the text.
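
      For readers unfamiliar with the unit, a minimal sketch of one common way to compute log-TPM follows. The base-2 logarithm and the pseudocount of 1 are widespread conventions assumed here; the reply does not state which variant the authors used. Gene names and counts are hypothetical.

```python
import math


def log_tpm(read_counts, gene_lengths_bp, pseudocount=1.0):
    """Return log2(TPM + pseudocount) per gene.

    TPM: reads per base for each gene, rescaled so that all genes sum to one million.
    The log base and pseudocount are assumptions, not the authors' stated choice.
    """
    rates = {g: read_counts[g] / gene_lengths_bp[g] for g in read_counts}
    scale = sum(rates.values())
    return {g: math.log2(rates[g] / scale * 1e6 + pseudocount) for g in rates}


# Hypothetical two-gene sample.
counts = {"geneA": 100, "geneB": 300}
lengths = {"geneA": 1000, "geneB": 1000}
values = log_tpm(counts, lengths)
print(values)
```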

      (19) Line 479: not sure what "Chapter 3" refers to.

      Thank you for correcting the mistake. The reference has been corrected.

      Reviewer #2 (Recommendations For The Authors):

      Line 45. The introduction is not well-structured, and there is a lack of coherence among the topics pertinent to the research objective. I would recommend rewriting this section addressing the following topics: the challenge of distinguishing lineages within the CC8, especially the CA-MRSA USA300 strains; discussing the state-of-the-art GWAS methodologies, elucidating the main confounding factors in the application of GWAS to bacterial studies, and finally, exploring how current methods aim to address these concerns.

      We would like to thank the reviewer for the suggestions. The main innovation of the paper is using iModulons to find phenotype associated mutations from a set of linked mutations. The challenge of distinguishing CC8 subclades has been largely resolved thanks to efforts by Bowers et al. (PMID: 29720527). We have made some revisions to address the GWAS methodologies (bugwas and DBGWAS), the effect of linkage disequilibrium in interpreting the output of these methods and how combining the results of these association tests with modeling of TRN with iModulons can lead to finding candidate mutations of interest that are linked to specific changes in gene regulation.

      Line 56. Replace "pyomastitis" with "pyomyositis".

      Corrected to “pyogenic skin infections.”

      Lines 71. What do the authors mean by "endemic USA300 strain"?

      We have removed references to endemic strains.

      Line 106. Please verify the number of genomes used in the DBGWAS analysis. In the text, the authors mention that 2038 genomes were utilized. However, in Supplementary Table 1, only 2030 genomes are listed.

      Thank you for catching the discrepancy. We started the analysis with 2037 genomes, including four “spiked-in” reference genomes- USA100 D592 (CC5 strain used for rooting the CC8 tree), TCH1516 (same accession number as the one used for ICA), COL and Newman. Before further analysis, we removed 6 genomes for being smaller than 2.5 million base-pairs (see preprocessing.ipynb) and the USA100 D592 strain as it is not part of CC8. This resulted in 2030 genomes being used for DBGWAS. We kept the other 3 spiked CC8 genomes to help annotate the unitigs from DBGWAS.  Lastly, we removed the other three CC8 clade spiked genomes for pangenomic analysis. To clarify this, we have made the following changes to the text:

      (1) Changed line 106: We downloaded 2033 S. aureus genomes for analysis and excluded six of them with genome length of less than 2.5 million base pairs. The remaining 2027 S. aureus CC8 genomes formed a closed pangenome, suggesting that the sampled genomes mostly captured the gene level variations within the clonal complex (Figure 1a).

      (2) DBGWAS section Line 177: We used 2030 genomes for this analysis; the 2027 genomes in pangenomics analysis above were “spiked” with three well known CC8 genomes- TCH1516, COL, and Newman- to help annotate the DBGWAS unitigs.
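
      The genome counts quoted in this reply can be reconciled with a few lines of arithmetic. The variable names are illustrative; the numbers are those given above.

```python
downloaded_cc8 = 2033   # S. aureus CC8 genomes downloaded (revised line 106)
too_short = 6           # genomes shorter than 2.5 million base pairs
spiked_cc8 = 3          # TCH1516, COL and Newman
outgroup = 1            # USA100 D592 (CC5), used only to root the CC8 tree

pangenome_set = downloaded_cc8 - too_short              # genomes in the pangenome analysis
dbgwas_set = pangenome_set + spiked_cc8                 # genomes passed to DBGWAS
starting_set = downloaded_cc8 + spiked_cc8 + outgroup   # genomes at the start of the analysis

print(pangenome_set, dbgwas_set, starting_set)
```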

      Line 108. Could the authors provide a table with the genes that constitute the core, accessory genome, and unique genes for each of the strains?

      The gene presence/absence tables are very large files and therefore we have only added them to our github repo. The results can be found in the following files:

      Pangenomics: data/pangenome/Pangenomics/CC8_strain_by_gene.pickle.gz

      Lines 112 and 315. On what basis did the authors decide on the size of the upstream regulatory region? In the search for mutations, they extracted segments of 300 base pairs, whereas, in the search for the Fur binding motif, only 100 base pairs were considered. The RegPrecise database contains regulons for Staphylococcus aureus N315 (https://regprecise.lbl.gov/genome.jsp?genome_id=26), including the Fur regulon with multiple Transcription Factor Binding Sites (TFBSs) that extend beyond the 100 base-pair sequence. I would recommend reconsidering the search within the standardized upstream region of -400 base pairs. In the case of the Fur binding motif search, it might be beneficial to include the TFBSs available in the RegPrecise database.

      For the Fur motif search, we chose 100 base pairs because the Fur motif in non-USA300 strains was within ~20 base pairs of the isdH translation start site (Figure 4C). In this analysis we were not asking whether any Fur motif exists; we were only checking whether the one proximal to the translation start site is present, because our DBGWAS analysis suggested that this specific region was deleted in USA300 strains.

      Line 175. This work aimed to identify potential mutations associated with the success of a specific lineage rather than a phenotype, where correction for population structure effects is necessary. Would the implementation of the bugwas method in DBGWAS for controlling bacterial population structure not potentially impact the results? How was this issue addressed in your analysis? Would it not be pertinent to run a program without population structure correction to enable a comparison of results?

      We initially tried to use Linear Mixed Models to find kmers that were only enriched in USA300 strains. These efforts were hampered by extreme linkage disequilibrium, which led to high collinearity between kmer abundances and made it extremely difficult to get a good estimate of the coefficients. We also tried to run chi-squared tests individually on each kmer, which led to an unmanageable number (>100k) of kmers that were significantly different. DBGWAS, on the other hand, was able to compress unbranched kmers in the De Bruijn graph into unitigs and further reduce the number of tests by testing at the pattern level instead of the unitig level. We found no straightforward way to run DBGWAS (or GEMMA) without population structure correction. Therefore, it is likely that we are underestimating the number of significant unitigs with this approach.

      Line 189. Please italicize the gene name cap5E.

      Corrected.

      Line 277. Please clarify the QC/QA criteria and curation process employed for the selection of RNA-seq experiments, as this constitutes a crucial step in the reconstruction of the network.

      We have now added a new supplementary material section, Supplementary Note 2 titled “Creating iModulons for CC8 Clade Staphylococcus aureus” with details of QC/QA.

      Line 279. In Supplementary Table 3, please label the first column and standardize the use of either the experiment ID or the run ID. Furthermore, verify the experiment identifiers from rows 19 to 26, as I could not locate them in the SRA database.

      We have changed all accession to experiment ID including rows 19 to 26.

      Lines 290, 330, 424, and 437. Please correct "SCCMec" to "SCCmec IVa" (italicize "mec").

      Corrected.

      Line 298. What is the size of the upstream regulatory region considered for this analysis? It is important to standardize this value for all analyses involving the upstream regulatory region. In this regard, I recommend maintaining a consistent size of -400 base pairs.

      For the Fur motif search, we chose 100 base pairs because the Fur motif in non-USA300 strains was within ~20 base pairs of the isdH translation start site (Figure 4C). In this analysis we were not asking whether any Fur motif exists; we were only checking whether the one proximal to the translation start site is present, because our DBGWAS analysis suggested that this specific region was deleted in USA300 strains. In our usual analysis, we use -300 base pairs.

      Line 321. The discussion is rather concise and lacks an in-depth comparative perspective with relevant literature on any of the obtained results, whether concerning the proposed methodology or the potential new markers associated with the success of the USA300 lineage. The authors must underscore the method is not applicable to all GWAS analyses, due to the issue of correction for population structure.

      We have now added sections discussing the importance of isdH in S. aureus infection and a section addressing the limitations of the current approach when applied to other GWAS-type studies.

      Line 366. The authors employed the methodology described in the article by Hyun et al. 2022 (https://doi.org/10.1186/s12864-021-08223-8) to construct the pangenome. However, this methodology was designed for comparative analysis of pangenomes across various species, which does not align with the objective of this study, focusing solely on S. aureus genomes. Consequently, it remains unclear to me why the authors made this particular choice and, more importantly, what advantages it offers over well-established tools for individual pangenomes, such as Roary. I would strongly recommend validating the results using at least one established tool.

      With our analysis, we can determine proper thresholds for core/accessory/unique genes based on the observed data (Supplementary Figure 1a). However, we agree that it would be proper to include a more established pangenome package. We have added the results of Roary to our analysis. The Roary results largely agree with our biggest takeaway from pangenomics, which is that our collection of genomes has good coverage of the CC8 clade at the gene level.

      Line 370. Please include the version of CD-HIT that was utilized.

      Added. CD-HIT version 4.6 was used for the analysis.

      Line 372. What tool did the authors use to extract these regions?

      The list of CDS, 5’ and 3’ sequences can be extracted easily with a combination of fasta file and gff file. The gff file was used to find the position of each of these sequences and the sequences were extracted from the fasta file with python scripts.
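
      A minimal sketch of the kind of script described here is shown below. It reads CDS coordinates from GFF text and slices the CDS plus its upstream (5’) region out of the matching FASTA record. The function names, the toy sequences and the 3 bp window in the example are hypothetical; the 300 bp default mirrors the upstream size mentioned elsewhere in the responses, and this is not the authors' code.

```python
def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string (non-ACGT characters pass through)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def parse_fasta(text):
    """Return {record_id: sequence} from FASTA-formatted text."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def extract_cds_and_upstream(fasta_text, gff_text, upstream=300):
    """Yield one dict per CDS feature with its coding and upstream sequence."""
    genome = parse_fasta(fasta_text)
    features = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        contig = genome[seqid]
        cds = contig[start - 1:end]  # GFF coordinates are 1-based and inclusive
        if strand == "+":
            up = contig[max(0, start - 1 - upstream):start - 1]
        else:
            # On the minus strand the upstream region lies after `end` on the forward strand.
            up = reverse_complement(contig[end:end + upstream])
            cds = reverse_complement(cds)
        features.append({"id": cols[8], "strand": strand, "cds": cds, "upstream": up})
    return features


# Hypothetical toy inputs: the same gene on the plus strand and on the minus strand.
EXAMPLE_FASTA = ">contig1\nTTTTTATGAAATAGCCCCC\n>contig2\nGGGCTATTTCATAAAAA\n"
EXAMPLE_GFF = "\n".join([
    "##gff-version 3",
    "\t".join(["contig1", "example", "CDS", "6", "14", ".", "+", "0", "ID=geneA"]),
    "\t".join(["contig2", "example", "CDS", "4", "12", ".", "-", "0", "ID=geneB"]),
])

for feature in extract_cds_and_upstream(EXAMPLE_FASTA, EXAMPLE_GFF, upstream=3):
    print(feature)
```

      Windows that run past the end of a contig are simply truncated in this sketch; a production script would also need to handle circular replicons and multi-exon features.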

      Line 395. What were the QC/QA criteria used to select the sequences?

      The QC/QA criteria for the sequences are mentioned at the beginning of the Pangenomic analysis subsection and are as follows:

      “Briefly, “complete” or “WGS” samples from CC8/ST8 were downloaded from the PATRIC database. Sequences with lengths that were not within 3 standard deviations of the mean length or those with more than 100 contigs were filtered out.”
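
      The two filters in the quoted passage can be sketched as follows. The record layout, the function name and the use of the population standard deviation are assumptions made for illustration; the quoted text does not specify them.

```python
from statistics import mean, pstdev


def filter_assemblies(assemblies, max_sd=3.0, max_contigs=100):
    """Keep assemblies whose length is within max_sd standard deviations of the
    mean length and that have no more than max_contigs contigs."""
    lengths = [a["length"] for a in assemblies]
    mu, sd = mean(lengths), pstdev(lengths)  # population SD is an assumption
    kept = []
    for a in assemblies:
        within_length = abs(a["length"] - mu) <= max_sd * sd
        if within_length and a["contigs"] <= max_contigs:
            kept.append(a["name"])
    return kept


# Hypothetical collection: 20 typical genomes, one fragmented, plus one very short outlier.
assemblies = [{"name": f"g{i}", "length": 2_800_000, "contigs": 20} for i in range(20)]
assemblies[5]["contigs"] = 150
assemblies.append({"name": "short", "length": 1_000_000, "contigs": 20})

kept = filter_assemblies(assemblies)
print(len(kept), "assemblies kept")
```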

      Line 407. Please correct the tool name to "SCCmecFinder" (italicize "mec").

      The name has been corrected.

      Line 409. I believe BLASTp was run locally, so please specify the version used and the search parameters.

      As corrected further down, we used BLASTn not BLASTp. The version v2.2.31 has been added to the methods section.

      Line 416. There is conflicting information with line 409, which mentions that PVL was identified through a protein BLAST, but right below, it states it was a BLASTn. Please verify which information is correct and consider the previous comment to specify the version and parameters.

      Thank you for catching the discrepancy. We have corrected the text:

      “PVL was detected using nucleotide BLAST.”

      Line 418. Please provide the column identifiers for the Supplementary Table 5 (PVL worksheet).

      Column names are added.

      Line 418. Please remove the repeated word "and" in Supplementary Table 5 (mecA worksheet) and italicize the gene names in this table.

      Corrected.

      Line 419. You can use the abbreviation "SNPs" since it was introduced in line 65.

      Corrected.

      Line 420. In my view, this analysis could benefit from a more detailed and clearer explanation.

      We have added to the explanation. The section now reads:

      “To find the root of the USA300 strains in the phylogenetic tree, the genomes in the tree were first annotated by their PVL and SCC_mec_ status. Then the tree traversed from leaf to root starting from known USA300 strains – TCH1516 and FPR3757- while keeping track of the number of descendant genomes from the current root that contained known markers SCC_mec_ IVa and PVL. The node where the number of genomes with the markers started flatlining was marked as the root of USA300.”
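
      The traversal described in the quoted passage can be sketched as below. "Flatlining" is interpreted here as the first ancestor whose marker-positive descendant count does not increase when moving one step further toward the root; that stopping rule, the tree encoding and all node names other than TCH1516 and FPR3757 are assumptions for illustration.

```python
def leaves_under(node, children):
    """Return all leaf names in the subtree rooted at node."""
    kids = children.get(node, [])
    if not kids:
        return [node]
    leaves = []
    for kid in kids:
        leaves.extend(leaves_under(kid, children))
    return leaves


def count_marked(node, children, has_markers):
    """Number of descendant genomes carrying both SCCmec IVa and PVL."""
    return sum(has_markers[leaf] for leaf in leaves_under(node, children))


def find_clade_root(start_leaf, parent, children, has_markers):
    """Walk from a known USA300 leaf toward the root and return the ancestor
    at which the marker-positive descendant count stops growing."""
    node = parent[start_leaf]
    count = count_marked(node, children, has_markers)
    while node in parent:
        up = parent[node]
        up_count = count_marked(up, children, has_markers)
        if up_count == count:  # plateau: going further up adds no marked genomes
            return node
        node, count = up, up_count
    return node  # reached the tree root without a plateau


# Hypothetical tree given as child -> parent links.
parent = {
    "TCH1516": "n3", "FPR3757": "n3",
    "n3": "n2", "strainC": "n2",
    "n2": "n1", "outB": "n1",
    "n1": "root", "outA": "root",
}
children = {}
for child, par in parent.items():
    children.setdefault(par, []).append(child)

has_markers = {"TCH1516": True, "FPR3757": True, "strainC": True,
               "outB": False, "outA": False}

print(find_clade_root("TCH1516", parent, children, has_markers))
```

      On real trees with sporadic marker loss or gain, a single-step plateau may be too strict, and a tolerance or a smoothed count would be a natural extension.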

      Line 428. Specify the version and parameters used in the analysis with DBGWAS.

      Added. The text now reads:

      “DBGWAS (v0.5.4) was used to enrich mutations unique to USA300 strains using default kmer size of 31 (-k 31) and neighborhood size of 5 (-nh 5). Alleles with frequency less than 0.1 were filtered  (-maf 0.1) and all components enriched with q-values less than 0.05 were documented (-SFF q0.05).”

      Line 431. What tools were employed to calculate Pearson correlation and distances relative to the reference genome?

      Added. The text now reads:

      “Genome-wide linkage was estimated by Pearson correlation (calculated with built-in Pandas function) of the presence/ absence of enriched kmers and distance was measured based on the kmer alignment to the reference TCH1516 genome as determined by BLASTn.”
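
      As an illustration of the linkage estimate, the sketch below computes the Pearson correlation between kmer presence/absence vectors in plain Python so that it has no dependencies. The authors report using the built-in Pandas function instead (for a table with one column per kmer this would typically be `DataFrame.corr()`); the kmer names and vectors are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical presence (1) / absence (0) of four kmers across six genomes.
presence = {
    "kmer_A": [1, 1, 1, 0, 0, 0],
    "kmer_B": [1, 1, 1, 0, 0, 0],  # always co-occurs with kmer_A
    "kmer_C": [0, 0, 0, 1, 1, 1],  # mutually exclusive with kmer_A
    "kmer_D": [1, 0, 1, 0, 1, 0],  # only weakly associated with kmer_A
}

for name in ("kmer_B", "kmer_C", "kmer_D"):
    print("kmer_A vs", name, round(pearson(presence["kmer_A"], presence[name]), 3))
```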

      Line 450. What type of BLAST was used?

      Added. Nucleotide blast was used for all kmer analysis.

      Line 452. I didn't quite understand the reason for making this analysis available in a separate repository. It would be easier for readers looking to reproduce the work if all the codes were in a single repository.

      We kept the repository separate in case we wanted to further develop the network analysis code in the future. We have added the link to the network analysis repository in the README of the publication repo.

      Line 460. Please specify the version and parameters, if run locally, or indicate if a web page was used.

      Corrected to indicate that we used the PATRIC website for this step.

      Line 470. Specify the version and provide a detailed account of all parameters used, along with the QC/QA criteria and curation methods applied.

      We have added Supplementary Note 2 with all the details about packages and parameters used to calculate the iModulons.

      Line 479. The phrase "ICA was then run as previously described in chapter 3" does not make sense. Please clarify.

      We have corrected the mistake and added a new supplementary note with details about our ICA run. The line now reads:

      “A detailed version of the methods for RNA-sequencing and ICA analysis is available as Supplementary Note 2. ICA of RNA sequencing data was performed using the pymodulon package.”

      Line 484. Specify the version of CD-HIT.

      Added. The version used was v4.6.

      Line 494. To enable reproducibility, the repository should be better organized, especially the directory containing the code. Numbering each script in the order it was run would assist the reader in comprehending the overall analysis flow and adapting it to their needs. If creating a manual for method usage is not feasible, the code could be more extensively commented on to explain the parameters, choices made, and how these could be modified. The "Data" folder seems to contain some test files, such as those in the "isdh_fimo" folder, so removing test files would aid the understanding of the reader.

      Thank you for the suggestions. We have now numbered the notebooks that generate the figures, we have added more comments to the code, removed testing code and test datasets.

      Throughout the article, please correct "SCCMec" to "SCCmec" (italicize "mec").

      Corrected.

    1. Author response:

      (1) The manuscript emphasizes the hypothesis that stable super-complexes, maintained through sequential replacement of subunits, might underlie the long-term storage of memory. While an interesting idea, this notion requires considerably more research. The presented experimental data are indeed consistent with this notion, but there is no evidence that these complexes are causally related to memory storage. 

      We agree with the reviewer that, while our data support the idea that subunit exchange in supercomplexes could underlie long-term memory storage, more research is necessary to conclusively validate this hypothesis. The experimental data presented are consistent with the idea that stable supercomplexes, maintained through sequential replacement of subunits, play a role in memory retention. However, establishing a causal relationship between these supercomplexes and memory storage will require additional experiments and in-depth analyses.

      (2) Much of the presented work is performed on biochemically isolated protein complexes. The biochemical isolation procedures rely on physical disruption and detergents that are known to alter the composition and structure of complexes in certain cases. Thus, it remains unclear how the protein complexes described in this study relate to PSD95 complexes in intact synapses. 

      Whilst it could be the case that biochemical isolation procedures have the potential to alter the composition and structure of protein complexes, we have previously published the protocol used to isolate PSD95-containing supercomplexes (Nat Commun. 2016; 7: 11264). In that study, we demonstrated that the isolated supercomplexes are approximately 1.5 MDa in size and contain multiple proteins, including other scaffolding proteins (e.g., PSD93) and receptors (e.g., NMDARs). Importantly, these supercomplexes remain stable when exposed to detergents and dilution, strongly indicating that they represent the native complexes present in intact synapses.

      (3) Because not all GFP molecules mature and fold correctly in vitro and the PSD95-mEos mice used were heterozygous, the interpretation of the corresponding quantifications is not straightforward. 

      Although genetic tagging ensures a 1:1 labeling stoichiometry, we acknowledge that the presence of unfolded GFP and the use of heterozygous PSD95-mEos mice can complicate the analysis. We have highlighted this limitation in the manuscript. Nonetheless, our results show a high level of consistency across the different genetic fusions used in this study.

      (4) It was not tested whether different numbers of PSD95 molecules per super-complex might contribute to different retention times of PSD95, e.g. in synaptic vs. total-forebrain super-complexes. 

      The potential impact of varying numbers of PSD95 molecules per super-complex on retention times was considered. However, our analysis showed minimal differences in the distribution of molecule numbers per super-complex between the synaptic and forebrain samples.

      (5) The conclusion that the population of 'mixed' synapses is higher in the isocortex than in other brain regions is not supported by statistical analysis. 

      The conclusion that the population of 'mixed' synapses is higher in the isocortex than in other brain regions is indeed supported by statistical analysis. All relevant statistical data are detailed in Table S2, and the finding is statistically significant. We will emphasize this point in the revised manuscript.

      (6) The validity of conclusions regarding PSD95 degradation based on relative changes in the occurrence of SiR-Halo-positive puncta is limited.

      We recognize that conclusions based solely on the relative changes in SiR-Halo-positive puncta concerning PSD95 degradation have limitations. To address this, we also quantified the “new” PSD95 by analyzing AF488-Halo-positive molecules.

    1. Author Response:

      Thank you for the reviews and the eLife assessment. We want to take this opportunity to acknowledge the weaknesses pointed out by the reviewers and we will make small changes to the manuscript to account for these as part of the Version of Record.

      The tools are command-based and store outcomes locally

      We consider this to be an advantage of our ecosystem, which is intended for the case of individuals or small groups of authors. These features facilitate easy installation and integration with other tools. Further, our tool labelbuddy is a graphic user interface. Our tools may also be integrated into web-based systems as backends. Pubget is already being used in this way in the NeuroSynth Compose platform for semi-automated neuroimaging meta-analyses.

      pubget only gathers open-access papers from PubMed Central

      We recognize this as a limitation, and we acknowledge it in the original manuscript (in the discussion section, starting with "A limitation of Pubget is that it is restricted to the Open-Access subset of PMC"). We chose to limit the scope of our tools in order to ensure maintainability. Further, we are currently expanding pubget so it will also be able to access the abstracts and meta-data from closed-access papers indexed on PubMed. Future research could build other tools to work alongside pubget, to access other databases.

      Logic flow is difficult to follow

      We thank the reviewer for this feedback. Our paper describes an ecosystem of literature mining tools, which does not lend itself to narrative flow, nor does it readily fit into the standard "Intro, Results, Discussion, Methods" structure that is typical in the scientific literature. We have done our best to conform to this expected format, but we have also provided detailed section and subsection headings to enable the reader to digest the paper nonlinearly. Each of the tools we describe also has detailed documentation on github that we update continuously.

      Results were not validated

      For the example where we automatically extracted participant demographics from papers, we validated the results on a held-out dataset of 100 manually-annotated papers. For the example with automatic large-scale meta-analyses (neuroquery and neurosynth), these methods are described together with their validation in the original papers. If this ecosystem of tools is integrated into other workflows, it should be validated in those contexts. We recognize that validating meta-analyses is a difficult problem because we do not have ground truth maps of the brain.

      Efficiency was not quantified

      Creators of tools do not always run experiments to quantify their efficiency and other qualities. We have chosen not to do this here, first because it is outside the scope of this paper, as it would require specifying very precise tasks and how efficiency is measured, and second because, at least for the data collection part, the benefit of using an automated tool over manually downloading papers one by one is clear even without quantifying it. Compared to the approach of re-using existing datasets, our ecosystem is not necessarily more or less efficient. But it has other advantages, such as providing datasets that contain the latest literature, whereas the existing datasets are static and quickly go out of date.

      We do not highlight the strength of AI functions

      We provide an example of using our tools to gather data and manually annotate a validation set for use with large language models (in our case, GPT). We are further exploring this domain in other projects; for example, for performing semi-automated meta-analyses using the NeuroSynth Compose platform. However, we did not deem it necessary to include more AI examples in the current paper; we only wanted to provide enough examples to demonstrate the scope of possible use cases of our ecosystem.

      We thank the reviewers for their time and valuable feedback, which we will keep in mind in our future research.

    1. Author response:

      Thank you for handling our paper and our thanks to the reviewers for their engagement, comments and valuable suggestions. We will take the opportunity to provide a full response and submit a revised version in the coming weeks.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      The authors provide solid data on a functional investigation of potential nucleoid-associated proteins and the modulation of chromosomal conformation in a model cyanobacterium. While the experiments presented are convincing, the manuscript could benefit from restructuring towards the precise findings; alternatively, additional data buttressing the claims made would significantly enhance the study. These valuable findings will be of interest to the chromosome and microbiology fields.

      We thank the editors for taking the time to assess our manuscript and the reviewers for their critical suggestions. Both reviewers were concerned about our interpretation of the 3C data, and Reviewer #2 suggested biochemical characterization of cyAbrB2 to reinforce our claim. We agree with the concern and suggest that the editors add the sentence “How cyAbrB2 affects chromosome structure is still elusive from this study, and the biochemical assays are needed in the future experiment.” to the eLife assessment.

      The major revision points are the following;

      Reconstruction of Figures

      Previous Figure 5E has been omitted

      Additional 3C data on the nifJ region

      Rephrasing the conclusion of 3C data

      Additional discussion on cyAbrB2 and NAPs

      Reviewer #1 (Public Review): 

      Strength: 

      At first glance, I had a very positive impression of the overall manuscript. The experiments were well done, the data presentation looks very structured, and the text reads well in principle.

      Weakness: 

      Having a closer look, the red line of the manuscript is somewhat blurry. Reading the abstract, the introduction, and parts of the discussion, it is not really clear what the authors exactly aim to target. Is it the regulation of fermentation in cyanobacteria because it is under-investigated? Is it to bring light to the transcriptional regulation of hydrogenase genes? The regulation by SigE? Or is it to get insight into the real function of cyAbrB2 in cyanobacteria? All of this would be good of course. But it appears that the authors try to integrate all these aspects, which in the end is a little bit counterintuitive and in some places even confusing. From my point of view, the major story is a functional investigation of the presumable transcriptional regulator cyAbrB2, which turned out to be a potential NAP. To demonstrate/prove this, the hox genes have been chosen as an example due to the fact that a regulatory role of cyAbrB2 has already been described. In my eyes, it would be good to restructure or streamline the introduction according to this major outcome. 

      As you pointed out, the major focus of this study is cyAbrB2 as a potential NAP. To focus on NAPs, we simplified the first paragraph of the discussion (ll.246-263) and added a section comparing cyAbrB2 with other known NAPs (ll.269-299). To emphasize the description of cyAbrB2, we also rearranged the figures and divided the analysis of the cyAbrB2 ChIP data into two figures. We shortened the first paragraph of the introduction but mostly preserved the composition of the introduction to keep the general-to-specific pattern, even though the manuscript is blurry.

      Points to consider: 

      The authors suggest that the microoxic condition is the reason for the downregulation of e.g. photosynthesis (l.112-114). But of course, they also switched off the light to achieve a microoxic environment, which presumably is the trigger signal for photosynthesis-related genes. I suggest avoiding making causal conclusions exclusively related to oxygen and recommend rephrasing (for example, "were downregulated under the conditions applied").

      We agree with this point. We rephrased l.114 to “by the transition to dark microoxic conditions from light aerobic conditions” (ll.108-109).

      The authors hypothesized that cyAbrB2 modulates chromosomal conformation and conducted a 3C analysis. But if I read the data in Figure 5B & C correctly, there is a lot of interaction in a range of 1650 and 1700 kb, not only at marked positions c and j. Positions c and j have been picked because it appears that cyAbrB2 deletion impacts this particular interaction. But is it really significant? In the case of position j the variation between the replicates seems quite high, in the case of position c the mean difference is not that high. Moreover, does all this correlate with cyAbrB2 binding, i.e. with positions of gray bars in panel A? If this was the case, the data obtained for the cyabrB2 mutant should look totally different but they are quite similar to WT. That's why the sentence "By contrast, the interaction frequency in Δcyabrb2 mutant were low and unchanged in the aerobic and microoxic conditions" does not fit to the data shown. But I have to mention that I am not an expert in these kinds of assays. Nevertheless, if there is a biological function that shall be revealed by an experiment, the data must be crystal clear on that. At least the descriptions of the 3C data and the corresponding conclusions need to be improved. For me, it is hard to follow the authors' thoughts in this context. 

      Following your suggestion, we carefully re-examined the 3C data. Furthermore, we conducted an additional 3C experiment on the nifJ region (Figures 7F-J). We acknowledge that we had overinterpreted the 3C data. Therefore, we rewrote the results and discussion of the 3C assay in line with the data (ll.220-245) and removed the previous Figure 5E. Our individual responses follow.

      Positions c and j have been picked because it appears that cyAbrB2 deletion impacts this particular interaction. But is it really significant?

      We could not find statistically significant differences at loci c and j. Therefore, we added the following to the results section: “Note that the interaction scores exhibit considerable variability and we could not detect statistical significance at those loci.” (ll.231-232)

      does all this correlate with cyAbrB2 binding, i.e. with positions of gray bars in panel A?

      As you suspected, interaction frequency and cyAbrB2 binding do not correlate. Therefore, we withdrew the previous claim and stated the following: “Moreover, our 3C data did not support bridging at least in hox region and nifJ region, as the high interaction locus and cyAbrB2 binding region did not seem to correlate (Figure 7).” (ll.280-282)

      If this was the case, the data obtained for the cyabrB2 mutant should look totally different but they are quite similar to WT.

      We rewrote it as follows; “Then we compared the chromatin conformation of wildtype and cyabrb2∆. Although overall shapes of graphs did not differ, some differences were observed in wildtype and cyabrb2∆ (Figures 7B and 7G); interaction of locus (c) with hox region were slightly lower in cyabrb2∆ and interaction of loci (f’) and (g’) with nifJ region were different in wildtype and cyabrb2∆. Note that the interaction scores exhibit considerable variability and we could not detect statistical significance at those loci.” (ll.228-232)

      That's why the sentence "By contrast, the interaction frequency in Δcyabrb2 mutant were low and unchanged in the aerobic and microoxic conditions" does not fit to the data shown.

      We rewrote the sentence as follows: “While the interaction scores exhibit considerable variability, the individual data over time demonstrate declining trends of the wildtype at locus (c) and (j) (Figure S8). In ∆cyabrb2, by contrast, the interaction frequency of loci (c) and (j) was unchanged in the aerobic and microoxic conditions (Figure 7E). The interaction frequency of locus (c) in ∆cyabrb2 was as low as that in the microoxic condition of wildtype, while that of locus (j) in ∆cyabrb2 was as high as that in the aerobic condition of wildtype (Figures 7B and 7C).” (ll.238-243)

      The figures are nicely prepared, albeit quite complex and in some cases not really supportive of the understanding of the results description. Moreover, they show a rather loose organization that sometimes does not fit the red line of the results section. For example, Figure 1D is not mentioned in the paragraph that refers to several other panels of the same figure (see lines 110-128). Panel 1D is mentioned later in the discussion. Does 1D really fit into Figure 1 then? Are all the panels indeed required to be shown in the main document? As some elements are only briefly mentioned, the authors might also consider moving some into the supplement (e.g. left part of Figure 1C, Figure 2A, Figure 3B ...) or at least try to distribute some panels into more figures. This would reduce complexity and increase comprehensibility for future readers. Also, Figure 3 is way too complex. Panel G could be a stand-alone figure. The latter would also allow for an increase in font sizes or to show ChIP data of both conditions (L+O2 and D-O2) separately. Moreover, a figure legend typically introduces the content as a whole by one phrase but here only the different panels are described, which fits the impression that all the different panels are not well connected. Of course, it is the decision of the authors what to present and how, but they may wish to consider restructuring and simplifying.

      According to the advice, we have rearranged the Figure composition.

      The left side of Figure 1C has been moved to the supplement. Instead, representative expression fold changes of “Transient”, “Plateau”, “Continuous”, and “Late” genes are shown for comprehensibility. We left Figure 1D in Figure 1, as this diagram shows our motivation for focusing on hox and nifJ. We moved Figure 2A to the supplement. We did not move Figure 3B, as this figure shows the distribution of cyAbrB2 (“long tract of AT-rich DNA”) comprehensively and simply. We agree that Figure 3 was too complex. Therefore, we moved Figures 3F and 3G to a new independent figure (Figure 4). In Figure 4C (former 3G), we show the ChIP data of the L+O2 condition only, and the change of ChIP data under the D-O2 condition is shown in Figure 5. The schematic image showing the cyanobacterial chromosome and NAPs (previous Figure 5E) was omitted because it overinterpreted the data.

      The authors assume a physiological significance of transient upregulation of e.g. hox genes under microoxic conditions. But does the hydrogenase indeed produce hydrogen under the conditions investigated and is this even required? Moreover, the authors use the term "fermentative gene". But is hydrogen indeed a fermentation product, i.e. are protons the terminal electron acceptor to achieve catabolic electron balance? Then huge amounts of hydrogen should be released. Comment should be made on this.

      This is a very important point. Yes, hydrogenase indeed produces hydrogen under the conditions we investigated, and protons accept the majority of the reducing power under the dark microoxic condition. We wrote in the introduction section as follows: “Hydrogen is generated in quantities comparable to lactate and dicarboxylic acids as the result of electron acceptance in the dark microoxic condition (Akiyama and Osanai 2023; Iijima et al. 2016)” (ll.54-55). The detailed explanation is given below, although it is omitted from the manuscript.

      A recent study (Akiyama and Osanai 2023) quantified the consumed glycogen and secreted fermentative products (hydrogen, lactate, dicarboxylic acid, and acetate) in Synechocystis under the dark microoxic condition, the same condition that we investigated. The system in that study consisted of a 10 mL liquid layer and a 10 mL gas layer, cultivated for 3 days under dark microoxic conditions. The amounts of lactic acid, dicarboxylic acid, and hydrogen were approximately 2 µmol, 3.5 µmol, and 11 µmol (assuming the gas layer was at 1 atm and ignoring aqueous population), respectively. On the other hand, glycogen equivalent to 15 µmol of glucose was consumed in the system. This estimate supports the view that hydrogen accounts for a substantial portion of fermentative products during dark microoxic conditions.
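
      The comparison above can be made explicit with a few lines of arithmetic. The sketch below is only an illustration built on the approximate amounts quoted in this response; acetate is left out because no amount is given for it here, and the variable and function names are our own.

```python
# Approximate amounts quoted above (Akiyama and Osanai 2023), in micromoles.
# Acetate is not included because no value is quoted for it in this response.
products_umol = {
    "lactate": 2.0,
    "dicarboxylic acids": 3.5,
    "hydrogen": 11.0,
}
glucose_consumed_umol = 15.0  # glycogen consumed, as glucose equivalents


def product_shares(amounts):
    """Return each product's molar fraction of the quantified products."""
    total = sum(amounts.values())
    return {name: value / total for name, value in amounts.items()}


shares = product_shares(products_umol)
h2_per_glucose = products_umol["hydrogen"] / glucose_consumed_umol

for name, share in shares.items():
    print(f"{name}: {share:.1%} of the quantified products (molar basis)")
print(f"hydrogen per glucose equivalent consumed: {h2_per_glucose:.2f} mol/mol")
```

      On a molar basis, hydrogen is about two thirds of the three products listed, which is consistent with the statement that it accounts for a substantial portion of the fermentative products. A molar comparison is not an electron balance, so it should be read as a rough guide only.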

      The necessity of hydrogen production under dark microoxic conditions was demonstrated by Gutekunst et al. (2014). They showed that hydrogenase activity is required for mixotrophic growth in the light-dark and microoxic cycle with arginine. The necessity remains unclear in our conditions because we only performed continuous dark microoxic conditions without glucose.

      The authors also mention a reverse TCA cycle. But is its existence an assumption or indeed active in cyanobacteria, i.e. is it experimentally proven? The authors are a little bit vague in this regard (see lines 241-246).

      We misused the terminology. We meant to refer to the “reductive branch of TCA”. Cyanobacteria operate a branched TCA cycle under microoxic conditions. One of the branches is the reductive branch, which reduces oxaloacetate to produce malate. We corrected “reverse TCA cycle” to “reductive branch of TCA”. (Figure 1D and ll.260-262)

      Reviewer #2 (Public Review): 

      This work probes the control of the hox operon in the cyanobacterium Synechocystis, where this operon directs the synthesis of a bidirectional hydrogenase that functions to produce hydrogen. In assessing the control of the hox system, the authors focused on the relative contributions of cyAbrB2, alongside SigE (and to a lesser extent, SigA and cyAbrB1) under both aerobic and microoxic conditions. In mapping the binding sites of these different proteins, they discovered that cyAbrB2 bound many sites throughout the chromosome repressed many of its target genes, and preferentially bound regions that were (relatively) rich in AT-residues. These characteristics led the authors to consider that cyAbrB2 may function as a nucleoid-associated protein (NAP) in Synechocystis, given its functional similarities with other NAPs like H-NS. They assessed the local chromosome conformation in both wild-type and cyabrB2 mutant strains at multiple sites within a 40 kb window on either side of the hox locus, using a region within the hox operon as bait. They concluded that cyAbrB2 functions as a nucleoid-associated protein that influences the activity of SigE through its modulation of chromosome architecture.

      The authors approached their experiments carefully, and the data were generally very clearly presented and described.

      Based on the data presented, the authors make a strong case for cyAbrB2 as a nucleoid-associated protein, given the multiple ways in which it seems to function similarly to the well-studied Escherichia coli H-NS protein. It would be helpful to provide some additional commentary within the discussion around the similarities and differences of cyAbrB2 to other nucleoid-associated proteins, and possible mechanisms of cyAbrB2 control (post-translational modification; protein-protein interactions; etc.). The manuscript would also be strengthened with the inclusion of biochemical experiments probing the binding of cyAbrB2, particularly focusing on its oligomerization and DNA polymerization/bridging potential.

      We agree with the comment that biochemical experiments will deepen our insights into cyAbrB2 and chromatin conformation. As the reviewer pointed out, biochemical assays will provide valuable information on mechanisms of cyAbrB2 control, such as post-translational modification, cooperation with cyAbrB1, oligomerization, and the structure of cyAbrB2-bound DNA. However, we think those potential findings merit a new, independent research paper rather than being a part of this paper. Therefore, we added a discussion mentioning biochemistry as future work (ll.275-290; the section “The biochemistry of cyAbrB2 will shed light on the regulation of chromatin conformation in the future”).

      Previous work had revealed a role for SigE in the control of hox cluster expression, which nicely justified its inclusion (and focus) in this study. However, the results of the SigA studies here suggested that SigA both strongly associated with the hox promoter, and its binding sites were shared more frequently than SigE with cyAbrB2. The focus on cyAbrB2 is also well-justified, given previous reports of its control of hox expression; however, it shares binding sites with an essential homologue cyAbrB1. Interestingly, while the B1 protein appears to bind similar sites, instead of repressing hox expression, it is known as an activator of this operon. It seems important to consider how cyAbrB1 activity might influence the results described here.

      We infer that the minor side of the bimodal SigE peak is the genuine population that contributes to hox transcription, as hox genes are expressed in a SigE-dependent manner (Figure S2). We consider that the strong SigA peak upstream of the hox operon reflects binding to the promoter of TU1715, which is oriented in the opposite direction to the hox operon. We added a description of the single SigA peak and the bimodal SigE peak near the TSS of the hox operon as follows:

      “A bimodal peak of SigE was observed at the TSS of the hox operon in a microoxic-specific manner (Figure 6C bottom panel). The downstream side of the bimodal SigE peak coincides with SigA peak and the TSS of TU1715. Another side of the bimodal peak lacked SigA binding and was located at the TSS of the hox operon (marked with an arrow in Figure 6C), although the peak caller failed to recognize it as a peak.” (ll.206-209)

      The point that cyAbrB1 binds similar sites to cyAbrB2, despite regulating hox expression in the opposite direction, is very interesting. Therefore, we referred to the transcriptome data of the cyAbrB1 knockdown strain and compared the impact of cyAbrB1 knockdown and cyAbrB2 deletion. We described this in the results and discussion as follows:

      “we referred to the recent study performing transcriptome of cyAbrB1 knockdown strain, whose cyAbrB1 protein amount drops by half (Hishida et al. 2024). Among 24 genes induced by cyAbrB1 knockdown, 12 genes are differentially downregulated genes in cyabrb2∆ in our study (Figure S5D).” (ll.162-165)

      “CyAbrB1, the homolog of cyAbrB2, may cooperatively work, as cyAbrB1 directly interacts with cyAbrB2 (Yamauchi et al. 2011), their distribution is similar, and they partially share their target genes for suppression (Figures 3A S5C and S5D). The possibility of cooperation would be examined by the electrophoretic mobility shift assay of cyAbrB1 and cyAbrB2 as a complex. Despite their similar repressive function, cyAbrB1 and cyAbrB2 regulate hox expression in the opposite directions, and their mechanism remains elusive.” (ll.292-296)

      The hox operon differs from this general tendency. To see whether cyAbrB1 behaves differently from cyAbrB2 at the hox operon, we performed an additional ChIP-qPCR experiment on cyAbrB1 in the aerobic condition and the dark microoxic condition (Figure 5C). However, we could not find a difference.

      Reviewer #1 (Recommendations For The Authors): 

      Figure 1B: I recommend changing the header in the grey bar to terms like "upregulated" and "downregulated", which are also used in the legend description. Upregulation of genes can also be a result of de-repression, which is why the term "activated" is somewhat misleading.

      Corrected.

      Lines 114-116: It is unclear what the authors exactly mean here. Please clarify. 

      We rephrased the sentence as follows: “The enrichment in the butanoate metabolism pathway indicates the upregulation of genes involved in carbohydrate metabolism. We further classified genes according to their expression dynamics.” (ll.110-111)

      Reviewer #3 (Recommendations For The Authors): 

      Major/experimental comments: 

      (1) For the chromosome conformation capture experiments, it is indicated that these were conducted at aerobic (1hr) and microoxic (4 hr) conditions. But the data presented in Figure 1 suggest that 1 hr corresponds to the beginning of microoxic growth, and that time 0 is aerobic. The composite 3C data in Figure 5 show some interesting but specific differences. It is appreciated that the authors presented the profiles for individual samples in Figure S7, and the differences here do not seem to be as compelling. Are the major differences being highlighted significantly (statistically) different (e.g. at the (c) and (j) loci)? Might the differences be starker if an earlier aerobic condition (e.g. time 0) had been used instead of the 1 hr - microoxic - timepoint?

      Previous Figure 5 consisted of three time points (solid line: aerobic condition; dashed line: 1 hr of microoxic condition; dotted line: 4 hr of microoxic condition). We omitted the 4 hr data from the main figure (Figure 7) because including the 4 hr microoxic time point makes the data complicated. All three time points are shown in the profiles of individual loci (Figure S8).

      No statistically significant difference was found at loci (c) and (j) by t-test. Therefore, we have toned down the interpretation of the 3C data as follows: “Our 3C result demonstrated that cyAbrB2 influences the chromosomal conformation of hox and nifJ region to some extent (Figure 7).” (ll.325-326)

      (2) This is a complicated system that involves multiple regulatory proteins, each of which is differentially affected by the growth conditions (aerobic/microoxic). It is obviously beyond the scope of this work to probe deeply into all of these proteins. The focus here was on cyAbrB2, and to a slightly lesser extent SigE; however, based on the data presented, it seems that SigA and cyAbrB1 may be equally important contributors to hox control/expression, and in the case of cyAbrB1, possibly also to chromosome conformation. cyAbrB1 appears to have the same binding sites as cyAbrB2, and has been reported to interact with cyAbrB2. Given this association, it is possible that the two proteins may affect the binding of each other, and that loss of one might lead to enhanced binding by the other (or binding may require heterooligomerization?). Probing the regulatory interplay between these two proteins (or at least discussing it) feels important. Conducting e.g. mobility shift assays with each protein, both individually and together, could possibly allow for some understanding of how they function together. 

      We agree that the biochemistry of cyAbrB2 and cyAbrB1 may explain why cyAbrB1 and cyAbrB2 bind long tracts of AT-rich genome regions in vitro. We would like to leave the biochemistry as future work, as we think biochemical data are beyond the scope of the present study.

      The idea that cyAbrB1 and cyAbrB2 cooperate to form heterooligomers and bind broadly to the genome is a very rational and interesting prediction. We added this idea to the discussion: “Overall, the biochemistry integrating assay conditions (PTM, buffer condition, and cooperation with cyAbrB1) and output (DNA binding, oligomerization, and DNA structure) will deepen the understanding of cyAbrB2 as cyanobacterial NAPs.” (ll.287-290). We also compared our transcriptome of ∆cyabrb2 with the recent study of cyabrb1 knockdown (ll.162-165), and concluded “they partially share their target genes for suppression (Figures 3A S5C and S5D)” (l.293).

      (3) Throughout the manuscript, there is reference made to cyAbrB2 binding becoming 'blurry' or non-specific under microoxic conditions. It is not clear what this means. It appears that when cyAbrB2 binds, any given protected region can be quite extensive, which can be suggestive of polymerization along the chromosome. Are the boundaries for binding sites typically clearly delineated, and this changes when the cultures are growing under microoxic conditions? There is also no mention made anywhere about oligomerization potential for cyAbrB2, which would be important for the polymerization, and bridging suggested for cyAbrB2 in the model presented in Figure 5. Previous publications (Song et al., 2022; Ishi et al., 2008) have suggested that it can exist as a dimer in vivo, but that in vitro it is largely monomeric. The manuscript would benefit from some additional biochemical analyses of cyAbrB2 binding activity, with a particular focus on DNA binding and oligomerization/bridging potential, and some additional discussion about these characteristics as well. 

      Throughout the manuscript, there is reference made to cyAbrB2 binding becoming 'blurry' or non-specific under microoxic conditions. It is not clear what this means.

      To describe “cyAbrB2 binding becomes blurry” more clearly, we rearranged the figure composition and made a dedicated figure (Figure 5). We also rephrased the description by adopting the reviewer’s phrase “boundaries for binding sites”, as this phrase describes the change well: “When cells entered microoxic conditions, the boundaries of the cyAbrB2 binding region and cyAbrB2-free region became obscure (Figure 5),” (ll.319-320)

      There is also no mention made anywhere about oligomerization potential for cyAbrB2,

      We added the discussion about oligomerization “DNA-bound cyAbrB2 is expected to oligomerize, based on the long tract of cyAbrB2 binding region in our ChIP-seq data. However, no biochemical data mentioned the DNA deforming function or oligomerization of cyAbrB2 in the previous studies and preference for AT-rich DNA is not fully demonstrated in vitro (Dutheil et al. 2012; Ishii and Hihara 2008; Song et al. 2022)”(ll. 277-280) and “Overall, the biochemistry integrating assay conditions (PTM, buffer condition, and cooperation with cyAbrB1) and output (DNA binding, oligomerization, and DNA structure) will deepen the understanding of cyAbrB2 as cyanobacterial NAPs.” (ll.287-290)

      The manuscript would benefit from some additional biochemical analyses of cyAbrB2 binding activity, with a particular focus on DNA binding and oligomerization/bridging potential, and some additional discussion about these characteristics as well. 

      We added the discussion integrally considering known features of cyAbrB2, novel findings on cyAbrB2, and the comparison with known NAPs (ll.269-290).

      (4) Given that the major take-away for the authors (based on the title) seems to be the nucleoid-associated protein potential for cyAbrB2, the Discussion would benefit from some additional focus in this area. How similar is cyAbrB2 to other nucleoid-associated proteins? (e.g. H-NS, Lsr2) How does counter-silencing work for other nucleoid-associated proteins? Can the authors definitively exclude the possibility of binding site competition/occlusion, given that cyAbrB2 covers the promoter region of hox? What other nucleoid-associated proteins have been characterized in the cyanobacteria? 

      We agree with this point, so we added a discussion of cyAbrB2 in comparison with H-NS and Lsr2, the canonical NAPs (ll.269-290).

      We did not deny the possibility of the exclusion of RNAP by cyAbrB2, but the previous manuscript insufficiently discussed that. To emphasize that cyAbrB2 excludes RNA polymerase, we simplified Figure 6 and employed mosaic plots showing anti-co-occurrence of cyAbrB2 binding regions and SigE peaks. Furthermore, we added discussion about SigE exclusion by cyAbrB2 (ll. 355-359)

      We mention the possibility of other nucleoid-associated proteins in cyanobacteria in the discussion. “Furthermore, the conformational changes by deletion of cyAbrB2 were limited, suggesting there are potential NAPs in cyanobacteria yet to be characterized.” (ll.336-339)

      (5) Previous work (Song et al., 2022) showed that changing the AT content of cyAbrB2 binding sites did not affect its ability to bind DNA. There are also previous papers suggesting that cyAbrB2 may be subject to diverse post-translational modifications (e.g. phosphorylation - Spat et al., 2023; glutathionylation - Sakr et al., 2013), as well as association with cyAbrB1. These collectively suggest there may be other factors that contribute to cyAbrB2 binding specificity/activity. These seem like relevant points to discuss, particularly given the transient nature of the cyAbrB2 effects on some genes.

      We have included the discussion about AT content, post-translational modifications and transient regulations, and association with cyAbrB1 (ll. 284-295)

      (6) Given the major binding site for SigA upstream of the hox operon, it seems that it likely also contributes to hox cluster expression, together with SigE. Is there a sense for the relative contribution of each sigma factor to hox cluster expression? And whether both are subject to the same inhibitory effect of cyAbrB2? 

      As described above in our response to the public review, the SigA binding site upstream of the hox operon should be assigned to the TSS of TU1715 (Figure 6C). Transcription of the hox operon is highly dependent on SigE, as shown in Figure S2, and residual transcription in the sigE∆ strain is derived from other sigma factors (SigABCD). Estimating the relative contribution of sigma factors other than SigE is difficult at present because SigABCDE can partially compensate for each other.

      As the different impact of NAPs on the primary and alternative sigma factor is observed in H-NS (Shin et al. 2005), whether both the primary sigma factor (SigA) and the alternative sigma factor (SigE) are inhibited by cyAbrB2 to the same extent is a very interesting question.

      We calculated the odds ratios of SigE and SigA being in the cyAbrB2-free region and wrote in the results: “SigE preferred the cyAbrB2-free region in the aerobic condition more than SigA did (Odds ratios of SigE and SigA being in the cyAbrB2-free region were 4.88 and 2.74, respectively).” (ll.193-195). We also discussed: “The higher exclusion pressure of cyAbrB2 on SigE may contribute to sharpening the transcriptional response of hox and nifJ on entry to microoxic conditions.” (ll.357-359)
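
      For readers unfamiliar with the statistic, an odds ratio of this kind is typically computed from a 2x2 contingency table. The sketch below is a generic illustration, not the calculation used in the manuscript: this response does not state how the table was constructed, so the row and column definitions are assumptions, and the counts are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def odds_ratio(peak_free, peak_bound, no_peak_free, no_peak_bound):
    """Odds ratio from a 2x2 table of genomic positions.

    Assumed layout (hypothetical, for illustration only):
      rows    = positions with / without a sigma factor peak
      columns = cyAbrB2-free / cyAbrB2-bound regions
    A value above 1 means peaks fall in cyAbrB2-free regions more often
    than positions without a peak do.
    """
    if peak_bound == 0 or no_peak_free == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (peak_free * no_peak_bound) / (peak_bound * no_peak_free)


# Hypothetical counts, NOT taken from the manuscript.
example = odds_ratio(peak_free=80, peak_bound=20, no_peak_free=50, no_peak_bound=50)
print(f"odds ratio = {example:.2f}")
```

      Read this way, the reported values of 4.88 for SigE and 2.74 for SigA would both indicate a preference for cyAbrB2-free regions, with the stronger preference for SigE.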

      (7) The 3C experiments suggest there are indeed changes in chromosome architecture in the hox region as growth conditions change and when different regulators are present. Across the chromosome, analogous changes are expected; however, it may be premature to draw this conclusion based on changes at one locus. Is there a reason that the authors did not take full advantage of their 3C samples and sequence them, to capture the full chromosome interactome at the two time-points? This would allow broader conclusions to be drawn regarding changes in chromosome structure and the impact of cyAbrB2.

      In response to the suggestion, we performed an additional 3C assay on the nifJ region using the residual 3C samples. Expanding to genome-wide sequencing (Hi-C) requires concentrating the ligated fragments by biotinylation, a step that was omitted from our 3C samples.

      We rewrote the result as obtained from the 3C data of hox and nifJ (ll.220-245) and omitted the schematic image of an entire chromosome of cyanobacteria (previous Figure 5E).

      Editorial comments: 

      (1) The data presentation in Figure 1 is very effective. 

      (2) Line 87: please rephrase - you can have 'high similarity' or 'high levels of identity', but not high levels of homology - genes/proteins are either homologous or not.

      (3) Line 118: classified into four 'groups'? 

      (4) Line 590: remove 'the'. 

      (5) Figure 2S, panel B: please define acronyms in the legend (GT, IP) and write out 'FLAG' in full for AbrB1.

      (2) to (5) have been corrected.

      (6) Please provide information on or a reference for the tagging of SigA for use in the ChIP-seq experiments within the Materials and Methods.

      Added (l.365)

      (7) Line 648: space between 'binding' and 'regions'. 

      corrected.

      (8) Fig 4E: please make the solid lines thicker - they are currently difficult to see.

      We have made Figure 6C (former 4E) larger and the lines thicker.

      (9) Line 666: location. 

      (10) Line 673: Individual. 

      (11) Figure S5, panel C graph title: should this be 'Relative'? 

      (12) Figure S7: What is 'GT'? Should this be 'WT'? 

      (9) to (12) have been corrected.

      (13) In addition to the data presented in Figure 3G, it would be nice to have a small table or Venn diagram summarizing the number of cyAbrB2 binding sites that fall into the different categories (full gene/operon; downstream of a gene; within a gene; promoter region). 

      In response to the comment, we noticed the categories we had applied (full gene/operon; downstream of a gene; within a gene; promoter region) were arbitrary. Therefore, we categorized transcriptional units (TUs) according to the extent of occupancy by cyAbrB2. (Figures 4B and 4C)

      (14) Line 280-281: suggest replacing 'mediates' with 'influences'. 'Mediates' sounds like a direct interaction (for which the evidence is not currently strong without some additional biochemical data), but 'influences' could better accommodate both direct and indirect possibilities. 

      (15) Line 410: it is not clear what this means. 

      We have omitted “As a result, DNA ~600-fold condensed DNA than 3C samples were ligated.”, as it does not give any information about the experimental procedure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors provided a detailed analysis of the real-time structural changes in actin filaments resulting from cofilin binding, using High-Speed Atomic Force Microscopy (HSAFM). The cofilin family controls the lifespan of actin filaments in the cells by severing the filament and promoting depolymerization. Understanding the effects of cofilin on actin filament structure is critical. It is widely acknowledged that cofilin binding significantly shortens the pitch of the actin helix. The authors previously reported (1) that this shortening extends to the unbound region of the actin filament on the pointed end side of the cluster. In this study, the authors presented substantially improved AFM images and provided detailed accounts of the dynamics observed. It was found that a minimal cofilin-binding cluster, consisting of 2-4 molecules, could induce changes in the helical parameters over one or more actin crossover repeats. Adjacent to the cofilin-binding clusters, the actin crossovers were observed to shorten within seconds, and this shortening was limited to one side of the cluster. Additionally, the phosphate binding to the actin filament was observed to stabilize the helical twist, suggesting a mechanism in which cofilin preferentially binds to ADP-bound actin filaments. These findings significantly advance our understanding of actin filament dynamics, which is essential for a wide range of cellular processes.

      However, I propose that the sections about MAD and certain parts of the discussions need substantial revisions.

      In this study, we leverage high spatiotemporal resolutions of high-speed atomic force microscopy (HS-AFM) to analyze real-time structural changes in actin filaments induced by cofilin binding. Furthermore, we experimentally demonstrate the inherent variability in twist conformations of bare actin filaments. Our study integrates HS-AFM with Principal Component Analysis (PCA) to elucidate the actin structure-dependent preferential cooperative binding of cofilin. We provide experimental evidence to substantiate a "proof of principle" regarding the flexible helical twists of actin filaments that regulate the functions of actin-binding proteins. This important study enhances our understanding of actin filaments’ dynamics and polymorphic structures which play crucial roles in a broad spectrum of cellular activities.

      We appreciate the comments from Reviewer 1. Below, we address their concerns point by point.

      MAD analysis

      The authors have presented findings that the mean axial distance (MAD) within actin filaments exhibits a significant dependency on the helical twist, a conclusion not previously derived despite extensive analyses through electron microscopy (EM) and molecular dynamics (MD) simulations. Notably, the MAD values span from 4.5 nm (8.5 pairs per half helical pitch, HHP) to 6.5 nm (4.5 pairs/HHP) as depicted in Figure 3C. The inner domain (ID) of actin remains very similar across C, G, and F forms (2, 3), maintaining similar ID-ID interactions in both cofilactin and bare actin filaments, keeping the identical axial distance between subunits in the both states. This suggests that the ID is unlikely to undergo significant structural changes, even with fluctuations in the filament's twist, keeping the ID-ID interactions and the axial distances. The broad range of MAD values reported poses a challenge for explanation. A careful reassessment of the MAD analysis is recommended to ensure accuracy.

      The central challenge to study “Protein Dynamics” in real time lies in bridging the gap in time scales: HS-AFM captures dynamics of proteins within the milliseconds to seconds range, whereas molecular dynamics (MD) simulations typically operate within the femtoseconds to microseconds domain. Protein dynamics encompass a spectrum of temporal scales, from atomic vibrations to molecular tumbling and collective motions in simulations. HS-AFM stands out as a potent technique for delving into protein dynamics, including processes like protein folding and conformational changes triggered by drugs or protein interactions. Additionally, a significant limitation of MD simulation is the spatial modeling constraint (~50 x 50 nm unit), which restricts the study of large complex biological systems. However, utilizing HS-AFM enables the construction of intricate protein models facilitating the real time imaging of their structures and dynamics during functional activity.

      Regarding the suggestion about ID-ID interactions in both cofilactin and bare actin filaments, maintaining identical axial distances (ADs) between subunits in both states, our HS-AFM cannot provide atomic-level structural insights to address this issue. However, we demonstrate that the variability of OD twists in actin protomers could potentially lead to globally shorter half helical pitches (HHPs) and fewer protomer pairs per HHP (Figure 2, Figure supplement 2) (see lines 218-222). The fluctuation in filament’s twist is further supported by currently available experimental data, including our findings (Figure 3C) in this study (see our Discussion in lines 555-560).

      The minimal change in local ID-ID interactions results in an unchanged global length of actin filaments in both cofilin-bound and unbound cases (Figure supplement 2). However, the filament’s twist, as experimentally detected by EM, high-resolution interferometric scattering microscopy (iSCAT), HS-AFM, and in pseudo AFM, is variable (see lines 555-560).

      We have additionally reassessed the fluctuation and dynamics of MAD in F-ADP-actin and F-ADP.Pi-actin over time at high temporal resolution (Figure supplement 3, Video 3, Table supplement 5). These data are further explained in the Results section (lines 264-270).

      Furthermore, we reassessed the broad range of MAD values in F-ADP-actin segments on both sides of large cofilin clusters over time (Figure supplement 8, Video 5). These findings are explained in the Results section (lines 333-337) and further discussed in the new results (lines 555-560).

      In determining axial distances, the authors extracted measurements from filament line profiles. It is advised to account for potential anomalies such as missing peaks or pseudo peaks, which could arise from noise interference. An example includes the observation of three peaks in HHP6 of Figure Supplement 5C, corresponding to 4.5 pairs. Peak intervals measured from the graph were 5, 11.8, 8.7, and 5.7 nm. The second region (11.8 nm) appears excessively long. If one peak is hidden in the second region, the MAD becomes 5.5 nm.

      We acknowledge the difficulty in identifying peaks within the regions of bare actin segments adjacent to cofilin clusters or within the cofilactin region. In the revised Figure supplement 6C (originally Figure supplement 5C), we did not assess peak intervals as suggested by Reviewer 1. The measurement of axial distance (AD) and the number of peaks within a HHP to calculate the correct MAD is further detailed in the Methods section (see HS-AFM data analysis and processing, highlighted in purple).

      Additionally, the purpose of presenting Figure supplements 6-7 is to directly compare the half helices and the number of protomer pairs per HHP between bare actin filaments and actin segments near the boundary between cofilactin and bare actin segments on the PE side in the same AFM images. In the original version of this paper, we avoided including the MAD values measured in the cofilactin region (HHP6, HHP7) in Figure Supplement 7E, to mitigate measurement errors.

      Compiling histograms of axial distances (ADs) rather than focusing solely on MAD may provide deeper insights. If the AD is too long or too short, the authors should suspect the presence of missing peaks or pseudo-peaks due to noise. If 4.4 or 5.5 pairs/HHP regions tend to contain missing peaks and 7.5-8.5 pairs/HHP regions tend to contain pseudo peaks, this may explain the MAD dependency on the helical twist.

      The measurement of axial distance (AD) and the number of peaks within a HHP to calculate the correct MAD is further detailed in the Methods section (see Analyses of pseudo AFM images of F-actin and C-actin structures constructed from existing PDB structures (e.g., Figure supplement 2); and HS-AFM data analysis and processing, highlighted in purple).

      We disagree with Reviewer 1’s suggestion that compiling histograms of ADs, rather than focusing solely on MAD, may provide deeper insights. AFM imaging provides only a 2-dimensional (2D) surface structure, unlike the 3-dimensional (3D) structure offered by Cryo-EM. In AFM imaging, we cannot capture the object from different angles as Cryo-EM does. Therefore, AD values measured in 2D AFM images do not accurately represent the axial distance between two adjacent protomers along the same actin filament. Consequently, we relied on MAD values. Our results, including the fluctuation in the number of protomer pairs per HHP, are further supported by other studies (see our Discussion in lines 555-560).

      Additionally, Figure 3E indicates a first decay constant of 0.14 seconds, substantially shorter than the frame rate (0.5 sec/frame). This suggests significant variations in line profiles between frames, attributable either to overly rapid dynamics or a low signal-to-noise ratio. Implementing running frame averages (of 2-3 frames) is recommended to distinguish between these scenarios. If the dynamics are indeed fast, the averaged frame's line profile may degrade, complicating peak identification. Conversely, if poor signal-to-noise ratio is the cause, averaging frames could facilitate peak detection. In the latter case, the authors can find the optimal number of frame averages and obtain better line profiles with fewer missing and pseudo-peaks.

      We utilized state-of-the-art HS-AFM with high temporal and spatial resolution to capture the dynamic structures of F-ADP-actin and F-ADP.Pi-actin segments at higher frame rates of 0.2 sec/frame and 0.1 sec/frame, respectively (Figure supplement 3). As suggested, we implemented running frame averages (3 frames) in the ACF analyses. Consistently, our results indicate that the first time constant (t1) remains around 0.1-0.4 seconds, independent of the imaging rates (0.1 – 0.5 sec/frame), for AD between two adjacent actin protomers in F-actin bound with ADP or ADP.Pi (Table Supplement 5), similar to the range of t1 shown in Figure 3E. These significant experimental results support the notion that helical twists, the number of actin protomers per HHP, and MAD in bare F-actin segments are intrinsically dynamic and fluctuate around the mean values over time (see further in lines 264-270; 333-337; and 555-560). It should be noted that our original ACF analyses, shown in Figure 3E-F, did not include running frame averages; the similar t1 obtained with and without averaging argues against a low signal-to-noise ratio as the cause of the fast decay.
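
      As a minimal sketch of the two operations discussed here, the code below applies a running average over consecutive frames to a per-frame measurement and then computes a normalized autocorrelation function (ACF) over frame lags. It is not the authors' analysis code: the function names, the input series, and the simple biased ACF estimator are assumptions chosen for illustration, and fitting decay constants such as t1 is left out.

```python
def running_average(values, window=3):
    """Average each run of `window` consecutive frames (fully covered positions only)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def autocorrelation(values, max_lag):
    """Normalized ACF for lags 0..max_lag, so that ACF(0) == 1."""
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    variance_sum = sum(d * d for d in dev)
    if variance_sum == 0:
        raise ValueError("ACF is undefined for a constant series")
    return [sum(dev[i] * dev[i + lag] for i in range(n - lag)) / variance_sum
            for lag in range(max_lag + 1)]


# Hypothetical per-frame axial distances (nm); not measured data.
series = [5.1, 5.6, 4.9, 5.4, 5.0, 5.7, 5.2, 5.5]
smoothed = running_average(series, window=3)
acf_raw = autocorrelation(series, max_lag=3)
acf_smoothed = autocorrelation(smoothed, max_lag=3)
```

      Comparing the decay of `acf_raw` and `acf_smoothed` is one way to see whether averaging changes the apparent correlation time. Note that a running average itself introduces correlation between neighboring frames, which should be kept in mind when interpreting the first lags.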

      Discussions

      The authors suggest a strong link between the C-form of actin and the formation of a short pitch helix. However, Oda et al. (3) have demonstrated that the C-form is highly unstable in the absence of cofilin binding, casting doubt on the possibility of the C-form propagating without cofilin binding. Moreover, in one strand of the cofilactin, interactions between actin subunits are limited to those between the inner domains (ID-ID interactions), which are quite similar to the interactions observed in bare actin filaments. This similarity implies that ID-ID interactions alone are insufficient to determine the helical parameters, suggesting that the presence of cofilin is essential for the formation of the short pitch helix in the cofilactin filament. Thus, crossover repeats are not necessarily shortened even if the actin form is C-form.

      We have experimentally observed a shortened bare half helix adjacent to cofilin clusters on the PE side at high spatial resolution, comprising fewer protomers than normal half helices. Thus, we hypothesized that crossover repeats are shortened if the actin protomers in the bare half helix neighboring the cofilin cluster on the PE side resemble a C-actin structure. This assumption is further explained by referring to the C-actin structure in Figure 2 and Figure supplement 2. Even though the C-form, as suggested in Oda et al., 2019, is unstable, it intrinsically fluctuates around the mean value over time and adopts various conformations. A single PDB structure, resolved by Cryo-EM through ensemble averaging of structural images, should be regarded as a single atomistic structure, one of many possible conformations, regardless of whether it is resolved by Cryo-EM, X-ray diffraction or crystallography, or NMR (see Figure 1, legend of Figure supplement 1).

      We highlight two main points regarding this issue: (1) The short helical pitch at the global scale is associated with the twisting of the OD at the local scale for individual protomers; (2) Actins in different nucleotide or cofilin-bound states exhibit varying ranges, distributions, and spectra of both local OD twist and global helical pitch (Figures 1-2, Figure supplements 1-2). The first point underscores that the twist/untwist of the OD, rather than the ID-ID interactions, determines the shortness of the helical pitches. The latter point is more related to the global length of the filament. The minimal change in local ID-ID interactions results in an unchanged global length of actin filaments in both cofilin-bound and unbound cases (see pseudo AFM images in Figure supplement 2 for canonical actin filament and cofilactin segments with the same length, comprising 62 protomers). However, the filament’s twist, as experimentally detected by EM, high-resolution interferometric scattering microscopy (iSCAT), HS-AFM, and in pseudo AFM, is variable (see lines 555-560) and independent of the ID-ID interactions.

      Narita (4) proposes that the facilitation of cofilin binding may occur through a shortening in the helix pitch, independent of a change to the C-form of actin. Furthermore, the dissociation of the D-loop from an adjacent actin subunit leads directly to the transition of actin to the G-form, which is considered the most stable configuration for the actin molecule (3).

      See also our explanation above. We have incorporated these points in a Discussion section. See lines 497-499; 510-511.

      Furthermore, our PCA analysis indicates that the transition from C-actin to G-actin necessitates the opening of the nucleotide cleft (resulting in a decrease in PC1) and is more readily achieved than the direct transition from F-actin to G-actin (which requires decreases in both PC1 and PC2). Whether this transition is directly triggered by the dissociation of the D-loop remains a topic for our future investigations. Our PCA analysis reveals that the D-loop is deeply buried within the core of the filament (Figure 2). Further experiments will be conducted to elucidate its roles.

      The mechanism by which the shortened pitch propagates remains a critical and unresolved issue. It appears that this propagation is not a result of the C-form's propagation but likely involves an unidentified mechanism. Identifying and understanding this mechanism represents an essential direction for future research.

      It's worth mentioning that our HS-AFM data and spatial ACF analysis lend support to a hypothesis suggesting that 2-4 bare actin protomers adjacent to cofilin clusters on the PE side adopt C-actin-like structures. Additionally, we have proposed several hypotheses aimed at better understanding the mechanisms driving the unidirectional binding and expansion of cofilin clusters toward the PE side. These hypotheses will require further examination in future experiments. Additional information can be found in lines 328-329; 344-351; and 416-430.

      (1) K. X. Ngo et al., Cofilin-induced unidirectional cooperative conformational changes in actin filaments revealed by high-speed atomic force microscopy. eLife 4, (2015).

      (2) K. Tanaka et al., Structural basis for cofilin binding and actin filament disassembly. Nature communications 9, 1860 (2018).

      (3) T. Oda et al., Structural Polymorphism of Actin. Journal of molecular biology 431, 3217-3228 (2019).

      (4) A. Narita, ADF/cofilin regulation from a structural viewpoint. Journal of muscle research and cell motility 41, 141-151 (2020).

      We have cited them accordingly in the paper.

      Reviewer #2 (Public Review):

      Summary:

      This study by Ngo et al. uses mostly high-speed AFM to estimate conformational changes within actin filaments, as they get decorated by cofilin. The authors build on their earlier study (Ngo et al. eLife 2015) where they used the same technique to monitor the expansion of cofilin clusters on actin filaments, and the propagation of the associated conformational changes in the filament (reduction of the helical pitch). Here, they propose a higher-resolution description of the binding of cofilin to actin filaments.

      Strengths:

      The high speed AFM technique used here is quite original to address this question, compared to classical light and electron microscopy techniques. It can certainly bring valuable information as it provides a high spatial resolution while monitoring live events. Also, in this paper, a nice effort was made to make the 3D structures and conformational changes clear and understandable.

      We are grateful for the positive feedback from Reviewer 2.

      Weaknesses:

      The paper also has a number of limitations, which I detail below.

      In addition to AFM, the authors also propose a Principal Component Analysis (PCA) of existing structural data on actin protomers. However, this part seems very similar to another published work by others (Oda et al. JMB 2019), which is not even cited.

      We addressed this issue and explained it in Methods section, lines 612-621.

      The asymmetrical growth of cofilin clusters has so far only been seen using AFM, by the same authors (Ngo et al. eLife 2015). Using fluorescent microscopy, others have reported a very symmetrical expansion of cofilin clusters (Wioland et al. Curr Biol 2017). This is not mentioned at all, here. It should be discussed, and explanations for this discrepancy could be proposed.

      We have cited this paper (Wioland et al. Curr Biol 2017) in the current manuscript (see lines 361-362). However, we are unable to evaluate the technical distinctions between our methods and theirs. Instead, we have referred to a more recent paper that employed similar techniques to those used by Wioland et al. in Current Biology 2017. Our findings align with those reported by Bibeau JP et al. in the Journal of Molecular Biology 2021 (see their Results on page 7, titled “Cofilin clusters elongate preferentially towards the actin filament pointed end”). At a minimum, we believe this is appropriate.

      Regarding the AFM technique, I have the following concerns.

      The filaments appear densely packed on the surface, and even clearly in register in some images (if not most images, e.g., Figs 3A, 4BC, 5A). Why is that? Isn't there a risk that this could affect the result? This suggests there is some interaction between the filaments.

      In this study, as well as in many similar studies of actin filaments alone or in interaction with other actin binding proteins (ABPs) including cofilin, we have carefully considered the density of filaments when designing experiments. We used highly dense, but not packed, actin filaments to minimize free space between filaments and the surface, which helps maintain stable tip-scanning during AFM imaging. This strategy technically allows us to capture high spatial and temporal resolutions of actin filaments’ structures.

      The actin filaments, which resemble paracrystal structures, appear as densely packed filaments (see our data in Ngo and Kodera et al., eLife 2015, Figure 1C). Thus, the data presented in this paper are technically appropriate and do not risk misinterpretation due to lateral interactions impacting the structures and function of actin filaments and cofilin.

      The properties of the lipid layer and its interaction with the actin filaments are not clear at all. A poor control of these interactions is a problem if one aims to measure conformational changes at high resolution. The strength of the interaction appears tuned by the ratio of lipids put on the surface to change its electrostatic charge. A strong attachment likely does more than suppress torsional motion (as claimed in Fig 8A). It may also hinder cofilin binding in several ways (lower availability of binding sites on the filament facing the surface, electrostatic interactions between cofilin and the surface, etc.)

      We are confident that our lipid membrane bilayer is the optimal choice for immobilizing actin filaments in a controlled manner for HS-AFM experiments, achieved through the variation of positively charged lipids. In this study, we have fine-tuned the surface charge for our specific purposes.

      As an example, to capture high-spatial resolution images of actin structures (Figure 5-6, Figure supplement 5B, 6), we strongly fixed the filaments on DPPC/DPTAP (50/50 wt%) after the binding reaction between actin filaments and cofilin in solution was completed. This experiment yielded valuable information, including: (i) the ability to replicate the conformation of cofilactin and hybrid cofilactin/bare actin segments in solution, akin to the first steps in sample preparation for Cryo-EM techniques; and (ii) the capability to capture these structures, reflecting their solution states, by firmly fixing them on a lipid surface. On the lipid surface, these structures were retained stably during AFM imaging.

      If there is a choice, we advise against using amino-silane and other positively charged polymers typically used for modifying glass surfaces to fix actin filaments in studies using fluorescence microscopy. The strong immobilization by these chemicals can alter the structural dynamics and functions of actin filaments, lead to non-specific binding of cofilin on the modified glass surface, and potentially affect data interpretation.

      On a local scale, the reviewer may argue about the "lower availability of binding sites on the filament facing the surface". However, on a global scale, we maintain that two single strands forming helical twists of long F-actin segments should have an equal chance to bind cofilin even when fixed on a lipid membrane. The evidence shown in Figure 8A and Video 7, which demonstrates that small cofilin clusters associate and dissociate locally without developing into large clusters along the actin filament, supports our conclusion that flexibility and dynamics in helical twists play a crucial role in facilitating the binding and growth of cofilin clusters.

      The lipid surface utilized in our study with actin filaments and cofilin provides an ideal surface, as it is flat and minimizes the nonspecific binding of cofilin to the lipid membrane (see an example of the lipid surface in Video 5).

      How do we know that the variations over time are not mostly experimental noise, i.e. variations between repeats of the same measurement? As shown in Fig 3, correlation is mostly lost from one image to the next, and rather stable after that.

      This question is similar to the above question of Reviewer 1. Please also refer to our response in lines 264-270; 333-337; 555-560, measurement Methods, and Figure supplement 3 and Table supplement 5.

      The identification of cofilactin regions relies on the additional height of the "peaks", due to the presence of cofilin. It thus seems that cofilin is detected every half helical pitch (HHP), but not in between, thereby setting the resolution for the localization of cluster borders to one HHP. It thus seems difficult to claim that there is a change in helicity without cofilin decoration over this distance. In Fig 7, the change in helicity could be due to cofilin decoration that is undetected because cofilins have not yet reached the next peak.

      There are several important criteria to distinguish the "supertwisted half helix" in cofilactin region from the "normal half helix". As illustrated in the pseudo AFM images constructed for normal F-actin and C-actin segments (with and without cofilin decoration) from PDB structures, it is evident that these two structures differ significantly in length and the number of protomer pairs per HHP (see Figure Supplement 2). In both pseudo and experimental AFM images, these parameters can be easily detected by measuring the distance between two cross-over points. Furthermore, the height or thickness difference between the cofilactin and bare actin regions is approximately 10-15 Å, which is well resolved by HS-AFM due to its exceptional z-axis resolution of ~1 Å. Technically, we were able to detect these differences by creating a longitudinal section profile that covered both bare actin and cofilactin areas, as shown in Figure supplement 6.

      We experimentally reveal that a critical cofilin cluster comprising 2-4 molecules (Figures 5-6) or larger cofilin clusters (Figures 7-8, Figure Supplements 6-8) could equally supertwist a bare half helix on the PE side. The observation that a small cofilin cluster (2-4 molecules) can shorten a half helix by reducing the number of protomers per HHP to 9 or 11 (4.5 or 5.5 protomer pairs), which typically requires full decoration by 9-11 cofilin molecules, strongly suggests that supertwisting or the change in helicity does not always require complete cofilin decoration. We predicted that 2-4 bare actin protomers neighboring a cofilin cluster on the PE side can adopt the C-actin-like structure. See further in lines 324-329.

      Figure 7 captures a live binding event of cofilin at low spatial resolution, yet (i) the half helical pitches and (ii) the thickness of the cofilactin and bare actin segments can still be clearly distinguished. This demonstrates that changes in helicity within the cofilactin region propagate to an unbound half helix on the PE side, rearranging the helical twist by reducing the number of actin protomers per HHP, prior to recruiting additional cofilin for binding and expanding clusters.

      Reviewer #1 (Recommendations For The Authors):

      I believe C-form and G-form are better than C-actin like structure or G-actin like structure.

      We avoid using terms like "G-form", "F-form", or "C-form", as defined by Cryo-EM (Oda et al., 2019), because they refer to specific nucleotide and cofilin-bound states in other original papers. Instead, we use “G-actin”, “F-actin”, “C-actin”, “G-actin-like”, and “C-actin-like” to emphasize "Structural Dynamics" and "Structural Polymorphism". This highlights that even F-actin structures without cofilin bound can adopt "C-actin-like" conformations with fewer OD twists, resulting in a shorter global helical pitch. ADP-bound F-actins exhibit greater variability in helical twists than ADP-Pi-bound F-actin (Figure 9), indicating that ADP-bound F-actin protomers can adopt more C-actin-like conformations than ADP-Pi-bound F-actin protomers (Figure 1, Figure supplement 1).

      Technical terms describing actin structures do not need to be the same between Cryo-EM and HS-AFM, as the two techniques are fundamentally different. Our work underscores the importance of considering “structural dynamics and heterogeneity” in different nucleotide states of filamentous actin structures, both with and without cofilin, over time.

      Figure 1A

      A very similar analysis has already been performed by Oda et al (1). The authors should describe the relationships with the previous analysis.

      We addressed this issue in Methods – Principal component analysis – in lines 612-621.

      Figure 1B, C

      A very similar analysis has already been performed by Tanaka et al. (2). The authors should describe the relationship with the previous analysis.

      We addressed this issue in Methods – Principal component analysis – in lines 612-621 and legend of Figure 1.

      Lines 397-398

      "However, we noted that in rare instances, cofilin clusters also grew on both sides in the regular bare half helices when ATP or ADP was present."

      I believe other experiments also contain ATP in the solution. I could not catch the meaning of this sentence.

      We addressed this issue in the Results section, line 412. "However, we noted that in rare instances, cofilin clusters also grew on both sides in the regular bare half helices when only ADP was present."

      Additionally, we enhanced the description in the Methods section to avoid any confusion regarding nucleotides in the buffer. Please refer to the Methods section under “HS-AFM imaging”, lines 702-738.

      Lines 427-429

      "Consequently, the proportion of naturally supertwisted half helices with HHPs shorter than 30 nm was 5.8% for F-ADP-actin but only 1.1% and 0.2% for F-ADP.Pi-actin and phalloidin-stabilized F-actin, respectively."

      Similar discussion was made in (3) for the actin filaments with tension. It might be comparable with the current data.

      We cited it accordingly, line 447 for Okura et al., 2023.

      Lines 553-557

      "Nonetheless, it remains plausible that the structural flexibility exhibited by ADP-bound actin protomers could result in subtle variations in the conformations of the DNase binding loop (Dloop) G46-M47-G48-N49, as suggested in (Chou and Pollard, 2019). We suggest that the absence of bound Pi possibly increases the torsional flexibilities during helical twisting of ADP bound actin filaments in contrast to their ADP.Pi-bound counterparts."

      The crystal structure of the F-form (4) showed that Pi in ADP.Pi connects the two large domains of the actin molecule, stabilizing F-form. Pi release largely weakens the connection. This might be useful for the discussion.

      We incorporated this point with the suggested citation in lines 582-584.

      (1) T. Oda et al., Structural Polymorphism of Actin. Journal of molecular biology 431, 3217-3228 (2019).

      (2) K. Tanaka et al., Structural basis for cofilin binding and actin filament disassembly. Nature communications 9, 1860 (2018).

      (3) K. Okura et al., Mechanical Stress Decreases the Amplitude of Twisting and Bending Fluctuations of Actin Filaments. Journal of molecular biology 435, 168295 (2023).

      (4) Y. Kanematsu et al., Structures and mechanisms of actin ATP hydrolysis. Proceedings of the National Academy of Sciences of the United States of America 119, e2122641119 (2022).

      Reviewer #2 (Recommendations For The Authors):

      Line 190: "Noticeably, PCA analysis revealed higher structural flexibility in F-ADP-actin (red dots), exploring a larger space than F-ADP-Pi-actin structures (orange dots) within the F-actin cluster (inset in Figure 1A)". Is there a quantification to support this claim? Visually, things are not so clear.

      We have improved Figure 1 by adding 2 circles to an inset, providing clearer quantification to support our claim.

      In the PCA part: isn't it a bit obvious, or at least expected, that the conformation adopted by actin in the cofilactin structure is the most favorable one for binding cofilin?

      We agree with the reviewer on this point and have added it to the Results section, lines 202-204.

      I found it a bit unclear how the structures in Fig 2 were obtained.

      We clarified this by adding “Zoom-in views of these long filaments are shown in Figure 2” to the Methods section, line 661.

      In the AFM images, the authors always seem to know the polarity of the filaments. Unless I missed it, how they know this is not explained. In their earlier work (Ngo et al. 2015) they used a subfragment of myosin II which indicates polarity when bound to F-actin. I found no such explanation here.

      We have addressed this issue in the legend of each figure accordingly.

      For clarity, I suggest writing "C-actin-like structures" (with two hyphens) rather than "C-actin like structures".

      We agree and are currently incorporating this change in the text.

      The term "cluster" in PCA can be confusing because it is used for cofilin clusters throughout the text.

      "Cluster" is a common term used in PCA analysis. To clarify, we revised the legend in Figure 1 and Figure Supplement 1, changing "PCA clusters" to distinguish them from “cofilin clusters” or “F-actin clusters”.

      There are many acronyms. Readability of the figure legends (which can be consulted independently from the main text) would be improved if acronyms were spelled out there as well.

      We have revised some of the acronyms in the legend of each figure accordingly, which we believe is sufficient.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Key shortcomings include the unusual normalization strategies used for many experiments and the lack of quantification/statistical analyses for several experiments.

      In the updated version of the paper, we have addressed all of this reviewer's criticisms. Most importantly, we have performed several additional experiments to address the concern that unusual normalization strategies were used in our paper and that quantification and statistical analyses were lacking for several experiments. We have now analyzed the full set of release conditions for Shh and engineered proteins from Disp-expressing n.t. control cells and Disp-/- cells both in the presence and absence of Scube2 (Figure 1A'-D', Figure 2E added to the paper, Figure 3B'-D', Figure 5C and Figure S2F-H). Previously, we had only quantified protein release from n.t. controls and Disp-/- cells in the presence but not in the absence of Scube2 under serum-depleted conditions. Quantifications of serum-free protein release and Shh release under conditions ranging from 0.05% FCS to 10% FCS were completely missing from the earlier versions of the manuscript, but have now been added to our paper. In addition, we have reanalyzed all of the data sets in the above figures, as well as Figures 2C and S1B, to address the issue of "unusual normalization strategies": unlike previous assays in which the highest amount of protein detected in the media was set to 100% and all other proteins in that experiment were expressed relative to that value, we now directly compare the relative amounts of cellular and corresponding solubilized proteins as a method to quantify release without the need for data normalization (Figs. 1A'-D', 2C,E, 3B'-D', E, 5C, Fig. S1B, S2F-H).
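      The difference between the two quantification schemes mentioned in this response can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the function names, condition labels and band intensities below are hypothetical and serve only to contrast normalizing media signals to the strongest band with expressing release directly as the ratio of solubilized to cellular signal.

```python
def normalize_to_max(media):
    """Earlier scheme: express each media signal as a percentage of the strongest one."""
    top = max(media.values())
    return {cond: 100.0 * value / top for cond, value in media.items()}


def release_ratio(media, cellular):
    """Revised scheme: solubilized signal relative to the matching cellular signal."""
    return {cond: media[cond] / cellular[cond] for cond in media}


# Hypothetical densitometry values (arbitrary units), for illustration only.
media = {"ctrl+Scube2": 80.0, "ctrl-Scube2": 20.0, "DispKO+Scube2": 10.0}
cellular = {"ctrl+Scube2": 100.0, "ctrl-Scube2": 100.0, "DispKO+Scube2": 200.0}

relative = normalize_to_max(media)        # depends on the strongest band in the experiment
ratios = release_ratio(media, cellular)   # depends only on each condition's own lanes
```

      In this toy example the ratio for each condition is unaffected by what else was loaded in the same experiment, whereas the percentage values would all shift if a stronger band were added.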

      We have also repeated the qPCR analyses in C3H10T1/2 cells and now show that the same Shh/C25AShh activities can be observed when using another Shh responsive cell line, NIH3T3 cells (Fig. 4B, 6B, fig. S5B).

      We would like to point out that if the criticism refers to the presentation of our RP-HPLC and SEC data, the normalization of the strongest eluted protein signal to 100% for all proteins tested is necessary to put their behavior in a clearer relationship. This is because only the relative positions of protein elution, and not their amounts, are important in these experiments.

      The significance of the data provided is overstated because many of the presented experiments confirm/support previously published work.

      To mitigate the first reviewer's comment that the significance of the data presented is overstated, we now clearly distinguish between our novel results and the known aspect of Hh release on lipoproteins throughout our paper. We now clearly describe what is new and important in our paper: First, contrary to the general perception in the field, Disp and Scube2 are not sufficient to solubilize Shh, casting doubt on the currently accepted model that Scube2 accepts dual-lipidated Shh from Disp and transports it to the receptor Ptch. Second, lipoproteins shift dual Shh processing to N-terminal peptide processing only to generate different soluble Hh forms with different activities (as shown in Figure 4C). Third, and again contrary to popular belief, this new release mode does not inactivate Shh, as we now show in two established cellular assays for Hh biofunction (Figures 4A-C, 5B'', 6B and S5C-G). Fourth, and most importantly, we show that spatiotemporally controlled, Disp-, Scube2- and HDL-mediated Shh release absolutely requires dual lipidation of the membrane-associated Shh precursor prior to its release. This finding (as shown in Figures 1 and S2) changes the interpretation of previously published in vivo data that have long been interpreted as evidence for the requirement of dual Shh lipidation for full receptor binding and activation.

      The study provides a modest advance in our understanding of the complex issue of Shh membrane extraction.

      Although we agree that our results integrate our novel observations into previously established concepts of Hh release and trafficking, we also hope that our data cast well-founded doubt on the current view that the issue of Hh release and trafficking is largely resolved by the model of Disp-mediated Shh hand-over to Scube2 and then to Ptch, which requires interactions with both Shh lipids. Our data show that this is clearly not the case in the presence of lipoproteins. Thus, the significance of our data is that models of Shh lipid-regulated signaling to Ptch obtained using the dual-lipidated Shh precursor prior to its Disp- and Scube2-mediated conversion into a delipidated or monolipidated, HDL-associated soluble ligand are likely to describe a non-physiological interaction. Instead, our work describes a highly bioactive soluble ligand with only one lipid still attached, which has not been described before in the literature. The in vivo endpoint analyses presented in Fig. S8 suggest that this new protein variant is likely to play an important role during development.

      Reviewer #2 (Public Review):

      The precise molecular identity (of the released Shh) remains to be defined.

      We would like to respond that the direct comparison of soluble proteins and their well-defined double-lipidated precursors side-by-side in the same experiment, as shown in our paper, determines all relevant molecular changes in the Shh release process. Most importantly, we show by SDS-PAGE and RP-HPLC that HDL restricts Shh processing to the N-terminus and that the absence of HDL results in double processing of Shh during its release. We also show by SEC that the C-terminus binds the protein to HDL. In addition, the fly experiments confirm the requirement for N-terminal Hh processing, but not for processing of the C-terminal peptide, and suggest that the N-terminal Cardin-Weintraub sequence replaced by the functionally blocking tag represents the physiological cleavage site.

      It would be important to demonstrate key findings in cells that secrete Shh endogenously.

      We now confirm the key findings of our study in Panc1 cells that endogenously produce and secrete Shh: As shown in Fig. S1D, we find that soluble proteins are processed but retain the C-cholesterol, which we now directly confirm by RP-HPLC (Fig. S4F-H). The in vivo analyses shown in Fig. S8 suggest that the key finding - that N-terminal but not C-terminal Hh shedding is required for release - can be supported, at least in the fly: here, Hh variants impaired in their ability to be processed N-terminally strongly repress the endogenous protein, whereas the same protein impaired in its ability to be processed C-terminally does not.

      The authors detect Shh variants that are expressed independently of Disp and Scube2 in secretion assays, but are excluded from interpretation as experimental artifacts.

      We agree with the reviewer's criticism that the amounts of Shh released independently of Disp and Scube2 in secretion assays were not quantified and analyzed statistically to justify their proposed status as not physiologically relevant. We now show that these forms are indeed secretion artifacts (Fig. 3E and Fig. S2F-H show quantification of the lower electrophoretic mobility protein fraction (i.e., the "top" band representing the double-lipidated soluble protein fraction)) because this fraction is released independently of Disp and Scube2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript builds upon the authors' previous work on the cross-talk between transcription initiation and post-transcriptional events in yeast gene expression. These prior studies identified an mRNA 'imprinting' phenomenon linked to genes activated by the Rap1 transcription factor (TF), a surprising role for the Sfp1 TF in promoting RNA polymerase II (RNAPII) backtracking, and a role for the non-essential RNAPII subunits Rpb4/7 in the regulation of mRNA decay and translation. Here the authors aimed to extend these observations to provide a more coherent picture of the role of Sfp1 in transcription initiation and subsequent steps in gene expression. They provide evidence for (1) a physical interaction between Sfp1 and Rpb4, (2) Sfp1 binding and stabilization of mRNAs derived from genes whose promoters are bound by both Rap1 and Sfp1 and (3) an effect of Sfp1 on Rpb4 binding or conformation during transcription elongation. 

      Strengths: 

      This study provides evidence that a TF (yeast Sfp1), in addition to stimulating transcription initiation, can at some target genes interact with their mRNA transcripts and promote their stability. Sfp1 thus has a positive effect on two distinct regulatory steps. Furthermore, evidence is presented indicating that strong Sfp1 mRNA association requires both Rap1 and Sfp1 promoter binding and is increased at a sequence motif near the polyA track of many target mRNAs. Finally, they provide compelling evidence that Sfp1-bound mRNAs have higher levels of RNAPII backtracking and altered Rpb4 association or conformation compared to those not bound by Sfp1. 

      Weaknesses: 

      The Sfp1-Rpb4 association is supported only by a two-hybrid assay that is poorly described and lacks an important control. Furthermore, there is no evidence that this interaction is direct, nor are the interaction domains on either protein identified (or mutated to address function). 

      Indeed, our two-hybrid, immunoprecipitation and imaging results do not allow us to conclusively discern whether the interaction between Rpb4 and Sfp1 is direct or indirect. While the interaction holds significance, we consider the direct versus indirect distinction to be of secondary importance in the context of this paper. In the current text we indicated that 'our two-hybrid, immunoprecipitation and imaging results do not differentiate between direct and indirect interactions' (see page 6, sentences highlighted in blue).

      The contention that Sfp1 nuclear export to the cytoplasm is transcription-dependent is not well supported by the experiments shown, which are not properly described in the text and are not accompanied by any primary data. 

      This section has been re-written for better clarity (see page 7). We note that this assay was originally developed and published by Lee, M. S., M. Henry, and P. A. Silver in their 1996 paper in G&D and has since been reported in numerous subsequent studies. Reassuringly, our conclusion is bolstered by the observation that Sfp1 binds to Pol II transcripts co-transcriptionally, suggesting that Sfp1 is exported in the context of the mRNA.

      The presence of Sfp1 in P-bodies is of unclear relevance and the authors do not ask whether Sfp1-bound mRNAs are also present in these condensates. 

      P-bodies consist of both RNA and proteins (reviewed in doi: 10.1021/acs.biochem.7b01162). The significance of this experiment lies in its contribution to further confirming the co-localization of Sfp1 with mRNAs and Rpb4. This observation could also yield valuable insights for future investigations into the role of Sfp1.

      Further analysis of Sfp1-bound mRNAs would be of interest, particularly to address the question of whether those from ribosomal protein genes and other growth-related genes that are known to display Sfp1 binding in their promoters are regulated (either stabilized or destabilized) by Sfp1. 

      Fig. 4A, C and D show that RP mRNAs become destabilized in sfp1Δ cells.

      The authors need to discuss, and ideally address, the apparent paradox that their previous findings showed that Rap1 acts to destabilize its downstream transcripts, i.e. that it has the opposite effect of Sfp1 shown here. 

      We would like to thank Reviewer 1 for this valuable comment. In the revised paper, we discuss our hypothesis that Rap1 is likely responsible for regulating the imprinting of other proteins, such as Rpb4, that in turn lead to the destabilization of mRNAs. See the blue paragraph on page 20.

      Finally, recent studies indicate that the drugs used here to measure mRNA stability induce a strong stress response accompanied by rapid and complex effects on transcription. Their relevance to mRNA stability in unstressed cells is questionable. 

      Half-lives were determined mainly by GRO analysis of optimally proliferating cells. This method does not require any drug or stressful treatment. The results obtained by this method were consistent with those obtained after thiolutin addition. Using both methods, we discovered that disruption of Sfp1 results in substantial mRNA destabilization. Nevertheless, in our revised manuscript, we show results obtained by subjecting cells to a temperature shift to 42°C, a natural method to inhibit transcription. This approach to determining half-lives has been previously reported in our publications, such as Lotan et al. (2005, 2007) and Goler Baron et al. (2008). This may rule out effects of the drug on half-lives. Admittedly, this assay determines half-lives under heat stress; it therefore demonstrates that, at least during heat shock, Sfp1 stabilizes mRNAs. Since the results are similar to those obtained by the GRO method at 30°C, we concluded that Sfp1 stabilizes mRNAs under both optimal and heat-stress conditions.

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript by Kelbert et al. presents results on the involvement of the yeast transcription factor Sfp1 in the stabilisation of transcripts whose synthesis it stimulates. Sfp1 is known to affect the synthesis of a number of important cellular transcripts, such as many of those that code for ribosomal proteins. The hypothesis that a transcription factor can remain bound to the nascent transcript and affect its cytoplasmic half-life is attractive, but the methods used to demonstrate the half-life effects and the association of Sfp1 with cytoplasmic transcripts remain to be fully validated, as explained in my comments on the results below: 

      Comments on methodology and results: 

      (1) A two-hybrid-based assay for protein-protein interactions identified Sfp1, a transcription factor known for its effects on ribosomal protein gene expression, as interacting with Rpb4, a subunit of RNA polymerase II. Classical two-hybrid experiments depend on the presence of the tested proteins in the nucleus of yeast cells, suggesting that the observed interaction occurs in the nucleus. Unfortunately, the two-hybrid method cannot determine whether the interaction is direct or mediated by nucleic acids. 

      Indeed, our two-hybrid, immunoprecipitation and imaging results do not allow us to conclusively discern whether the interaction between Rpb4 and Sfp1 is direct or indirect. While the interaction holds significance, we consider the direct versus indirect distinction to be of secondary importance in the context of this paper. In the current text we indicated that 'our two-hybrid, immunoprecipitation and imaging results do not differentiate between direct and indirect interactions' (see page 6).

      (2) Inactivation of nup49, a component of the nuclear pore complex, resulted in the redistribution of GFP-Sfp1 into the cytoplasm at the temperature non-permissive for the nup49-313 strain, suggesting that GFP-Sfp1 is a nucleo-cytoplasmic shuttling protein. This observation confirmed the dynamic nature of the nucleo-cytoplasmic distribution of Sfp1. For example, a similar redistribution to the cytoplasm was previously reported following rapamycin treatment and under starvation (Marion et al., PNAS 2004). In conjunction with the observation of an interaction with Rpb4, the authors observed slower nuclear import kinetics for GFP-Sfp1 in the absence of Rpb4 when cells were transferred to a glucose-containing medium after a period of starvation. Since the redistribution of GFP-Sfp1 was abolished in an rpb1-1/nup49-313 double mutant, the authors concluded that Sfp1 localisation to the cytoplasm depends on transcription. The double mutant yeast cells may show a variety of non-specific effects at the restrictive temperature, and whether transcription is required for Sfp1 cytoplasmic localisation remains incompletely demonstrated. 

      We agree with Reviewer 2 that any heat inactivation of a temperature-sensitive (ts) protein can lead to non-specific effects. It is evident that nup49-313 does not prevent Sfp1 export to the cytoplasm. In the case of rpb1-1, these non-specific effects are expected due to transcriptional arrest, which can eventually result in a reduction in protein content. However, this process takes some time, while the impact on export is more rapid. It is worth noting that this assay was developed and previously published by Pam Silver (Henry and Silver G&D 1996) and has been reported in many subsequent papers. Importantly, our conclusion is supported by the observation that Sfp1 binds both nascent RNA (co-transcriptionally) and mature mRNA (cytoplasmic). These observations, along with the reduced mRNA export upon transcription blocking, are consistent with our proposal that Sfp1 is exported in association with mRNA.

      (3) Under starvation conditions, which led to the presence of Sfp1 in the cytoplasm and have previously been correlated with a decrease in the transcription of Sfp1 target genes, the authors observed that a plasmid-based expressed GFP-Sfp1 accumulated in cytoplasmic foci. These foci were also labelled by P-body markers such as Dcp2 and Lsm1. The quality of the microscopic images provided does not allow one to determine whether Rpb4-RFP colocalises with GFP-Sfp1. 

      The submitted PDF figure is of low quality. We believe that the high-quality figure in the final submission is convincing.

      (4) To understand to which RNA Sfp1 might bind, the authors used an N-terminally tagged fusion protein in a cross-linking and purification experiment. This method identified 264 transcripts for which the CRAC signal was considered positive and which mostly correspond to abundant mRNAs, including 74 ribosomal protein mRNAs or metabolic enzyme-abundant mRNAs such as PGK1. The authors did not provide evidence for the specificity of the observed CRAC signal, in particular, what would be the background of a similar experiment performed without UV cross-linking. In a validation experiment, the presence of several mRNAs in a purified SFP1 fraction was measured at levels that reflect the relative levels of RNA in a total RNA extract. Negative controls showing that abundant mRNAs not found in the CRAC experiment were clearly depleted from the purified fraction with Sfp1 would be crucial to assessing the specificity of the observed protein-RNA interactions. The NON-CRAC+ selected mRNAs were enriched for genes whose expression was previously shown to be upregulated upon Sfp1 overexpression (Albert et al., 2019). The presence of unspliced RPL30 pre-mRNA in the Sfp1 purification was interpreted as a sign of co-transcriptional assembly of Sfp1 into mRNA, but in the absence of valid negative controls, this hypothesis would require further experimental validation.

      We would like to thank Reviewer 2 for bringing this issue up, as it helped us to clarify it in the revised paper.

      First, we emphasized in the Discussion that many CRAC+ genes do not fall into the category of highly transcribed genes. Please see more detailed discussion below.

      Secondly, we examined various features of the 264 genes - classified as CRAC+ - to estimate their specificity and biological significance. Our various experiments revealed that the CRAC+ genes represent a distinct group with many unique features.

      The biological significance of the 264 CRAC+ mRNAs was demonstrated by various experiments; all are inconsistent with technical flaws. In fact, all the experiments and analyses that we have pursued indicate the unique nature of the CRAC+ genes. Some examples are:

      (1) Fig. 2A and B show that most reads of CRAC+ mRNAs were mapped to a specific location, close to the pA sites.

      (2) Fig. 2C shows that most reads of CRAC+ mRNAs were mapped to a specific RNA motif located near the 3’ ends of the mRNAs.

      (3) Most RiBi CRAC+ promoters contain Rap1 binding sites (p = 1.9 × 10⁻²²), while the vast majority of RiBi non-CRAC+ promoters do not (Fig. 3C).

      (4) Fig. 4A shows that RiBi CRAC+ mRNAs become destabilized due to Sfp1 deletion, whereas RiBi non-CRAC+ mRNAs do not. Fig. 4B shows similar results due to Sfp1 depletion.

      (5) Fig. 6B shows that the impact of Sfp1 on backtracking is substantially higher for CRAC+ than for non-CRAC+ genes. This is most clearly visible in RiBi genes.

      (6) Fig. 7A shows that the Sfp1-dependent changes along the transcription units are substantially more pronounced for CRAC+ than for non-CRAC+ genes.

      (7) In Fig. S4B, the chromatin binding profile of Sfp1 is shown to be different for CRAC+ and non-CRAC+ genes.

      Taken together, the many unique features of this group (in fact, every feature that we examined) indicate its specificity and significance, demonstrating that our CRAC results are biologically significant.

      Most importantly, these genes do not all fall into the category of highly transcribed genes. On the contrary, as depicted in Figure 6A (green dots), it is evident that CRAC+ genes exhibit a diverse range of Rpb3 ChIP and GRO signals. Furthermore, as illustrated in Figure 7A, when comparing CRAC+ to Q1 (the most highly transcribed genes), it becomes evident that the Rpb4/Rpb3 profile of CRAC+ genes behaves differently from the Q1 group. Evidently, despite the heterogeneous transcription of CRAC+ genes (as mentioned above), the Rpb4/Rpb3 profile decreases more substantially than that of the highly transcribed genes (Q1). Moreover, despite similar expression levels among all RiBi mRNAs, only a portion of them binds Sfp1.

      Thus, all our results indicate that CRAC+ genes represent a biologically significant group, irrespective of the expression of its members. In response to this comment, we included a new paragraph discussing the validity of our conclusions. See page 18, blue paragraph.

      (5) To address the important question of whether co-transcriptional assembly of Sfp1 with transcripts could alter their stability, the authors first used a reporter system in which the RPL30 transcription unit is transferred to vectors under different transcriptional contexts, as previously described by the Choder laboratory (Bregman et al. 2011). While RPL30 expressed under an ACT1 promoter was barely detectable, the highest levels of RNA were observed in the context of the native upstream RPL30 sequence when Rap1 binding sites were also present. Sfp1 showed better association with reporter mRNAs containing Rap1 binding sites in the promoter region. However, removal of the Rap1 binding sites from the reporter vector also led to a drastic decrease in reporter mRNA levels. Whether the fraction of co-purified RNA is nuclear and co-transcriptional or not cannot be inferred from these results. 

      The proposed co-transcriptional binding of Sfp1 is based on the findings presented in Figure 5C and Figure S2D, as well as the observed binding of Sfp1 to transcripts containing introns, as shown in Figures 2D and 3B. The results of Fig. 3 led us to the assertion that the "RNA-binding capacity of Sfp1 is regulated by Rap1-binding sites located at the promoter." We maintain our stance on this conclusion. Indeed, the Rap1 binding site does impact mRNA levels, as highlighted by Reviewer 2. However, "construct E," which possesses a promoter with a Rap1 binding site, exhibits lower transcript levels compared to "construct F," which lacks such a binding site in its promoter. Despite this difference in transcript levels, Sfp1 was able to pull down the former transcript but not the latter, even though expression of the former gene is relatively low. Thus, the results appear to be more reliant on the specific capacity of Sfp1 to interact with the transcript rather than on the transcript's expression level.

      (6) To complement the biochemical data presented in the first part of the manuscript, the authors turned to the deletion or rapid depletion of SFP1 and used labelling experiments to assess changes in the rate of synthesis, abundance, and decay of mRNAs under these conditions. An important observation was that in the absence of Sfp1, mRNAs encoding ribosomal protein genes not only had a reduced synthesis rate but also an increased degradation rate. This important observation needs careful validation, as genomic run-on experiments were used to measure half-lives, and this particular method was found to give results that correlated poorly with other measures of half-life in yeast (e.g. Chappelboim et al., 2022 for a comparison). Similarly, the use of thiolutin to block transcription as a method of assessing mRNA half-life has been reported to be problematic, as thiolutin can specifically inhibit the degradation of ribosomal protein mRNA (Pelechano & Perez-Ortin, 2008). Specific repressible reporters, such as those used by Baudrimont et al. (2017), would need to be tested to validate the effect of Sfp1 on the half-life of specific mRNAs. Also, it would be very difficult to infer from the images presented whether the rate of deadenylation is altered by Sfp1.

      Various methods exist for assessing mRNA half-lives (HLs), and each of them carries its own set of challenges and biases. Consequently, it becomes problematic to directly compare HL values of a specific mRNA when different methods are employed. In our opinion, the superiority of one particular method over the others remains unclear. However, these methods exhibit a high degree of reliability when it comes to comparing different strains under identical conditions using a single method.

      Estimating HLs through the GRO approach is a non-invasive method, applied to optimally proliferating cells, which has been employed in numerous publications. While no method is without its limitations, our experience over the years has reassured us that this approach is among the most dependable. Our HL determination using thiolutin to block transcription provided results that were consistent with the values obtained by the GRO approach.

      Nevertheless, in our revised manuscript, we supplemented the HL data obtained with thiolutin with results obtained by subjecting cells to a temperature shift to 42°C, a natural method to block transcription in wild-type (WT) cells. This approach to determining HLs has been previously reported in our publications, such as Lotan et al. (2005, 2007) and Goler Baron et al. (2008). The new results are shown in Fig. S3B. They are consistent with our conclusion that Sfp1 stabilizes mRNAs.
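      For readers unfamiliar with transcription shut-off assays, the following minimal sketch shows how a half-life can be estimated from such a time course. It assumes simple first-order decay (level = level0 · exp(−k·t), so HL = ln 2 / k) and fits k by least squares on log-transformed levels. This is a generic illustration, not the authors' procedure; the function name, strain labels and data are hypothetical.

```python
import math


def half_life(times_min, mrna_levels):
    """Estimate a half-life (in minutes) from a shut-off time course,
    assuming first-order decay; fits a line to log(level) versus time."""
    logs = [math.log(level) for level in mrna_levels]
    n = len(times_min)
    t_mean = sum(times_min) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, logs))
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    slope = sxy / sxx  # slope of the fitted line equals -k
    return math.log(2) / -slope


# Hypothetical, noise-free time courses (arbitrary units), for illustration only.
times = [0, 10, 20, 40]
strain_a = [100.0, 50.0, 25.0, 6.25]               # level halves every 10 min
strain_b = [100.0 * 0.5 ** (t / 5) for t in times]  # level halves every 5 min

hl_a = half_life(times, strain_a)
hl_b = half_life(times, strain_b)
```

      Comparing two strains with the same method and conditions, as in this sketch, sidesteps method-specific biases in absolute half-life values.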

      Using a repressible promoter to determine mRNA HL is, unfortunately, not suitable in this paper because the promoter itself is involved in HL regulation. This observation is supported by Bregman et al. (2011) and depicted in Fig. 3, which illustrates that the promoter is critical for mRNA imprinting, consequently regulating HL.

      (7) The effects of SFP1 on transcription were investigated by chromatin purification with Rpb3, a subunit of RNA polymerase, and the results were compared with synthesis rates determined by genomic run-on experiments. The decrease in Pol II presence on transcripts in the absence of SFP1 was not accompanied by a marked decrease in transcript output, suggesting an effect of Sfp1 in ensuring robust transcription and avoiding RNA polymerase backtracking. To further investigate the phenotypes associated with the depletion or absence of Sfp1, the authors examined the presence of Rpb4 along transcription units compared to Rpb3. One effect of sfp1 deficiency was that this ratio, which decreased from the start of transcription towards the end of transcripts, increased slightly. The results presented are largely correlative and could arise from the focus on very specific types of mRNAs, such as those of ribosomal protein genes, which are sensitive to stress and are targeted by very active RNA degradation mechanisms activated, for example, under heat stress (Bresson et al., 2020). 

      Figure 7A illustrates a significant reduction in Rpb4/Rpb3 ratios along the transcription unit in WT cells. This reduction is notably more pronounced in CRAC+ genes compared to the highly transcribed quartile (Q1), which includes all ribosomal protein (RP) genes, and it is completely absent in sfp1∆ cells. Furthermore, it's important to highlight that the CRAC+ gene group displays a wide range of transcription rates, as measured by either Rpb3 ChIP or GRO (Figure 6A). Given these observations, we do not think that heightened sensitivity of RP mRNA degradation in response to stress is responsible for the pronounced difference in the configuration of the Pol II elongation complex that is detected in CRAC+ genes, mainly because this experiment was performed under standard (non-stress) culture conditions.

      Correlative studies are particularly informative when a gene mutation eliminates a correlation, and this is precisely the type of study depicted in Figure 7B-C. The correlations shown in these panels are dependent on Sfp1. Indeed, RP genes are sensitive to stress. However, we used non-stressed conditions. Furthermore, CRAC+ genes did not display any apparent unusual destabilization but rather exhibited higher (not lower) mRNA stability compared to non-CRAC+ genes (Figure 7C).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The paper combines phenotypic and genomic analyses of the "sheltered load" (i.e. the accumulation of deleterious mutations linked to S-loci that are hidden from selection in the homozygous state) in Arabidopsis. The authors compare results to previous theoretical predictions concerning the extent of the load in dominant vs recessive S-alleles, and further develop exciting theory to reconcile differences between previous theory and observed results.

      Strengths:

      This is a very nice combination of theory and data to address a classical question in the field.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      The "genetic load" is a poorly defined concept in general, and its quantification via the number of putatively deleterious mutations is quite difficult. Furthermore counting up the number of derived mutations at fully constrained nucleotides may not be a great estimate of the load, and certainly does not allow for evaluation of recessivity -- a concept critical to ideas concerning the sheltered load. Alternative approaches - including estimating the severity of mutations - could be helpful as well. This imperfection in available approaches to test theory must be acknowledged more strongly by the authors.

      As suggested by the reviewer, we implemented alternative approaches to estimate the severity of deleterious mutations and now report the results of SNPeff and SIFT4G analyses in Table S6. The results we obtained with these other metrics were overall very similar to those based on our previous counting of mutations at 0-fold and 4-fold degenerate sites. More generally, we tried to improve the presentation of our strategy to estimate the genetic load (clarified in lines 262-268, 271, 292-295, 297). In particular, we made it clear that our population genetic analysis cannot assess the recessivity of the observed mutations (lines 428-434).
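
      To make the notion of 0-fold and 4-fold degenerate sites concrete, the following is a minimal sketch of how a codon position can be classified under the standard genetic code. It is an illustration only and not the authors' pipeline; the names `CODON_TABLE` and `site_degeneracy` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ...
# (first base varies slowest, third base fastest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def site_degeneracy(codon, pos):
    """Return the fold-degeneracy (0, 2, 3 or 4) of position `pos` (0-2) in `codon`.

    A site is 0-fold degenerate when every substitution changes the amino acid,
    and 4-fold degenerate when no substitution does.
    """
    aa = CODON_TABLE[codon]
    synonymous = sum(
        CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == aa
        for base in BASES
        if base != codon[pos]
    )
    # 0 synonymous alternatives -> 0-fold; otherwise count the original base too.
    return 0 if synonymous == 0 else synonymous + 1
```

      For example, the third position of the glycine codon GGT is 4-fold degenerate, whereas its second position is 0-fold degenerate.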

      Reviewer #2 (Public Review):

      Summary:

      This study looks into the complex dominance patterns of S-allele incompatibilities in Brassicaceae, through which it attempts to learn more about the sheltering of deleterious load. I found several weak points in the analyses that diminished my excitement about the results. In particular, the way in which deleterious mutations were classified lacked the ability to distinguish the severity of the mutations and thus their expected associated dominance.

      First, we would like to clarify that our goal with this study is NOT to learn something about dominance of the linked deleterious mutations (we cannot). Instead, we compare the accumulation of deleterious mutations linked to dominant vs recessive S-ALLELES, but are agnostic regarding the dominance level of the LINKED mutations themselves. The rationale is that the different intensities of natural selection between dominant vs recessive S-alleles provide a powerful way to examine the process by which deleterious mutations are sheltered in general. We further clarified this aspect on lines 70-73 and 399-401.

      Second, as mentioned above in response to Reviewer 1, we complemented the analysis by predicting the severity of the deleterious mutations by SIFT4G and SNPeff. The results were largely consistent, with the exception that the number of sites included in SIFT4G was low, such that the statistical power was reduced (lines 296-300).

      Furthermore, the simulation approach could have provided this exact sort of insight but was not designed to do so, making this comparison to the empirical data also less than exciting for me.

      As explained above, studying dominance of the linked mutations we observed is an interesting research question (albeit a difficult one), but it was not our goal here. Instead, our study was designed as an empirical test of the predictions presented in Llaurens et al (2009), and we re-analysed some aspects of the model outcome to illustrate our points.

      We now better explain that we based our choice of parameters on the fact that in the theoretical study by Llaurens et al (2009), recessive deleterious mutations are predicted to accumulate in a much more straightforward manner (line 316-318).

      We now dedicate a paragraph of the discussion to explain how our stochastic simulations could be improved, and acknowledge that a full exploration of the interaction between dominance of the S-alleles and dominance of the linked deleterious mutations would be an interesting follow-up - albeit beyond the scope of our study (line 437-441).

      Major and minor comments:

      I think the introduction (or somewhere before we dive into it in the results) of the dominance hierarchy for the S-alleles needs a more in-depth explanation. Not being familiar with this beforehand really made this paper inaccessible to me until I then went to find out more before continuing. I would expect this paper to be broad enough that self-contained information makes it accessible to all readers. For example, lines 110-115 could be in the Introduction.

      We thank the reviewer for this useful remark. We now give a more comprehensive description of the dominance hierarchy and introduce the classes of dominance in A. lyrata already in the introduction, on lines 64-70.

      Along with my above comment, perhaps it is not my place to comment, but I find the paper not of a broad enough scope to be of interest to a broad readership. This S-allele dominance system is more than simple balancing selection, it is a very complex and specific form of dominance between several haplotypes, and the mechanism of dominance does not seem to be genetic. I am not sure that it thus extrapolates to broad comments on general dominance and balancing selection, e.g. it would not be the same as considering inversions and this form of balancing selection where we also expect recessive deleterious mutations to accumulate.

      We disagree with these interpretations by the reviewer, for two reasons:

      First, the mechanism of dominance is actually entirely genetic. In fact, we uncovered some years ago that it is based on the molecular interaction between small non-coding RNAs from dominant alleles and their target sites on recessive alleles (Durand et al. Science 2014, see lines 68-70). If there is something specific with this system, it is that the dominance phenomenon is better understood at the mechanistic level than in most other cases, but the resulting phenomenon in itself (a dominance hierarchy) is rather common.

      Second, the kind of variation in the intensity of linked selection created by this mechanism is actually a general phenomenon, so our results have broad relevance beyond our particular study system. We modified the introduction to explain this point more clearly, highlighting in particular the fact that the situation we study closely resembles the case of sex chromosomes, where X (or Z) chromosomes are genetically recessive and Y (or W) chromosomes are genetically dominant. We cite this example in lines 83-87 of the introduction and also several well-studied other examples on lines 480-489 of the discussion.

      It would have been particularly interesting, or a nice addition, to see deleterious mutations classed by something like SNPeff or GERP where you can have different classes of moderate to severe deleterious variants, which we would expect also to be more recessive the more deleterious they are. In line with my next comment on the simulations, I think relative differences between mutations expected to be more or less dominant may be even more insightful into the process of sheltering which may or may not be going on here.

      We agree with the reviewer, and as detailed above we have now integrated such analyses with SNPeff and SIFT4G (Table S6). These new results reinforce our conclusion that while S-allele dominance influences the fixation of deleterious mutations, it has no effect on their total number. See lines 270-272 and 296-300.

      In the simulations, h=0 and s=0.01 (as in Figure 5) for all deleterious mutations seems overly simplistic, and at the convenient end for realistic dominance. I think besides recessive lethals which we expect to be close to h=0 would have a much larger selection coefficient, and other deleterious mutations would only be partially recessive at such an s value. I expect this would change some of the simulation results seen, though to what degree I am not certain. It would be nice to at least check the same exact results for h=0.3 or 0.2 (or additionally also for recessive lethals, e.g. h=0 and s=-0.9). I would also disagree with the statement in line 677, many studies have shown, particularly those on balancing selection, that partially recessive deleterious mutations are not eliminated by natural selection and do play a role in population genetic dynamics. I am also not surprised that extinction was found for higher s values when the mutation rate for such mutations was very high and the distribution of s values was constant. An influx of such highly deleterious mutations is unlikely to ever let a population survive, yet that does NOT mean that in nature, the rare influx of such mutations does lead to them being sheltered. I find overall that the simulation results contribute very little, to none, to this paper, as without something more realistic, like a simultaneous distribution of s and h values, you cannot say which, if any class of these mutations are the ones expected to accumulate because of S-allele dominance.

      We understand that the previous version of our manuscript was confusing between dominance of the S-alleles and dominance of the linked deleterious mutations. We clarified that our study focuses on the effect of the former only (lines 99, 263-264 and 581-583).

      We agree that a complete exploration of the interaction between dominance of the S-alleles and dominance of the linked mutations being sheltered would have been an asset, but as explained above this is not the focus of our study. The previous work by Llaurens et al (2009) has already established that deleterious mutations can fix within S-allele lineages, especially when linked to dominant S-alleles, and when the number of S-alleles is large. Under the conditions they examined, deleterious mutations were much more strongly eliminated if not fully recessive (h=0 vs h=0.2), so for the present study we decided to simulate fully recessive mutations only. We now formally acknowledge the possibility that some complex interaction may take place between dominance of the S-alleles and dominance of the linked deleterious mutations (lines 440-442). However, as explained above we feel that fully exploring this complex interaction would require a detailed investigation, which is clearly beyond the scope of the present study.
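
      For readers unfamiliar with the notation, the sketch below shows the conventional single-locus parametrization of a deleterious mutation by its selection coefficient s and dominance coefficient h. It is a textbook formulation given here for orientation only; it is not taken from the simulation code discussed in this response, and the function name `genotype_fitness` is hypothetical.

```python
def genotype_fitness(n_mutant_copies, s, h):
    """Relative fitness of a diploid genotype at one locus.

    Conventional parametrization (an assumption, not the authors' code):
      0 mutant copies -> 1
      1 mutant copy   -> 1 - h * s   (h = 0 means fully recessive)
      2 mutant copies -> 1 - s
    """
    if n_mutant_copies == 0:
        return 1.0
    if n_mutant_copies == 1:
        return 1.0 - h * s
    if n_mutant_copies == 2:
        return 1.0 - s
    raise ValueError("a diploid genotype carries 0, 1 or 2 mutant copies")
```

      With h = 0 the heterozygote has the same fitness as the wild-type homozygote, so the mutation is exposed to selection only when homozygous; with h = 0.2 it is already selected against in heterozygotes.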

      Rather they only show the disappointing or less exciting result that fully recessive, weakly deleterious mutations (which I again think do not even exist in nature as I said above) have minor, to no effect across the classes of S-allele dominance. They provide no insight into whether any type of recessive deleterious mutation can accumulate under the S-allele dominance hierarchy, and that is the interesting question at hand. I would either remove these simulations or redo them in another approach. The authors never mention what simulation approach was used, so I can only assume this is custom, in-house code. Yet I do not find that code provided on the github page. I do not know if the lack of a distribution for h and s values is then a choice or a programming limitation, but I see it as one that should be overcome if these simulations are meant to be meaningful to the results of the study.

      The code we used (in C) was adapted from the previous study by Llaurens et al. (2009), which at the time was unfortunately not deposited in a data repository. With the agreement of the authors of that study, this code is now available on GitHub (https://github.com/leveveaudrey/model_ssi_Llaurens; line 723).

      It is correct that our simulations were not aimed at determining whether “any type of recessive deleterious mutation can accumulate”, but we strongly believe that they help interpreting the observations made in the genomic data.

      Recommendations for the authors:

      Notes from the editor:

      I found Table 1 confusing, with column headings of observed proportion but perhaps numbers reflecting counts.

      Thank you for pointing out this confusion. There was indeed an error in the last column, which we have now corrected.

      I found Figure 2 a bit hard to parse, with the vertical lines being unclear and the x-axis ticks of insufficient resolution to evaluate the physical extent of the signals.

      We increased the size of the x-axis label in Figure 2 and made it more detailed, which hopefully makes the figure clearer. We also increased the size of the vertical lines.

      Finally, I wonder, given the rapid decay of signal in lyrata, whether 25kb is the right choice for evaluating load and whether the pattern may look different on a smaller scale.

      It is true that the signal decays rapidly in A. lyrata, as can be seen in the haplotype structure analysis and in line with our previous analysis of the same populations Le Veve et al (MBE 2023; in this study we explored the effect of the choice of the size of the chromosomal region analyzed; lines 266-269). However, for the sake of comparison, we prefer to stick to the same window size. The fact that we still see an effect of dominance in spite of the lower statistical power associated with the more rapid decay (because a smaller number of genes is expected to be impacted) actually reinforces our conclusions.

      Reviewer #1 (Recommendations For The Authors):

      I have a few additional suggestions to improve the manuscript.

      (1) How does the load linked to the S-locus compare to that observed in other genomic regions? It would be useful to provide a comparison of the results quantified in Figures three and four to comparable genomic regions unlinked to the S-locus. How severe is the linked load?

      This comparison to the genomic background was actually the core of our previous study (Le Veve et al MBE 2023), which was based on the same populations. This analysis revealed that polymorphism of the 0-fold degenerate sites was more than twice as high in the 25kb immediately flanking the S-locus as in a series of 100 unlinked control regions. The main focus of the present study is on the effect of linkage to particular S-alleles (which was not possible previously because haplotypes had to be phased).

      (2) Details of the GLM for data underlying Figures 3 and 4 are somewhat unclear. Is the key explanatory variable (Dominance) treated as continuous? Categorical? Ordinal etc…

      Dominance is considered as a continuous variable. We specify this in line 162 of the results, in the legends of Figures 3 and 4, in the Material and Method (lines 627 and 660) and in the legend of Table S4.

      (3) I had some trouble understanding the two different p-values in columns five and six of table one. Please provide more detail.

      We understand that the two p-values in Table 1 were confusing. The first was related to the binomial test and the second to the permutation test. To be consistent with the rest of the manuscript, we conserved only the p-value of the permutation test.

      (4) As mentioned in the "weaknesses" above, the authors should be more clear about what they are quantifying. They are explicitly counting the number of variants at 0-fold degenerate sites as a proxy for the genetic load. How good this proxy is is unclear. The most egregious misstatement here was on line 314 in which they make reference to the "total load." However, this limitation should be acknowledged throughout the manuscript and deserves more attention in the methods and discussion.

      As mentioned above, we now integrate additional methods to define and quantify the load (SIFT4G and SNPeff), which reinforced our previous conclusions (lines 271-272, 297-302).

      We clarified our wording and replaced the mention of “total load” by “mean number of linked deleterious mutations per copy of S-allele” (line 324-325). In the discussion we tried to better explain the limitations of approaches to estimate the genetic load (line 431-437).

      Reviewer #2 (Recommendations For The Authors):

      Line 60, it should be specified that this is only for recessive deleterious mutations. Non-recessive deleterious mutations would certainly not be expected to accumulate.

      As explained in detail above, the question of whether and how non-recessive deleterious mutations can accumulate when linked to the S-locus is difficult and would in itself deserve a full treatment, which is clearly beyond the scope of the present study. We clarified this point on line 56.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major comments (Public Reviews)

      Generality of grid cells

      We appreciate the reviewers’ concern regarding the generality of our approach, and in particular for analogies in nonlinear spaces. In that regard, there are at least two potential directions that could be pursued. One is to directly encode nonlinear structures (such as trees, rings, etc.) with grid cells, to which DPP-A could be applied as described in our model. The TEM model [1] suggests that grid cells in the medial entorhinal cortex may form a basis set that captures structural knowledge for such nonlinear spaces, such as social hierarchies and transitive inference when formalized as a connected graph. Another would be to use eigen-decomposition of the successor representation [2], a learnable predictive representation of possible future states that has been shown by Stachenfeld et al. [3] to provide an abstract structured representation of a space that is analogous to the grid cell code. This general-purpose mechanism could be applied to represent analogies in nonlinear spaces [4], for which there may not be a clear factorization in terms of grid cells (i.e., distinct frequencies and multiple phases within each frequency). Since the DPP-A mechanism, as we have described it, requires representations to be factored in this way, it would need to be modified for such a purpose. Either of these approaches, if successful, would allow our model to be extended to domains containing nonlinear forms of structure. To the extent that different coding schemes (i.e., basis sets) are needed for different forms of structure, the question of how these are identified and engaged for use in a given setting is clearly an important one that is not addressed by the current work. We imagine that this is likely subserved by monitoring and selection mechanisms proposed to underlie the capacity for selective attention and cognitive control [5], though the specific computational mechanisms that underlie this function remain an important direction for future research. We have added a discussion of these issues in Section 6 of the updated manuscript.
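
      As a rough illustration of the second direction, the sketch below computes the successor representation of a random walk on a ring (one of the nonlinear structures mentioned above) and takes its eigendecomposition. It assumes the standard closed form M = (I - γT)^(-1) for a fixed transition matrix T; the ring size, the discount γ = 0.9, and the helper names are illustrative choices, not part of the model described in the manuscript.

```python
import numpy as np


def ring_random_walk(n_states):
    """Transition matrix of an unbiased random walk on a ring of n_states."""
    T = np.zeros((n_states, n_states))
    for i in range(n_states):
        T[i, (i - 1) % n_states] = 0.5
        T[i, (i + 1) % n_states] = 0.5
    return T


def successor_representation(T, gamma):
    """Discounted expected future state occupancy: M = sum_t gamma^t T^t."""
    n_states = T.shape[0]
    return np.linalg.inv(np.eye(n_states) - gamma * T)


T = ring_random_walk(20)
M = successor_representation(T, gamma=0.9)

# T is symmetric here, so M is symmetric and eigh applies.
# The columns of `eigvecs` are periodic functions over the ring.
eigvals, eigvecs = np.linalg.eigh(M)
```

      The eigenvectors of M in this toy case are periodic over the state space, which is the sense in which the eigendecomposition yields a grid-like basis for a non-Euclidean structure.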

      (1) Whittington, J.C., Muller, T.H., Mark, S., Chen, G., Barry, C., Burgess, N. and Behrens, T.E., 2020. The Tolman-Eichenbaum machine: unifying space and relational memory through generalization in the hippocampal formation. Cell, 183(5), pp.1249-1263.

      (2) Dayan, P., 1993. Improving generalization for temporal difference learning: The successor representation. Neural computation, 5(4), pp.613-624.

      (3) Stachenfeld, K.L., Botvinick, M.M. and Gershman, S.J., 2017. The hippocampus as a predictive map. Nature neuroscience, 20(11), pp.1643-1653.

      (4) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      (5) Shenhav, A., Botvinick, M.M. and Cohen, J.D., 2013. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron, 79(2), pp.217-240.

      Biological plausibility of DPP-A

      We appreciate the reviewers’ interest in the biological plausibility of our model, and in particular the question of whether and how DPP-A might be implemented in a neural network. In that regard, Bozkurt et al. [1] recently proposed a biologically plausible neural network algorithm using a weighted similarity matrix approach to implement a determinant maximization criterion, which is the core idea underlying the objective function we use for DPP-A, suggesting that the DPP-A mechanism we describe may also be biologically plausible. This could be tested experimentally by exposing individuals (e.g., rodents or humans) to a task that requires consistent exposure to a subregion, and evaluating the distribution of activity over the grid cells. Our model predicts that high frequency grid cells should increase their firing rate more than low frequency cells, since the high frequency grid cells maximize the determinant of the covariance matrix of the grid cell embeddings. It is also worth noting that Frankland et al. [2] have suggested that the use of DPPs may also help explain a mutual exclusivity bias observed in human word learning and reasoning. While this is not direct evidence of biological plausibility, it is consistent with the idea that the human brain selects representations for processing that maximize the volume of the representational space, which can be achieved by maximizing the DPP-A objective function defined in Equation 6. We have added a comment to this effect in Section 6 of the updated manuscript.

      (1) Bozkurt, B., Pehlevan, C. and Erdogan, A., 2022. Biologically-plausible determinant maximization neural networks for blind separation of correlated sources. Advances in Neural Information Processing Systems, 35, pp.13704-13717.

      (2) Frankland, S. and Cohen, J., 2020. Determinantal Point Processes for Memory and Structured Inference. In CogSci.

      Simplicity of analogical problem and comparison to other models using this task

      First, we would like to point out that analogical reasoning is a signature feature of human cognition, which supports flexible and efficient adaptation to novel inputs that remains a challenge for most current neural network architectures. While humans can exhibit complex and sophisticated forms of analogical reasoning [1, 2, 3], here we focused on a relatively simple form that was inspired by Rumelhart’s parallelogram model of analogy [4,5], which has been used to explain traditional human verbal analogies (e.g., “king is to what as man is to woman?”). Our model, like that one, seeks to explain analogical reasoning in terms of the computation of simple Euclidean distances (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript. It is worth noting that, despite the seeming simplicity of this construction, we show that standard neural network architectures (e.g., LSTMs and transformers) struggle to generalize on such tasks without the use of the DPP-A mechanism.
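
      The geometric relation behind the parallelogram model can be sketched in a few lines. The example below simply picks, among candidate answers, the vector D that best satisfies A - B = C - D. It illustrates the task structure only; in the manuscript the answer is scored by a trained inference module, and the function name `solve_analogy` is hypothetical.

```python
import numpy as np


def solve_analogy(a, b, c, candidates):
    """Return the index of the candidate D that best satisfies A - B = C - D."""
    target = c - (a - b)  # the D that completes the parallelogram exactly
    distances = np.linalg.norm(candidates - target, axis=1)
    return int(np.argmin(distances))


a, b, c = np.array([1, 1]), np.array([3, 2]), np.array([10, 10])
candidates = np.array([[12, 11], [10, 12], [8, 9]])
best = solve_analogy(a, b, c, candidates)
```

      Because only the difference vectors matter, the same relation holds when all four points are translated to a region of space not seen during training, which is what makes the task a natural test of out-of-distribution generalization.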

      Second, we are not aware of any previous work other than Frankland et al. [6] cited in the first paragraph of Section 2.2.1, that has examined the capacity of neural network architectures to perform even this simple form of analogy. The models in that study were hardcoded to perform analogical reasoning, whereas we trained models to learn to perform analogies. That said, clearly a useful line of future work would be to scale our model further to deal with more complex forms of representation and analogical reasoning tasks [1,2,3]. We have noted this in Section 6 of the updated manuscript.

      (1) Holyoak, K.J., 2012. Analogy and relational reasoning. The Oxford handbook of thinking and reasoning, pp.234-259.

      (2) Webb, T., Fu, S., Bihl, T., Holyoak, K.J. and Lu, H., 2023. Zero-shot visual reasoning through probabilistic analogical mapping. Nature Communications, 14(1), p.5144.

      (3) Lu, H., Ichien, N. and Holyoak, K.J., 2022. Probabilistic analogical mapping with semantic relation networks. Psychological review.

      (4) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (5) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

      (6) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      Clarification of DPP-A attentional modulation

      We would like to clarify several concerns regarding the DPP-A attentional modulation. First, we would like to make it clear that ω is not meant to correspond to synaptic weights, and thank the reviewer for noting the possibility for confusion on this point. It is also distinct from a biasing input, which is often added to the product of the input features and weights. Rather, in our model ω is a vector, and diag(ω) converts it into a matrix with ω on the diagonal and zeros elsewhere. In Equation 6, diag(ω) is matrix multiplied with the covariance matrix V, which results in elementwise multiplication of ω with the column vectors of V, and hence acts more like gates. We have noted this in Section 2.2.2 and have changed all instances of “weights (ω)” to “gates (ɡ)” in the updated manuscript. We have also rewritten the definition of Equation 6 and uses of it (as in Algorithm 1) to depict the application of a sigmoid nonlinearity (σ) to the gates, so that the resulting values are always between 0 and 1.
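
      The gating interpretation can be checked numerically. The sketch below uses a random covariance matrix as a stand-in for V and random values as a stand-in for the gate parameters; it verifies only the linear-algebra identity described above and does not reproduce Equation 6 of the manuscript.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


rng = np.random.default_rng(0)

# Stand-in covariance matrix over 5 hypothetical grid cells (200 samples each).
V = np.cov(rng.normal(size=(5, 200)))

# Unconstrained gate parameters, squashed into (0, 1) by the sigmoid.
gates = sigmoid(rng.normal(size=5))

# Multiplying by diag(gates) on the left ...
gated = np.diag(gates) @ V

# ... is the same as multiplying every column of V elementwise by the gate vector.
same_as_elementwise = np.allclose(gated, gates[:, None] * V)
```

      Each gate therefore scales the contribution of one grid cell, rather than acting as a connection weight between units.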

      Second, we would like to clarify that we don’t compute the inner product between the gates ɡ and the grid cell embeddings x anywhere in our model. The gates within each frequency were optimized (independent of the task inputs), according to Equation 6, to compute the approximate maximum log determinant of the covariance matrix over the grid cell embeddings individually for each frequency. We then used the grid cell embeddings belonging to the frequency that had the maximum within-frequency log determinant for training the inference module, which always happened to be grid cells within the top three frequencies. Author response image 1 (also added to the Appendix, Section 7.10 of the updated manuscript) shows the approximate maximum log determinant (on the y-axis) for the different frequencies (on the x-axis).

      Author response image 1.

      Approximate maximum log determinant of the covariance matrix over the grid cell embeddings (y-axis) for each frequency (x-axis), obtained after maximizing Equation 6.

      Third, we would like to clarify our interpretation of why DPP-A identified grid cell embeddings corresponding to the highest spatial frequencies, and why this produced the best OOD generalization (i.e., extrapolation on our analogy tasks). It is because those grid cell embeddings exhibited greater variance over the training data than the lower frequency embeddings, while at the same time the correlations among those grid cell embeddings were lower than the correlations among the lower frequency grid cell embeddings. The determinant of the covariance matrix of the grid cell embeddings is maximized when the variances of the grid cell embeddings are high (they are “expressive”) and the correlation among the grid cell embeddings is low (they “cover the representational space”). As a result, the higher frequency grid cell embeddings more efficiently covered the representational space of the training data, allowing them to efficiently capture the same relational structure across training and test distributions which is required for OOD generalization. We have added some clarification to the second paragraph of Section 2.2.2 in the updated manuscript. Furthermore, to illustrate this graphically, Author response image 2 (added to the Appendix, Section 7.10 of the updated manuscript) shows the results after the summation of the multiplication of the grid cell embeddings over the 2d space of 1000x1000 locations, with their corresponding gates for 3 representative frequencies (left, middle and right panels showing results for the lowest, middle and highest grid cell frequencies, respectively, of the 9 used in the model), obtained after maximizing Equation 6 for each grid cell frequency. The color code indicates the responsiveness of the grid cells to different X and Y locations in the input space (lighter color corresponding to greater responsiveness). 
      Note that the dark blue area (denoting regions of least responsiveness to any grid cell) is greatest for the lowest frequency and nearly zero for the highest frequency, illustrating that grid cell embeddings belonging to the highest frequency more efficiently cover the representational space which allows them to capture the same relational structure across training and test distributions as required for OOD generalization.
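
      The claim that the determinant rewards high variance and low correlation follows from a general identity: the log determinant of a covariance matrix equals the sum of the log variances plus the log determinant of the correlation matrix, and the latter is at most zero. The sketch below checks this on two synthetic covariance matrices; the equicorrelated construction and all numerical values are illustrative assumptions, not quantities from the model.

```python
import numpy as np


def log_det(matrix):
    """Log determinant of a positive-definite matrix."""
    sign, value = np.linalg.slogdet(matrix)
    if sign <= 0:
        raise ValueError("matrix is not positive definite")
    return value


def equicorrelated_covariance(stds, rho):
    """Covariance matrix with standard deviations `stds` and pairwise correlation `rho`."""
    n = len(stds)
    corr = np.full((n, n), rho)
    np.fill_diagonal(corr, 1.0)
    return np.outer(stds, stds) * corr


def decomposed_log_det(cov):
    """Sum of log variances plus log determinant of the correlation matrix."""
    stds = np.sqrt(np.diag(cov))
    corr = cov / np.outer(stds, stds)
    return np.sum(np.log(np.diag(cov))) + log_det(corr)


low_var_high_corr = equicorrelated_covariance(np.full(4, 0.5), rho=0.8)
high_var_low_corr = equicorrelated_covariance(np.full(4, 2.0), rho=0.1)
```

      In this toy setting the high-variance, weakly correlated matrix has the larger log determinant, mirroring the explanation given above for why the highest-frequency embeddings are selected.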

      Author response image 2.

      Each panel shows the results after summation of the multiplication of the grid cell embeddings over the 2d space of 1000x1000 locations, with their corresponding gates for a particular frequency, obtained after maximizing Equation 6 for each grid cell frequency. The left, middle, and right panels show results for the lowest, middle, and highest grid cell frequencies, respectively, of the 9 used in the model. Lighter color in each panel corresponds to greater responsiveness of grid cells at that particular location in the 2d space.

      Finally, we would like to clarify how the DPP-A attentional mechanism is different from the attentional mechanism in the transformer module, and why both are needed for strong OOD generalization. Use of the standard self-attention mechanism in transformers over the inputs (i.e., A, B, C, and D for the analogy task) in place of DPP-A would lead to weightings of grid cell embeddings over all frequencies and phases. The objective function for DPP-A represents an inductive bias that selectively assigns the greatest weight to all grid cell embeddings (i.e., for all phases) of the frequency for which the determinant of the covariance matrix, computed over the training space, is greatest. The transformer inference module then attends over the inputs using the grid cell embeddings selected by the DPP-A objective. We have added a discussion of this point in Section 6 of the updated manuscript.

      We would like to thank the reviewers for their recommendations. We have tried our best to incorporate them into our updated manuscript. Below we provide a detailed response to each of the recommendations grouped for each reviewer.

      Reviewer #1 (Recommendations for the authors)

      (1) It would be helpful to see some equations for R in the main text.

      We thank the reviewer for this suggestion. We have now added some equations explaining the working of R in Section 2.2.3 of the updated manuscript.

      (2) Typo: p 11 'alongwith' -> 'along with'

      We have changed all instances of ‘alongwith’ to ‘along with’ in the updated manuscript.

      (3) Presumably, this is related to equivariant ML - it would be helpful to comment on this.

      Yes, this is related to equivariant ML, since the properties of equivariance hold for our model. Specifically, the probability distribution after applying softmax remains the same when the transformation (translation or scaling) is applied to the scores for each of the answer choices obtained from the output of the inference module, and when the same transformation is applied to the stimuli for the task and all the answer choices before presenting as input to the inference module to obtain the scores. We have commented on this in Section 2.2.3 of the updated manuscript.

      Reviewer #2 (Recommendations for the authors)

      (1) Page 2 - "Webb et al." temporal context - they should also cite and compare this to work by Marc Howard on generalization based on multi-scale temporal context.

      While we appreciate the important contributions that have been made by Marc Howard and his colleagues to temporal coding and its role in episodic memory and hippocampal function, we would like to clarify that his temporal context model is unrelated to the temporal context normalization developed by Webb et al. (2020) and mentioned on Page 2. The former (Temporal Context Model) is a computational model that proposes a role for temporal coding in the functions of the medial temporal lobe in support of episodic recall and spatial navigation. The latter (temporal context normalization) is a normalization procedure proposed for use in training a neural network, similar to batch normalization [1], in which tensor normalization is applied over the temporal instead of the batch dimension, and which has been shown to help with OOD generalization. We apologize for any confusion engendered by the similarity of these terms and for our failure to clarify the difference between them, which we have now attempted to do in a footnote on Page 2.

      (1) Ioffe, S. and Szegedy, C., 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456). PMLR.
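
      To make the distinction between the two normalization axes concrete, here is a minimal sketch. It assumes an input tensor laid out as (batch, time, features) and omits any learned gain and shift parameters; the function names are illustrative and are not taken from Webb et al. (2020).

```python
import numpy as np

def normalize(x, axis, eps=1e-8):
    """Z-score x along the given axis."""
    mean = x.mean(axis=axis, keepdims=True)
    std = x.std(axis=axis, keepdims=True)
    return (x - mean) / (std + eps)

def batch_norm_like(x):
    """Normalize each feature across the batch dimension (axis 0)."""
    return normalize(x, axis=0)

def temporal_context_norm(x):
    """Normalize each feature across the temporal dimension (axis 1),
    separately for every sequence in the batch."""
    return normalize(x, axis=1)

# Example input: 8 sequences, 4 time steps, 6 features.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(8, 4, 6))
y = temporal_context_norm(x)
```

      In this sketch, adding a constant offset to an entire sequence leaves the temporally normalized output unchanged, because the offset is removed together with that sequence's own mean.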

      (2) page 3 - "known to be implemented in entorhinal" - It's odd that they seem to avoid citing the actual biology papers on grid cells. They should cite more of the grid cell recording papers when they mention the entorhinal cortex (i.e. Hafting et al., 2005; Barry et al., 2007; Stensola et al., 2012; Giocomo et al., 2011; Brandon et al., 2011).

      We have now cited the references mentioned below, on page 3 after the phrase “known to be implemented in entorhinal cortex”.

      (1) Barry, C., Hayman, R., Burgess, N. and Jeffery, K.J., 2007. Experience-dependent rescaling of entorhinal grids. Nature neuroscience, 10(6), pp.682-684.

      (2) Stensola, H., Stensola, T., Solstad, T., Frøland, K., Moser, M.B. and Moser, E.I., 2012. The entorhinal grid map is discretized. Nature, 492(7427), pp.72-78.

      (3) Giocomo, L.M., Hussaini, S.A., Zheng, F., Kandel, E.R., Moser, M.B. and Moser, E.I., 2011. Grid cells use HCN1 channels for spatial scaling. Cell, 147(5), pp.1159-1170.

      (4) Brandon, M.P., Bogaard, A.R., Libby, C.P., Connerney, M.A., Gupta, K. and Hasselmo, M.E., 2011. Reduction of theta rhythm dissociates grid cell spatial periodicity from directional tuning. Science, 332(6029), pp.595-599.

      (3) To enhance the connection to biological systems, they should cite more of the experimental and modeling work on grid cell coding (for example on page 2 where they mention relational coding by grid cells). Currently, they tend to cite studies of grid cell relational representations that are very indirect in their relationship to grid cell recordings (i.e. indirect fMRI measures by Constantinescu et al., 2016 or the very abstract models by Whittington et al., 2020). They should cite more papers on actual neurophysiological recordings of grid cells that suggest relational/metric representations, and they should cite more of the previous modeling papers that have addressed relational representations. This could include work on using grid cell relational coding to guide spatial behavior (e.g. Erdem and Hasselmo, 2014; Bush, Barry, Manson, Burgess, 2015). This could also include other papers on the grid cell code beyond the paper by Wei et al., 2015 - they could also cite work on the efficiency of coding by Sreenivasan and Fiete and by Mathis, Herz, and Stemmler.

      We thank the reviewer for bringing the additional references to our attention. We have cited the references mentioned below on page 2 of the updated manuscript.

      (1) Erdem, U.M. and Hasselmo, M.E., 2014. A biologically inspired hierarchical goal directed navigation model. Journal of Physiology-Paris, 108(1), pp.28-37.

      (2) Sreenivasan, S. and Fiete, I., 2011. Grid cells generate an analog error-correcting code for singularly precise neural computation. Nature neuroscience, 14(10), pp.1330-1337.

      (3) Mathis, A., Herz, A.V. and Stemmler, M., 2012. Optimal population codes for space: grid cells outperform place cells. Neural computation, 24(9), pp.2280-2317.

      (4) Bush, D., Barry, C., Manson, D. and Burgess, N., 2015. Using grid cells for navigation. Neuron, 87(3), pp.507-520.

      (4) Page 3 - "Determinantal Point Processes (DPPs)" - it is rather annoying that DPP is defined after DPP-A is defined. There ought to be a spot where the definition of DPP-A is clearly stated in a single location.

      We agree it makes more sense to define Determinantal Point Process (DPP) before DPP-A. We have now rephrased the sentences accordingly. In the “Abstract”, the sentence now reads “Second, we propose an attentional mechanism that operates over the grid cell code using Determinantal Point Process (DPP), which we call DPP attention (DPP-A) - a transformation that ensures maximum sparseness in the coverage of that space.” We have also modified the second paragraph of the “Introduction”. The modified portion now reads “b) an attentional objective inspired from Determinantal Point Processes (DPPs), which are probabilistic models of repulsion arising in quantum physics [1], to attend to abstract representations that have maximum variance and minimum correlation among them, over the training data. We refer to this as DPP attention or DPP-A.” Due to this change, we removed the last sentence of the fifth paragraph of the “Introduction”.

      (1) Macchi, O., 1975. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1), pp.83-122.

      (5) Page 3 - "the inference module R" - there should be some discussion about how this component using LSTM or transformers could relate to the function of actual brain regions interacting with entorhinal cortex. Or if there is no biological connection, they should state that this is not seen as a biological model and that only the grid cell code is considered biological.

      While we agree that the model is not intended to be specific about the implementation of the R module, we assume that, as a standard deep learning component, it is likely to map onto neocortical structures that interact with the entorhinal cortex and, in particular, regions of the prefrontal-posterior parietal network widely believed to be involved in abstract relational processes [1,2,3,4]. In particular, the role of the prefrontal cortex in the encoding and active maintenance of abstract information needed for task performance (such as rules and relations) has often been modeled using gated recurrent networks, such as LSTMs [5,6], and the posterior parietal cortex has long been known to support “maps” that may provide an important substrate for computing complex relations [4]. We have added some discussion about this in Section 2.2.3 of the updated manuscript.

      (1) Waltz, J.A., Knowlton, B.J., Holyoak, K.J., Boone, K.B., Mishkin, F.S., de Menezes Santos, M., Thomas, C.R. and Miller, B.L., 1999. A system for relational reasoning in human prefrontal cortex. Psychological science, 10(2), pp.119-125.

      (2) Christoff, K., Prabhakaran, V., Dorfman, J., Zhao, Z., Kroger, J.K., Holyoak, K.J. and Gabrieli, J.D., 2001. Rostrolateral prefrontal cortex involvement in relational integration during reasoning. Neuroimage, 14(5), pp.1136-1149.

      (3) Knowlton, B.J., Morrison, R.G., Hummel, J.E. and Holyoak, K.J., 2012. A neurocomputational system for relational reasoning. Trends in cognitive sciences, 16(7), pp.373-381.

      (4) Summerfield, C., Luyckx, F. and Sheahan, H., 2020. Structure learning and the posterior parietal cortex. Progress in neurobiology, 184, p.101717.

      (5) Frank, M.J., Loughry, B. and O’Reilly, R.C., 2001. Interactions between frontal cortex and basal ganglia in working memory: a computational model. Cognitive, Affective, & Behavioral Neuroscience, 1, pp.137-160.

      (6) Braver, T.S. and Cohen, J.D., 2000. On the control of control: The role of dopamine in regulating prefrontal function and working memory. Control of cognitive processes: Attention and performance XVIII, (2000).

      (6) Page 4 - "Learned weighting w" - it is somewhat confusing to use "w" as that is commonly used for synaptic weights, whereas I understand this to be an attentional modulation vector with the same dimensionality as the grid cell code. It seems more similar to a neural network bias input than a weight matrix.

      We refer to the first paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (7) Page 4 - "parameterization of w... by two loss functions over the training set." - I realize that this has been stated here, but to emphasize the significance to a naïve reader, I think they should emphasize that the learning is entirely focused on the initial training space, and there is NO training done in the test spaces. It's very impressive that the parameterization is allowing generalization to translated or scaled spaces without requiring ANY training on the translated or scaled spaces.

      We have added the sentence “Note that learning of parameter w occurs only over the training space and is not further modified during testing (i.e. over the test spaces)” to the updated manuscript.

      (8) Page 4 - "The first," - This should be specific - "The first loss function"

      We have changed it to “The first loss function” in the updated manuscript.

      (9) Page 4 - The analogy task seems rather simplistic when first presented (i.e. just a spatial translation to different parts of a space, which has already been shown to work in simulations of spatial behavior such as Erdem and Hasselmo, 2014 or Bush, Barry, Manson, Burgess, 2015). To make the connection to analogy, they might provide a brief mention of how this relates to the analogy space created by word2vec applied to traditional human verbal analogies (i.e. king-man+woman=queen).

      We agree that the analogy task is simple, and recognize that grid cells can be used to navigate to different parts of space over which the test analogies are defined when those are explicitly specified, as shown by Erdem and Hasselmo (2014) and Bush, Barry, Manson, and Burgess (2015). However, for the analogy task, the appropriate set of grid cell embeddings must be identified that capture the same relational structure between training and test analogies to demonstrate strong OOD generalization, and that is achieved by the attentional mechanism DPP-A. As suggested by the reviewer’s comment, our analogy task is inspired by Rumelhart’s parallelogram model of analogy [1,2] (and therefore similar to traditional human verbal analogies) inasmuch as it involves differences (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript.

      (1) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (2) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

      (10) Page 5 - The variable "KM" is a bit confusing when it first appears. It would be good to re-iterate that K and M are separate points and KM is the vector between these points.

      We apologize for the confusion on this point. KM is meant to refer to an integer value, obtained by multiplying K and M, which is added to both dimensions of A, B, C and D, which are points in ℤ², to translate them to a different region of the space. K is an integer value ranging from 1 to 9 and M is also an integer value denoting the size of the training region, which in our implementation is 100. We have clarified this in Section 2.1.1 of the updated manuscript.
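
      The translation described above can be sketched in a few lines. The specific points and the helper names `translate` and `is_analogy` are hypothetical, chosen only for illustration; the values M = 100 and K between 1 and 9 come from the response.

```python
import numpy as np

M = 100  # size of the training region, as stated in the response

def translate(points, K, M=M):
    """Add the integer K*M to both dimensions of every point."""
    return points + K * M

def is_analogy(points):
    """Check the parallelogram relation A - B = C - D for rows A, B, C, D."""
    A, B, C, D = points
    return bool(np.array_equal(A - B, C - D))

# Hypothetical analogy inside the training region [0, M).
train = np.array([[10, 20],   # A
                  [30, 50],   # B
                  [40, 25],   # C
                  [60, 55]])  # D, chosen so that A - B = C - D

# The same analogy moved to the test region reached with K = 3.
test = translate(train, K=3)
print(is_analogy(train), is_analogy(test))
```

      Because the same integer is added to every coordinate, the differences A - B and C - D are unchanged, so the relational structure of the analogy is preserved in the translated region.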

      (11) Page 5 - "two continuous dimensions (Constantinescu et al.)" - this ought to give credit to the original study showing the abstract six-fold rotational symmetry for spatial coding (Doeller, Barry and Burgess).

      We have now cited the original work by Doeller et al. [1] along with Constantinescu et al. (2016) in the updated manuscript after the phrase “two continuous dimensions” on page 5.

      (1) Doeller, C.F., Barry, C. and Burgess, N., 2010. Evidence for grid cells in a human memory network. Nature, 463(7281), pp.657-661.

      (12) Page 6 - Np=100. This is done later, but it would be clearer if they right away stated that Np*Nf=900 in this first presentation.

      We have now added this sentence after Np=100. “Hence Np*Nf=900, which denotes the number of grid cells.”

      (13) Page 6 - They provide theorem 2.1 on the determinant of the covariance matrix of the grid code, but they ought to cite this the first time this is mentioned.

      We have cited Gillenwater et al. (2012) before mentioning theorem 2.1. The sentence just before that reads “We use the following theorem from Gillenwater et al. (2012) to construct :”

      (14) Page 6 - It would greatly enhance the impact of the paper if they could give neuroscientists some sense of how the maximization of the determinant of the covariance matrix of the grid cell code could be implemented by a biological circuit. OR at least to show an example of the output of this algorithm when it is used as an inner product with the grid cell code. This would require plotting the grid cell code in the spatial domain rather than the 900 element vector.

      We refer to our response above to the topic “Biological plausibility of DPP-A” and second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contain our responses to this issue.

      (15) Page 6 - "That encode higher spatial frequencies..." This seems intuitive, but it would be nice to give a more intuitive description of how this is related to the determinant of the covariance matrix.

      We refer to the third paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (16) Page 7 - log of both sides... Nf is number of frequencies... Would be good to mention here that they are referring to equation 6 which is only mentioned later in the paragraph.

      As suggested, we now refer to Equation 6 in the updated manuscript. The sentence now reads “This is achieved by maximizing the determinant of the covariance matrix over the within frequency grid cell embeddings of the training data, and Equation 6 is obtained by applying the log on both sides of Theorem 2.1, and in our case where refers to grid cells of a particular frequency.”

      (17) Page 7 - Equation 6 - They should discuss how this is proposed to be implemented in brain circuits.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (18) Page 9 - "egeneralize" - presumably this is a typo?

      Yes. We have corrected it to “generalize” in the updated manuscript.

      (19) Page 9 - "biologically plausible encoding scheme" - This is valid for the grid cell code, but they should be clear that this is not valid for other parts of the model, or specify how other parts of the model such as DPP-A could be biologically plausible.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (20) Page 12 - Figure 7 - comparison to one-hots or smoothed one-hots. The text should indicate whether the smoothed one-hots are similar to place cell coding. This is the most relevant comparison of coding for those knowledgeable about biological coding schemes.

      Yes, smoothed one-hots are similar to place cell coding. We now mention this in Section 5.3 of the updated manuscript.

      (21) Page 12 - They could compare to a broader range of potential biological coding schemes for the overall space. This could include using coding based on the boundary vector cell coding of the space, band cell coding (one dimensional input to grid cells), or egocentric boundary cell coding.

      We appreciate these useful suggestions, which we now mention as potentially valuable directions for future work in the second paragraph of Section 6 of the updated manuscript.

      (22) Page 13 - "transformers are particularly instructive" - They mention this as a useful comparison, but they might discuss further why a much better function is obtained when attention is applied to the system twice (once by DPP-A and then by a transformer in the inference module).

      We refer to the last paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (23) Page 13 - "Section 5.1 for analogy and Section 5.2 for arithmetic" - it would be clearer if they perhaps also mentioned the specific figures (Figure 4 and Figure 6) presenting the results for the transformer rather than the LSTM.

      We have now rephrased to also refer to the figures in the updated manuscript. The phrase now reads “a transformer (Figure 4 in Section 5.1 for analogy and Figure 6 in Section 5.2 for arithmetic tasks) failed to achieve the same level of OOD generalization as the network that used DPP-A.”

      (24) Page 14 - "statistics of the training data" - The most exciting feature of this paper is that learning during the training space analogies can so effectively generalize to other spaces based on the right attention DPP-A, but this is not really made intuitive. Again, they should illustrate the result of the xT w inner product to demonstrate why this works so effectively!

      We refer to the second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contain our response to this issue.

      (25) Bibliography - Silver et al., go paper - journal name "nature" should be capitalized. There are other journal titles that should be capitalized. Also, I believe eLife lists family names first.

      We have made the changes to the bibliography of the updated manuscript suggested by the reviewer.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      More details should be provided in terms of inclusion and exclusion criteria for the participants, as well as missing data due to the non-cooperation of newborns during the experimental process. Potential differences between preterm and full-term infants are worth exploring. Several aspects of EEG data analyses and data interpretation should be better clarified.

      Here I have several comments and questions to improve the manuscript.

      (1) It would be wise to know whether there was any missing data due to the non-cooperation of newborns during the experimental process.

      Thank you for the suggestion. While our initial aim was to include 120 neonates in the final data analysis, we actually recruited 198 neonatal participants for this study. Seventy-eight EEG datasets were excluded from the data analysis due to non-cooperation of neonates (n = 75) or technical issues (n = 3). We have incorporated this detailed information in the Subjects subsection (lines 375-383) in the revised manuscript.

      (2) The authors investigated the impact of gestational age on emotional perceptual sensitivity in newborns by grouping infants of varying gestational ages in the experiment. The methods section mentions that the study conducted experiments within 24 hours after the birth of the newborns. When do preterm infants (with a gestational age of 35 and 36 weeks) begin to exhibit emotional discrimination comparable to full-term newborns? 

      This is indeed an intriguing question that merits exploration. However, in our study, we recruited relatively healthy preterm neonates, many of whom were discharged from the hospital with their mothers within 3-5 days after birth. It would have been challenging to arrange for another EEG testing session once these preterm infants reached full-term age, as their parents were unwilling to return to the hospital.

      (3) When analyzing EEG data, excluding artifacts with peak deviations exceeding ±200 μV is a relatively lenient criterion, potentially resulting in the retention of some large-amplitude artifacts or noise. What is the rationale behind the author's choice of this criterion? Or, in other words, what considerations led to this specific selection?

      In our standard practice, we typically employ a stricter threshold of ±100 μV for artifact removal in studies involving healthy adults and a median threshold of ±150 μV for data from adult patients, such as those with schizophrenia. However, when analyzing neonatal data, we often resort to the loosest criterion of ±200 μV. This decision is primarily due to the inherent challenges associated with neonatal EEG recordings, as we cannot expect newborns to cooperate or remain quiet during the recording process. Consequently, neonatal EEG data tend to contain more artifacts compared to those from healthy adults. Furthermore, the excitability of the newborn brain is notably elevated. This heightened excitability arises from an imbalance in the distribution and function of excitatory and inhibitory neurotransmitter systems. Typically, the expression of excitatory neurotransmitters and their receptors surpasses that of inhibitory neurotransmitters, resulting in increased excitability in the immature brain. This heightened excitability can occasionally lead to the occurrence of paroxysmal electrical activity. As a result, neonatal EEG recordings may at times display large amplitudes, exceeding even 100 μV. In this revision, we have referenced other neonatal/infant EEG studies or technique pipelines that have used the threshold of ±200 μV to support this criterion (lines 483-484).    
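
      The amplitude criteria discussed above can be sketched as a simple peak-based rejection rule. This is an illustrative sketch, not the authors' pipeline: it assumes baseline-corrected epochs stored in microvolts with shape (epochs, channels, samples), and the function name `reject_epochs` is hypothetical.

```python
import numpy as np

def reject_epochs(epochs_uv, threshold_uv=200.0):
    """Drop epochs whose peak absolute deviation exceeds the threshold.

    epochs_uv: array of shape (n_epochs, n_channels, n_samples), in microvolts,
    assumed to be baseline-corrected.
    Returns (kept_epochs, keep_mask).
    """
    # Largest absolute amplitude in each epoch, over all channels and samples.
    peak = np.abs(epochs_uv).max(axis=(1, 2))
    keep = peak <= threshold_uv
    return epochs_uv[keep], keep

# Toy data: three epochs, two channels, five samples.
epochs = np.zeros((3, 2, 5))
epochs[1, 0, 2] = 250.0   # exceeds the 200 uV criterion
epochs[2, 1, 4] = -150.0  # passes at 200 uV, fails at 100 uV

kept_200, mask_200 = reject_epochs(epochs, threshold_uv=200.0)
kept_100, mask_100 = reject_epochs(epochs, threshold_uv=100.0)
```

      Comparing the two masks shows the trade-off described in the response: the stricter criterion removes more large-amplitude activity but also discards more data.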

      (4) In the Discussion section, the authors mentioned the biomarkers, such as the fusiform gyrus and hippocampus, which have been identified as potential predictors of autism risk. It is suggested that the authors briefly elucidate the crucial role of these biomarkers in processing social information, which would enhance the readability and logicality of this manuscript.

      Thank you for the thoughtful suggestion. We have expanded the discussion concerning the involvement of the fusiform gyrus and hippocampus in social information processing (lines 314-319).

      Reviewer #2 (Public Review):

      First, readers need to see spectrograms that show the 0-4000 Hz in more detail, rather than what is now shown (0-10,000 Hz). The vocal signals in clearer spectrograms will show I believe the initial consonant burst and formant frequencies that are unique to human speech and give rise to the perception of the consonant sounds in the vocal signals like 'dada' and 'tutu' that were tested. The control signals will presumably not show these abrupt acoustic changes at their onset, even though they appear (from the oscillograms) to approximate the amplitude envelope. The primary cue distinguishing the happy and neutral signals in both the vocal and control signals is the pitch of the signals (high vs low), but the burst of energy representing the consonants is only contained in the vocal signals; it has no comparable match in the control signals. It is possible that the presence of a sharp acoustic onset (a unique characteristic of consonants in human speech) is especially alerting to the infants, and that this acoustic cue, in the context of the pitch change, enhances discrimination in the vocal case. One way to test this would be to use only vowel sounds to represent the vocal signals, without consonants.

      Thank you for your expert comments and considerations. We have redrawn Figure 3 using Praat software with a frequency range of 0-5000 Hz, as suggested by Praat’s default parameters. Based on the spectrograms, we acknowledge the potential role of consonants in accounting for differences in stimuli. Consequently, we have included this consideration as one of the limitations of our study in this revised version (lines 325-330).

      Another critical detail that the authors need to include about the signals is an explanation of how the control signals were generated. The text states that the Fo and amplitude envelope of the vocal signals were mimicked in the control signals, but what was the signal used for the controls? Was a pure tone complex modulated, or was pink noise used to generate the control signals? Or were the original vocal signals simply filtered in some way to create the controls, which would preserve the Fo and amplitude envelope? If merely filtered, the control signals still may be perceived as 'vocal' signals, rather than as nonspeech (the Supplement contains the sounds, and some of the control sounds can be perceived, to my ear, as 'vocal' signals).

      We sincerely appreciate your attention to detail regarding the generation of control signals. As a laboratory that does not specialize in audio editing, our approach involved filtering the original vocal sounds around the fundamental frequency (f0) and ensuring a balanced mean intensity between vocal and nonvocal stimuli (as now stated in lines 432-437). However, it became evident that certain “vocal” components persisted in the control sounds, particularly noticeable in the sound “tutu”. In this revision, we openly acknowledge this oversight (lines 331-333). We extend our gratitude once again for highlighting the importance of meticulous consideration when generating control sounds for a study.

      Second, there is no information in the manuscript or supplement about the auditory environment of the participants, nor discussion of the fetus' ability to hear in the womb. In the womb, infants are listening to the mothers' bone-conducted speech (which is full of consonant sounds), and we know from published studies that infants can discern differences not only in the prosody of the speech they hear in the womb, but the phonetic characteristics of the mother's speech. The ability at 37 weeks GA or beyond to discriminate the pitch changes in the vocal, but not control signals, could thus be due to additional experience in utero to speech. Another experiential explanation is that the infants born at 37 weeks GA and beyond may be exposed to greater amounts of speech after birth, when compared to those born at 35 and 36 weeks GA, from the attending nurses and from their caregivers, and this speech is also full of consonant sounds. What these infants hear is likely to be 'infant-directed speech,' which is significantly higher in pitch, mirroring the signals tested here. At 37 weeks GA, infants are likely more robust, may sleep less, and are likely more alert. If infants' exposure to speech, either after birth, or their auditory ability to discern differences in speech in utero, is enhanced at 37 weeks GA and beyond, then an 'experience-related' explanation is a viable alternative to a maturational explanation, and should be discussed. Perhaps both are playing a role. As the authors state, many more signals need to be tested to discern how the effect should be interpreted, and other viable interpretations of the current results discussed.

      We acknowledge the importance of considering the auditory environment of participants and the fetus' ability to hear in the womb. In our study, neonates were exposed to a native language environment both before and after birth (as added in lines 385-386), and we took efforts to minimize their exposure to speech stimuli other than those used in the experiment. Specifically, all neonates participated in the experiment and underwent EEG recording within the first 24 hours after birth (lines 386-387). They were promptly transported to a dedicated testing room for EEG recording as soon as their condition stabilized after birth. During recording sessions, they were separated from their mothers to minimize exposure to natural speech (as added in lines 459-461). As a result, we believe that both preterm and term neonates were exposed to comparable amounts of speech after birth and before the experiment. We also ensured that all participants were in a natural sleep state during EEG recording. However, it is possible that term neonates slept less and were more attentive to the limited speech stimuli in their environment before the experiment compared to preterm newborns.

      The debate surrounding nature versus nurture in neonate and infant development persists. We recognize the potential impact of prenatal auditory experiences on neonatal perceptual sensitivity. Therefore, we have added a brief discussion regarding innate- or experience-related explanations for emotional prosodic discrimination in neonates, aiming to shed light on future research directions (lines 343-351).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      It is unclear to us why you did not adjust the title to better reflect the well-supported claims of the paper, i.e., that this is a valuable model for human loss-of-function mutations in IQCH.

      Thanks for the editor’s suggestion. We have changed the title to “Deficiency of IQCH causes male infertility in humans and mice.” Additionally, we have provided the original images of the gels or blots as a zipped folder.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors explore ER stress signalling mediated by ATF6 using a genome-wide gene depletion screen. They find that the ER chaperone Calreticulin binds and directly represses ATF6; this proposed function for Calreticulin is intriguing and constitutes an important finding. The evidence presented is based on CHO genetic evidence and biochemical results and is convincing. 

      We thank the editors for their favourable assessment of our work.

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Tung and colleagues identify Calreticulin as a repressor of ATF6 signalling using a CRISPR screen and characterize the functional interaction between ATF6 and CALR. 

      Strengths: 

      The manuscript is well written and interesting with an innovative experimental design that provides some new mechanistic insight into ATF6 regulation as well as crosstalk with the IRE1 pathway. The methods used were fit for purpose and reasonable conclusions were drawn from the data presented. Findings are novel and bring together glycoprotein quality control and activation of one sensor of the UPR. This is a novel perspective on how the integration of ER homeostasis signals could be sensed in the ER. 

      We thank the reviewer for their favourable assessment of our work.

      Weaknesses: 

      Several points remain to be documented to support the authors' model. 

      Major comments 

      (1) It is interesting that BiP, PDIs, and COPII are not identified in the screen. Might this indicate some bias in the system perhaps limiting its sensitivity or pleiotropic effects of the reporter? 

      The reviewer raises a valid concern. Our CRISPR screen aimed to identify genes that selectively modulate ATF6⍺. Therefore, we excluded from consideration genes whose inactivation had effects on the broader ER environment. This would disfavour the selection of genes encoding BiP, PDI and COPII components. Additionally, a positive selection screen inherently removes essential genes like BiP. The absence of COPII components among the hits could be due to essentiality or to those components not being strong selective modulators of ATF6⍺ activation, as stronger ATF6⍺ modulators such as S1P, S2P and transcription factor S2P and NFY were among our top hits. Cell type specificity may also play a role. For example, ERp18, a small PDI previously implicated in ATF6⍺ activation (Oka et al 2019; PMID: 31368601), was not among our hits despite the presence of sgRNAs targeting hamster ERp18 in the library. Interestingly, depletion of ERp18 in our dual UPR reporter CHO-K1 cell line did not affect the ATF6⍺ and IRE1⍺ UPR branches. This new information has been incorporated into the revised manuscript as Supplemental Figure S6E and the discussion has been edited in line with these comments.

      (2) CLR interacts with ATF6 independently of ATF6 glycans (and cysteines). How do the authors reconcile this observation with the lectin functions of CALR? What is the interaction mode then - if the CALR N (lectin) domain is not involved, is it the P domain that is responsible for the interaction? All the binding experiments are performed in the presence of 1 mM CaCl2, is calcium necessary for CALR to achieve binding? 

      These points merit clarification. The Biolayer Interferometry (BLI) assay reported on an interaction between ATF6 and CRT that is independent of ATF6⍺ glycans. However, cell-based experiments revealed a contribution of glycan-dependent interactions to the binding and repression. Therefore, we conclude that the interaction of CRT with ATF6⍺ likely involves both lectin-dependent and lectin-independent interactions (dependent on the P-domain). Indeed, this hybrid model has previously been suggested as the mode of stable interaction of CRT with other substrates, as cited in the discussion section (Wijeyesakere et al., 2013; PMID: 24100026). CRT is a known calcium-dependent protein, and all the in vitro experiments were conducted in the presence of 1 mM CaCl2. We do not have data from experiments without CaCl2.

      (3) Does the introduction of the reporter system affect the normal BiP (or ATF6) protein levels in the cells? 

      To address this question, we have conducted new experiments comparing endogenous BiP protein levels between the reporter-containing cells and the parental CHO-K1 cells using immunoblotting and an anti-BiP antibody. These data indicate that the reporter system does not affect endogenous BiP protein levels. This new information has been incorporated as revised Supplemental Figure S1C.

      (4) Does the depletion of CRT affect BiP interaction with ATF6? The absence of CRT may lead to misfolding of glycoproteins and titration of BiP away from ATF6 leading to activation. An indicator of ER stress levels that is independent of ATF6 and IRE1 might be useful. 

      To further assess ER stress levels in CRT-depleted cells, we compared expression levels of endogenous ER resident proteins containing a KDEL signal (e.g., P3H1, GRP94, BiP and PDI) in parental CHO-K1 cells, dual UPR reporter cell lines (XC45-6S) and CRT-depleted cells (CRT∆#2P) under basal conditions and during ER stress by immunoblotting. This comparison confirmed the basal elevation in BiP protein level in cells lacking CRT, consistent with previous findings (Figure 2D) and more broadly the integrity of UPR signalling in cells lacking CRT. In the interest of time, we did not extend the analysis to other branches of the UPR. This new information has been incorporated as Supplemental Figure S5 and in the text of the revised manuscript.

      (5) Does CALR depletion alter ATF6 redox status? 

      We thank the reviewer for raising this interesting point. In response, we compared ATF6⍺ redox status in parental and CRT-depleted cells using non-reducing SDS-PAGE. Overall, the redox pattern was similar in parental and CRT-depleted cells, with the detection of two redox forms: an inter-chain disulfide-stabilised dimer and the monomer. Under basal conditions, ATF6⍺ predominantly existed as a monomer, while under ER stress, the monomer band decreased with a corresponding increase in a disulfide-stabilised dimer form in parental cells, as previously reported (Oka et al, 2022; PMID: 35286189). However, under ER stress, CRT-depleted cells showed a significantly higher fraction of monomer versus dimer compared to parental cells. Taken together, these data suggest that the loss of CRT may favour the monomeric form of ATF6α, which is proposed to be more efficiently trafficked (Nadanaka, et al 2007; PMID: 17101776), aligning with our observations that CRT depletion is associated with constitutive activation of ATF6α. These new data have been included as Supplemental Figure S7 and are explained in detail in the results section of the revised manuscript.

      (6) Figure 4C would benefit from some immunoblotting against BiP.

      Although we acknowledge the validity of this suggestion and understand the referee's interest in comparing the amount of CRT in pulldown with that of BiP, the necessity of generating additional samples makes this experiment impractical. Consequently, we opted not to include in our conclusion any comparison regarding the retention of ATF6α by BiP relative to CRT.

      (7) Overlooked requirement of cysteines for ATF6 functionality (Figure 5B). 

      We interpret this comment to refer to the inactivity of the cysteine-free allele of ATF6⍺. Whilst this is a reproducible observation of significance to the structure-activity features of ATF6⍺’s luminal domain, it is less informative in terms of understanding trans-active regulators of ATF6⍺ and was therefore not explored further.

      (8) Without a clear definition of the role of CRT in ATF6 folding, one cannot infer that the observed phenotype is not based on defects in ATF6 "folding" and glycosylation considering the possibility of activation of newly synthesised un-glycosylated ATF6. 

      If the main role of CRT were to assist ATF6⍺ folding, one would expect that depletion of CRT would lead to a non-functional ATF6⍺, resulting in ER retention and less activity. However, our data indicate that the loss of CRT correlates with the constitutive activation of the ATF6⍺ fluorescent reporter and increased Golgi trafficking and processing of ATF6⍺. Therefore, these data suggest that in CRT-depleted cells, the majority of ATF6⍺ is likely to fold to a functional state.

      (9) ATF6 was defined in several studies as a natively unstable protein and shows a close relationship with the ERAD machinery, is the role of CALR also involved in a quality control mechanism for natively unfolded ATF6? 

      The reviewer brings up another valid point. Although we have not closely evaluated the role of CRT in the quality control machinery, we observed that the loss of CRT was not associated with increased levels of ATF6⍺ in CRT-depleted cells in basal conditions compared with parental cells (Fig 3B.1, compare line 1 and line 7; Figure 3B.2, compare line 1 and line 5). If ATF6⍺ were degraded by ERAD and loss of CRT compromised ERAD functionality, CRT-depleted cells should exhibit increased levels of endogenous ATF6⍺. The fact that endogenous ATF6⍺ levels are slightly reduced in CRT-depleted cells does not support a role for CRT in the quality control mechanism for natively unfolded ATF6⍺.

      (10) C618 in ATF6 is located within the BiP binding site and in close proximity of an N-glycosylation site. Is this region of particular importance for CALR binding? 

      It is an interesting point that we have not explored in this study. Consequently, without experimental data, we cannot infer the possible implications of C618 in CRT binding.

      (11) The authors have mutated all the N glycosylation sites at once; they should be mutated one by one and the impact on ATF6 stability evaluated independently of the CALR status. 

      We agree that analysing each N-glycosylation site individually would provide further insight into their contributions to ATF6⍺ stability/functionality. However, given the scope of the paper in its present form, we have elected not to address this point.

      (12) The relationship between the absence of CALR and IRE1 remains weak. The authors do not exclude the possibility that CALR could have a direct effect on IRE1 itself. This should be either removed or further investigated. 

      We beg to differ. The relationship between the absence of CRT and IRE1 is not weak: loss of CRT in CHO-K1 cells represses IRE1, although we readily concede that the relationship is incompletely understood. ATF6⍺ signalling involves crosstalk with the IRE1 pathway, partly mediated by direct heterodimerisation of N-ATF6⍺ with XBP1s (Yamamoto et al., 2007, 2004). Additionally, recent research has shown that ATF6⍺ activity can repress IRE1 signalling (Walter et al., 2018). Therefore, given that our results indicate that the loss of CRT leads to constitutive activation of ATF6⍺, we suggest that a negative feedback loop in which ATF6⍺ represses IRE1 contributes to the observations made here on the relationship between CRT and IRE1. This does not exclude other aspects of the relationship, a point that is now clarified further in the revised manuscript. 

      Minor point 

      In the introduction on page 3 it is mentioned that loss of ATF6 impairs survival in cellular and animal models, this is not completely true as ATF6a ko in mice has no clear deleterious phenotype and only the double ko ATF6a/b has some dramatic impact.

      We have modified that sentence in the revised manuscript. 

      Reviewer #2 (Public Review): 

      Summary: 

      In this study, the authors set out to use an unbiased CRISPR/Cas9 screen in CHO cells to identify genes encoding proteins that either increase or repress ATF6 signalling in CHO cells. 

      Strengths: 

      The strengths of the paper include the thoroughness of the screens, the use of a novel, double ATF6/IRE1 UPR reporter cell line, and follow-up detailed experiments on two of the findings in the screens, i.e. FURIN and CRT, to test the validity of involvement of each as direct regulators of ATF6 signalling. Additional strengths are the control experiments that validate the ATF6 specificity of the screens, as well as, for CRT, the finding of focus, determining roles for the glycosylation and cysteines in ATF6 as mechanistically involved in how CRT represses ATF6, at least in CHO cells. 

      We thank the reviewer for their favourable assessment of our work.  

      Weaknesses: 

      (1) The weaknesses of the paper are that the authors did not describe why they focused only on the top 100 proteins in each list of ATF6 activators and repressors. 

      We concede that the more genes one studies the better. However, in whole genome CRISPR screens where thousands of hits arise, it is common practice for researchers to prioritise the candidates with the greatest significance, as those genes are likely to have a more meaningful impact on the phenotype under investigation. Therefore, our decision to focus on the top 100 genes was based on a desire to identify the most prominent and potentially impactful candidates for further analysis, ensuring a manageable scope for in-depth study while maintaining a measure of relevance and significance. Moreover, setting the threshold at 100 hits to perform GEO enrichment analysis is a practice used by previous researchers (PMID: 30323222; PMID: 37251921). In our case, the top 100 hits included the genes with an adjusted P < 0.005. For interested readers, the full ranked list is accessible in the GEO databank (GSE254745) and as supplemental Table S1.

      (2) Additionally, there were a few methodology items missing, such as the nature of where the insertion site in the CHO cell genome of the XBP1::mCherry reporter. Since the authors go to great lengths to insert the other reporter for ATF6 activation in a "safe harbor" location, it leads to questions about whether the XBP1::mCherry reporter insertion is truly innocuous. 

      We appreciate the opportunity to clarify certain aspects of our experimental procedures. In order to generate a double UPR reporter cell line, we employed the previously established XC45 CHO-K1 clone with an integrated XBP1s::mCherry reporter (Harding et al., 2019; PMID: 31749445). Since the ROSA26 safe harbor locus was available in the XC45 CHO-K1 cell line, we directed integration of the ATF6⍺ reporter there. To provide further clarity, the revised manuscript includes additional details in the Methods section regarding the creation of the XBP1 reporter.

      (3) An additional weakness is that the evidence for the physical interaction between ATF6LD and CRT is not strong, being dependent mainly on a single IP/IB experiment in Figure 4C that comprises only 1 lane on the gel for each of the test cases. Moreover, while that figure suggests that the interaction between CRT and ATF6 is decreased by mutating out the glycosylation sites in the ATF6LD, the BLI experiment in the same figure, 4B, suggests that there are no differences in the affinities of CRT for ATF6LD WT, deltaGly and deltaCys. 

      We would like to highlight that in the IP/IB experiments (see Figure 4C), where wildtype ATF6 (ATF6⍺_LDWT) and GFP-ATF6_LD∆Gly were transiently transfected, GFP-ATF6_LD∆Gly was expressed at lower levels than ATF6⍺_LDWT. This lower expression level might explain why CRT is more prominently immunoprecipitated with ATF6⍺_LDWT and could account for the differences observed between the in vitro and in vivo assays.

      (4) An additional detail is that I found Figure 6A to be difficult to interpret, and that 6B was required in order for me to best evaluate the points being made by the authors in this figure. 

      We have simplified Figure 6A in the revised manuscript to make it more interpretable by focussing the reader’s attention on the transfected population. 

      Overall, I believe that this work will positively impact the field as it provides a list of potential regulators of ATF6 activation and repression that others will be able to use as a launch point for discovering such interactions in cells and tissues of interest beyond CHO cells. However, I agree with the authors that these findings were in CHO cell lines and that it is possible, if not likely, that some of the interactions they found will be cell type/line specific. 

      We accept this point and re-emphasize the qualification that our conclusions cannot be glibly extrapolated to other cell lines.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews: 

      Reviewer #1 (Public Review): 

      The goal of the current study was to evaluate the effect of neuronal activity on blood-brain barrier permeability in the healthy brain, and to determine whether changes in BBB dynamics play a role in cortical plasticity. The authors used a variety of well-validated approaches to first demonstrate that limb stimulation increases BBB permeability. Using in vivo-electrophysiology and pharmacological approaches, the authors demonstrate that albumin is sufficient to induce cortical potentiation and that BBB transporters are necessary for stimulus-induced potentiation. The authors include a transcriptional analysis and differential expression of genes associated with plasticity, TGF-beta signaling, and extracellular matrix were observed following stimulation. Overall, the results obtained in rodents are compelling and support the authors' conclusions that neuronal activity modulates the BBB in the healthy brain and that mechanisms downstream of BBB permeability changes play a role in stimulus-evoked plasticity. These findings were further supported with fMRI and BBB permeability measurements performed in healthy human subjects performing a simple sensorimotor task. There is literature to suggest that there are sex differences in BBB dysfunction in pathophysiological conditions and the authors have acknowledged the use of only males as a minor limitation of the study that should be addressed in the future. Future studies should also test whether the upregulation of OAT3 plays a role in cortical plasticity observed following stimulation. Overall, this study provides novel insights into how neurovascular coupling, BBB permeability, and plasticity interact in the healthy brain. 

      Reviewer #2 (Public Review): 

      Summary: 

      This study builds upon previous work that demonstrated that brain injury results in leakage of albumin across the blood brain barrier, resulting in activation of TGF-beta in astrocytes. Consequently, this leads to decreased glutamate uptake, reduced buffering of extracellular potassium and hyperexcitability. This study asks whether such a process can play a physiological role in cortical plasticity. They first show that stimulation of a forelimb for 30 minutes in a rat results in leakage of the blood brain barrier and extravasation of albumin on the contralateral but not ipsilateral cortex. The authors propose that the leakage is dependent upon neuronal excitability and is associated with an enhancement of excitatory transmission. Inhibiting the transport of albumin or the activation of TGF-beta prevents the enhancement of excitatory transmission. In addition, gene expression associated with TGF-beta activation, synaptic plasticity and extracellular matrix are enhanced on the "stimulated" hemisphere. That this may translate to humans is demonstrated by a break down in the blood brain barrier following activation of brain areas through a motor task. 

      Strengths: 

      This study is novel and the results are potentially important as they demonstrate an unexpected break down of the blood brain barrier with physiological activity and this may serve a physiological purpose, affecting synaptic plasticity. 

      The strengths of the study are: 

      (1) The use of an in vivo model with multiple methods to investigate the blood brain barrier response to a forelimb stimulation. 

      (2) The determination of a potential functional role for the observed leakage of the blood brain barrier from both a genetic and electrophysiological view point 

      (3) The demonstration that inhibiting different points in the putative pathway from activation of the cortex to transport of albumin and activation of the TGF-beta pathway, the effect on synaptic enhancement could be prevented. 

      (4) Preliminary experiments demonstrating a similar observation of activity dependent break down of the blood brain barrier in humans. 

      Weaknesses: 

      The authors adequately addressed most of my points. A few remain: 

      (1) Although the authors have addressed the possible effects of anaesthesia on neuro-vascular coupling, they have not mentioned or addressed the possible effects of ketamine (an NMDA receptor antagonist) on synaptic plasticity. Indeed, the low percentage of SEP increase following potentiation (10-20%) could perhaps be explained by partial block of NMDA receptors by ketamine.

      We agree and apologize for this oversight. This important issue is now addressed in the Discussion.

      “Notably, the antagonistic effect of ketamine on NMDA receptors might attenuate the magnitude of SEP potentiation recorded in our experiments (Anis et al., 1983; Salt et al., 1988).”

      (2) The experimental paradigms remain unclear to me. Now, it appears that drugs are applied for 50 minutes and that the stimulation occurs during the "washout period". The more conventional approach would be to have the drug application during the stimulation period to determine if the drugs occlude or enhance the effects of stimulation and then washout the drugs. The problem is that drugs variably washout at different rates depending upon their lipid solubility.

      We agree that the more conventional approach would have been to continue applying the drug throughout the experiment and that differential rates of washout may add variability to our experiments. However, despite this limitation, within each treatment group we found that the SEP response at 50 minutes (immediately after the drug application window) does not differ from SEP response at 80 minutes (after 30 minutes of stimulation and washout) [Figure 3H&G]. This suggests that the drug effects were still present despite terminating drug application and performing potentiation-inducing stimulation. Moreover, our analysis showed that animals within each treatment group (except AP5) had similar SEP responses with little intra-group variability.

      (3) It is still not clear to what extent the experimenters and those doing the analysis were blinded to group. If one or both were blind to group, then please put this in the methods.

      Thank you for this comment. We revised the Methods section to clearly confirm that data was collected and analyzed blindly.  

      Reviewer #3 (Public Review): 

      Summary: 

      This study used prolonged stimulation of a limb to examine possible plasticity in somatosensory evoked potentials induced by the stimulation. They also studied the extent that the blood brain barrier (BBB) was opened by the prolonged stimulation and whether that played a role in the plasticity. They found that there was potentiation of the amplitude and area under the curve of the evoked potential after prolonged stimulation and this was long-lasting (>5 hrs). They also implicated extravasation of serum albumin, caveolae-mediated transcytosis, and TGFb signalling, as well as neuronal activity and upregulation of PSD95. Transcriptomics was done and implicated plasticity related genes in the changes after prolonged stimulation, but not proteins associated with the BBB or inflammation. Next, they address the application to humans using a squeeze ball task. They imaged the brain and suggest that the hand activity led to an increased permeability of the vessels, suggesting modulation of the BBB. 

      Strengths: 

      The strengths of the paper are the novelty of the idea that stimulation of the limb can induce cortical plasticity in a normal condition, and it involves opening of the BBB with albumin entry. In addition, there are many datasets and both rat and human data. 

      Weaknesses: 

      The conclusions are not compelling however because of a lack of explanation of methods.

      In the revised paper, we added a section titled ‘study design’ that presents an overview of the experimental approach.

      The explanation of why prolonged stimulation in the rat was considered relevant to normal conditions should be as clear in the paper as it is in the rebuttal.

      We added a new paragraph to the Discussion section explaining this point as we did in the rebuttal:  

      “Our animal experiments show that a 30 min limb stimulation (at 6Hz and 2mA) increases cross-BBB influx, while a 1 min stimulation (of similar frequency and magnitude) does not. We believe that both types of stimulations fall within the physiological range because our continuous electrophysiological recordings showed no signs of epileptiform or otherwise pathological activity. Moreover, the recorded SEP levels were similar to those reported in previous physiological LTP studies in rats (Eckert & Abraham, 2010; Han et al., 2015; Mégevand et al., 2009) and humans (McGregor et al., 2016). In humans, skill acquisition often involves motor training sessions that last ≥30 minutes (Bengtsson et al., 2005; Classen et al., 1998) and result in physiological plasticity of sensory and motor systems (Classen et al., 1998; Draganski et al., 2004; Sagi et al., 2012). Hence, the experimental task in our human study (30 minutes of repetitive squeezing of an elastic stress-ball) is likely to represent physiological activity, with neuronal activation in primarily motor and sensory areas (Halder et al., 2005). Future human and animal studies are needed to explore the BBB modulating effects of additional stimulation protocols – with varying durations, frequencies, and magnitudes. Such studies may also elucidate the temporal and ultrastructural characteristics that differentiate between physiological and pathological BBB modulation.”

      The authors need to ensure other aspects of the rebuttal are as clear in the paper as in the rebuttal too. 

      Thank you for this comment. This was addressed in the revised paper.

      The only remaining concern that is significant is that it is hard to understand the figures. 

      Thank you for this comment. We revised the figures according to the reviewer’s recommendations. We hope that these changes increase the legibility of the figures. 

      Reviewer #3 (Recommendations For The Authors): 

      The manuscript is improved but there are still suggestions that do not appear to have been addressed. More experiments are not involved in addressing these concerns but one wants the paper to be clarified in terms of what was done. 

      Figures. Please use arrows to point to the effect that the reader should see. Please note what the main point is. 

      Major concerns: 

      Please add explanations, exact p values, and other revisions in the rebuttal to the paper. 

      Rebuttal explanations were added to the paper and p values appear in figure legends.

      Fig 1d shows a seizure-like event which the authors don't think is a seizure because it lacks a depolarization shift. This explanation is not convincing because a LFP would not necessarily show a depolarization shift. Another argument or a discussion of the event as a seizure is warranted. Note that expanding the trace might also show it is unlike a seizure. Regarding the idea that 6Hz 2 mA stimuli for 30 min are physiological, the authors make three arguments which are not clear. First, no epileptiform activity was found, but in Fig. 1 it looks like a seizure occurred. Second, memory and skill acquisition in humans often involve a similar training duration - but what about 6Hz 2 mA?

      Rats are known to rhythmically move their whiskers at frequencies ranging between 5 and 15 Hz (Mégevand et al., 2009). We agree that there is no clear way to justify the similarity between the experimental design in humans and rats. However, we believe that both paradigms (paw stimulation in rats and ball squeeze in humans) represent non-pathological input that we found to modulate barrier permeability. This argument was added to the discussion of the paper:

      “We believe that both types of stimulations fall within the physiological range because in rats, activity between 5-15 Hz represents physiological rhythmic whisker movement during environment exploration (Mégevand et al., 2009).” 

      Seizures are typically induced in rats via direct tetanic stimulation of the brain (at 50 Hz and 0.3-2.5mA) or maximal electroshock test to the cornea (at 50 Hz and 150 mA) (Swinyard et al., 1952). We, therefore, assert that the activity we observe represents physiological responses and not seizures. This argument is beyond the scope of the current paper. 

      Please note a limitation is that the high level of serum albumin is unlikely to be physiological but may not have been as high in the animal because of the low diffusion rate and degradation (please add the refs in the rebuttal). 

      Thank you, we added the following to the Results section: 

      “The relatively high concentration of albumin was chosen to account for factors that lower its effective tissue concentration such as its low diffusion rate and its likelihood to encounter a degradation site or a cross-BBB efflux transporter (Tao & Nicholson, 1996; Zhang & Pardridge, 2001).”

      Fig. 1. 

      Please consider a box in b to show where the expanded traces in the lower row came from. 

      Thank you for the suggestion. We added lines indicating where the trace excerpts were taken from.

      c. Please use arrows to point to the parts that the authors want the reader to note. In the legend, explain what t is, and delta HbT.

      Thank you. We implemented this suggestion.

      d. It is not clear what the double-sided arrows are meant to show compared to the arrow without two sides. 

      We replaced the two-headed arrow with two single ones.

      e. Please explain what the upward lines at the top signify. What does the red asterisk mean? 

      Thank you. We implemented this suggestion.

      f. Is the reader supposed to note the yellow area? Please make it with an arrow or circle if so. 

      Thank you, we added a white circle to mark the area of tracer accumulation.

      g. Please explain what the permeability index is or reference the part of the paper that does. 

      Further to this suggestion, we added a reference to the appropriate methods section to the legend.

      h. Please use arrows to point to the area of interest. 

      Thank you. We implemented this suggestion.

      m-n. Please mark areas of interest with arrows.  m. the top right two images are unclear. I suggest making them say ipsi inset and contra inset instead of using asterisks. 

      Thank you. We added the ipsi and contra labels to panels in m. The images in panel n represent a phenomenon with no particular region of interest, but rather peri-vascular tracer accumulation along the entire depicted blood vessel. We clarified that panel n represents a separate experiment from panel m: “n. In an animal injected with both EB and NaFlu post stimulation, fluorescence imaging shows extravascular accumulation of both tracers along a cortical small vessel in the stimulated hemisphere.”

      Figure 2. 

      (2) a. Middle. What are the vertical lines at the top? The rebuttal states that was explained in the revised legends but I don't see it. 

      Our apologies. We have now included an explanation that “an excerpt of the stimulation trace is shown above the middle LFP trace”.

      c and d are very different field potentials in shape and therefore hard to compare. The rebuttal addresses this but the explanation is not in the revised text. 

      We agree that there is variability in SEP responses between animals. We now added a statement acknowledging this in the methods section: “To overcome potential variability in SEP morphology between animals (Mégevand et al., 2009), each animal’s plasticity measures (max amplitude and AUC of post stimulation SEP) were compared to the same measures at baseline.” 
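      The within-animal comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the rectified trapezoidal AUC, the sampling interval and the example traces are all assumptions made here for illustration.

```python
from typing import Sequence


def max_amplitude(trace: Sequence[float]) -> float:
    """Largest absolute deflection in an SEP trace (assumed definition)."""
    return max(abs(v) for v in trace)


def area_under_curve(trace: Sequence[float], dt: float) -> float:
    """Trapezoidal integral of the rectified trace sampled every dt (assumed definition)."""
    rectified = [abs(v) for v in trace]
    return sum((a + b) / 2.0 * dt for a, b in zip(rectified, rectified[1:]))


def percent_of_baseline(post: float, baseline: float) -> float:
    """Express a post-stimulation measure as a percentage of the same animal's baseline."""
    return 100.0 * post / baseline


# Hypothetical traces for one animal (arbitrary units), not real data.
baseline_trace = [0.0, 1.0, 2.0, 1.0, 0.0]
post_trace = [0.0, 1.2, 2.4, 1.2, 0.0]

amp_change = percent_of_baseline(max_amplitude(post_trace), max_amplitude(baseline_trace))
auc_change = percent_of_baseline(
    area_under_curve(post_trace, dt=1.0), area_under_curve(baseline_trace, dt=1.0)
)
```

      Because each animal serves as its own reference, differences in SEP shape between animals do not enter the comparison; only the change relative to that animal's baseline does.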

      In d, it is not clear there is potentiation because the traces are not aligned. 

      All panels depicting SEP traces represent raw data with no alignment. The shift observed in panel d exemplifies why we compare post-stimulation parameters of max amplitude and area under curve to baseline in each animal. 

      Exact P values are said to have been added in the rebuttal but they were not. 

      Exact P values appear in Figure legends.

      (3) b. Use arrows to mark the area of interest. 

      Thank you. We added a white circle to mark the area of tracer accumulation similar to Figure 1f.

      d. Why is there an oscillation superimposed on all traces except CNQX? 

      We agree that this is an interesting question. Future studies should determine the source of this SEP pattern.   

      (4) What does the line and the number 2 mean? How were data normalized? What was counted? What area of cortex?

      The number 2 refers to the scale bar line, meaning a log fold change of 2 reflects the size of the scale bar line. 

      The plot shows the log fold change against the mean count of each gene in the contralateral somatosensory cortex between 1 and 24 hours after stimulation.

      The x axis title was changed to “mean expression” and the legend was modified to:

      “Scatter plot of gene expression from RNA-seq in the contralateral somatosensory cortex 24 vs. 1 h after 30 min stimulation. The y axis represents the log fold change, and the x axis represents the mean expression levels (see methods, RNA Sequencing & Bioinformatics). Blue dots indicate statistically significant differentially expressed genes (DEGs) by Wald Test (n=8 rats per group).”

      How were the pericytes, smooth muscle cells, etc. distinguished? 

      This was explained under Methods->RNA Sequencing & Bioinformatics: “Analysis of cell-specific and vascular zonation genes was performed as described (Vanlandewijck et al., 2018), using the database provided in (http://betsholtzlab.org/VascularSingleCells/database.html).”

      What were the chi square statistics? If there were cells used instead of rats, please justify. 

      Thank you. The legend was expanded to include the following:

      “The contralateral somatosensory cortex was found to have a significantly higher number of DEGs related to synaptic plasticity, than the ipsilateral side (***p<0.001, Chi-square).”     

      (5) b. what do the icons mean? 

      We agree that the icons were confusing. We simplified this panel to just show when participants were asked to squeeze the ball (black icon). This explanation was added to the Figure legend.

      Abbreviations? 

      Abbreviations of MRI protocols were added to the figure legend for clarity.

      In c-e what are the units of measure? Fold-change? 

      The units represent t-statistics values for each voxel. The label ‘t-statistic’ was added to the figure.  

      What are the white lines, + and - signs? 

      The white lines point to voxels of highest activation (t-statistic). This was added to the legend.

      These are not +/- signs; they are voxels with significant activation that merely resemble such signs.

      f. Please explain f and g for clarity. 

      Thank you. The explanation was modified for added clarity.

      Supplemental Fig. 4. 

      Original question: If ipsilateral and contralateral showed many changes why do the authors think the effects were only contralateral? 

      The authors replied: Our gene analysis was designed to complement our in vivo and histological findings, by assessing the magnitude of change in differentially expressed genes (DEGs). This analysis showed that: (1) the hemisphere contralateral to the stimulus has significantly more DEGs than the ipsilateral hemisphere; and (2) the DEGs were related to synaptic plasticity and TGF-b signaling. These findings strengthen the hypothesis raised by our in vivo and histological experiments. 

      Could the authors clarify the answer to the question in the text? 

      Thank you. This section was added to the Discussion. 

      Papers referenced in this letter:

      Anis, N. A., Berry, S. C., Burton, N. R., & Lodge, D. (1983). The dissociative anaesthetics, ketamine and phencyclidine, selectively reduce excitation of central mammalian neurones by N-methyl-aspartate. British Journal of Pharmacology, 79(2), 565–575. https://doi.org/10.1111/j.1476-5381.1983.tb11031.x

      Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience, 8(9), 1148–1150. https://doi.org/10.1038/nn1516

      Classen, J., Liepert, J., Wise, S. P., Hallett, M., & Cohen, L. G. (1998). Rapid plasticity of human cortical movement representation induced by practice. Journal of Neurophysiology, 79(2), 1117–1123. https://doi.org/10.1152/JN.1998.79.2.1117

      Draganski, B., Gaser, C., Busch, V., Schuierer, G., Bogdahn, U., & May, A. (2004). Changes in grey matter induced by training. Nature, 427(6972), 311–312. https://doi.org/10.1038/427311a

      Eckert, M. J., & Abraham, W. C. (2010). Physiological effects of enriched environment exposure and LTP induction in the hippocampus in vivo do not transfer faithfully to in vitro slices. Learning and Memory, 17(10), 480–484. https://doi.org/10.1101/lm.1822610

      Halder, P., Sterr, A., Brem, S., Bucher, K., Kollias, S., & Brandeis, D. (2005). Electrophysiological evidence for cortical plasticity with movement repetition. European Journal of Neuroscience, 21(8), 2271–2277. https://doi.org/10.1111/J.1460-9568.2005.04045.X

      Han, Y., Huang, M. De, Sun, M. L., Duan, S., & Yu, Y. Q. (2015). Long-term synaptic plasticity in rat barrel cortex. Cerebral Cortex, 25(9), 2741–2751. https://doi.org/10.1093/cercor/bhu071

      McGregor, H. R., Cashaback, J. G. A., & Gribble, P. L. (2016). Functional Plasticity in Somatosensory Cortex Supports Motor Learning by Observing. Current Biology, 26(7), 921–927. https://doi.org/10.1016/j.cub.2016.01.064

      Mégevand, P., Troncoso, E., Quairiaux, C., Muller, D., Michel, C. M., & Kiss, J. Z. (2009). Long-term plasticity in mouse sensorimotor circuits after rhythmic whisker stimulation. Journal of Neuroscience, 29(16), 5326–5335. https://doi.org/10.1523/JNEUROSCI.5965-08.2009

      Sagi, Y., Tavor, I., Hofstetter, S., Tzur-Moryosef, S., Blumenfeld-Katzir, T., & Assaf, Y. (2012). Learning in the Fast Lane: New Insights into Neuroplasticity. Neuron, 73(6), 1195–1203. https://doi.org/10.1016/j.neuron.2012.01.025

      Salt, T. E., Wilson, D. G., & Prasad, S. K. (1988). Antagonism of N-methylaspartate and synaptic responses of neurones in the rat ventrobasal thalamus by ketamine and MK-801. British Journal of Pharmacology, 94(2), 443–448. https://doi.org/10.1111/j.1476-5381.1988.tb11546.x

      Swinyard, E. A., Brown, W. C., & Goodman, L. S. (1952). Comparative assays of antiepileptic drugs in mice and rats. The Journal of Pharmacology and Experimental Therapeutics, 106(3), 319–330. http://jpet.aspetjournals.org/content/106/3/319.abstract

      Tao, L., & Nicholson, C. (1996). Diffusion of albumins in rat cortical slices and relevance to volume transmission. Neuroscience, 75(3), 839–847. https://doi.org/10.1016/0306-4522(96)00303-X

      Vanlandewijck, M., He, L., Mäe, M. A., Andrae, J., Ando, K., Del Gaudio, F., Nahar, K., Lebouvier, T., Laviña, B., Gouveia, L., Sun, Y., Raschperger, E., Räsänen, M., Zarb, Y., Mochizuki, N., Keller, A., Lendahl, U., & Betsholtz, C. (2018). A molecular atlas of cell types and zonation in the brain vasculature. Nature, 554(7693), 475–480. https://doi.org/10.1038/nature25739

      Zhang, Y., & Pardridge, W. M. (2001). Mediated efflux of IgG molecules from brain to blood across the blood–brain barrier. Journal of Neuroimmunology, 114(1–2), 168–172. https://doi.org/10.1016/S0165-5728(01)00242-9

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Wang, He et al. have constructed a comprehensive single-nucleus atlas for the gills of the deep-sea Bathymodioline mussels, which possess intracellular symbionts that provide a key source of carbon and allow them to live in these extreme environments. They provide annotations of the different cell states within the gills, shedding light on how multiple cell types cooperate to give rise to the emergent functions of the composite tissues and the gills as a whole. They pay special attention to characterizing the bacteriocyte cell populations and identifying sets of genes that may play a role in their interaction with the symbiotes. 

      Wang, He et al sample mussels from 3 different environments: animals from their native methane rich environment, animals transplanted to a methane-poor environment to induce starvation and animals that have been starved in the methane-poor environment and then moved back to the methane-rich environment. They demonstrated that starvation had the biggest impact on bacteriocyte transcriptomes. They hypothesize that the up-regulation of genes associated with lysosomal digestion leads to the digestion of the intracellular symbiont during starvation, while the non-starved and reacclimated groups more readily harvest the nutrients from symbiotes without destroying them. Further work exploring the differences in symbiote populations between ecological conditions will further elucidate the dynamic relationship between host and symbiote. This will help disentangle specific changes in transcriptomic state that are due to their changing interactions with the symbiotes from changes associated with other environmental factors. 

      This paper makes available a high quality dataset that is of interest to many disciplines of biology. The unique qualities of this non-model organism and collection of conditions sampled make it of special interest to those studying deep sea adaptation, the impact of environmental perturbation on Bathymodioline mussels populations, and intracellular symbiotes. The authors also use a diverse array of tools to explore and validate their data. 

      Reviewer #2 (Public Review): 

      Wang, He et al. shed insight into the molecular mechanisms of deep-sea chemosymbiosis at the single-cell level. They do so by producing a comprehensive cell atlas of the gill of Gigantidas platifrons, a chemosymbiotic mussel that dominates the deep-sea ecosystem. They uncover novel cell types and find that the gene expression of bacteriocytes, the symbiont-hosting cells, supports two hypotheses of host-symbiont interactions: the "farming" pathway, where symbionts are directly digested, and the "milking" pathway, where nutrients released by the symbionts are used by the host. They perform an in situ transplantation experiment in the deep sea and reveal transitional changes in gene expression that support a model where starvation stress induces bacteriocytes to "farm" their symbionts, while recovery leads to the restoration of the "farming" and "milking" pathways. 

      A major strength of this study includes the successful application of advanced single nucleus techniques to a non-model, deep sea organism that remains challenging to sample. I also applaud the authors for performing an in situ transplantation experiment in a deep sea environment. From gene expression profiles, the authors deftly provide a rich functional description of G. platifrons cell types that is well-contextualized within the unique biology of chemosymbiosis. These findings offer significant insight into the molecular mechanisms of deep-sea host-symbiont ecology, and will serve as a valuable resource for future studies into the striking biology of G. platifrons. 

      The authors' conclusions are generally well-supported by their results. However, I recognize that the difficulty of obtaining deep-sea specimens may have impacted experimental design and no replicates were sampled. 

      It is notable that the Fanmao cells were much more sparsely sampled. It appears that fewer cells were sequenced, resulting in the Starvation and Reconstitution conditions having 2-3x more cells after doublet filtering. These discrepancies also are reflected in the proportion of cells that survived QC, suggesting a distinction in quality or approach. However, the authors provide clear and sufficient evidence via bootstrapping that batch effects between the three samples are negligible. While batch effect does not appear to have affected gene expression profiles, the proportion of cell types may remain sensitive to sampling techniques, and thus interpretation of Fig. S12 must be approached with caution. 

      Reviewer #3 (Public Review): 

      Wang et al. explored the unique biology of the deep-sea mussel Gigantidas platifrons to understand fundamental principles of animal-symbiont relationships. They used single-nucleus RNA sequencing, together with validation and visualization, to identify many of the important cellular and molecular players that allow these organisms to survive in the deep sea. They demonstrate a diversity of cell types that support the structure and function of the gill, including bacteriocytes, specialized epithelial cells that host sulfur-oxidizing or methane-oxidizing symbionts, as well as a suite of other cell types including supportive cells, ciliary cells, and smooth muscle cells. By transplanting mussels from a methane-rich habitat to methane-limited environments, the authors showed that starved mussels may consume endosymbionts, whereas mussels in methane-rich environments upregulated genes involved in glutamate synthesis. These data add to the growing body of literature showing that organisms control their endosymbionts in response to environmental change. 

      The conclusions of the data are well supported. The authors adapted a technique that would have been technically impossible in their field environment by preserving the tissue and then performing nuclear isolation after the fact. The use of single-nucleus sequencing opens the possibility of new cellular and molecular biology that is not possible to study in the field. Additionally, the in-situ data (both WISH and FISH) are high-quality and easy to interpret. The use of cell-type-specific markers along with a symbiont-specific probe was effective. Finally, the SEM and TEM were used convincingly for specific purposes in the case of showing the cilia that may support water movement. 

      One particular area for future exploration surrounds the concept of a proliferative progenitor population within the gills. The authors recover molecular markers for these putative populations, and future work will uncover whether these are indeed proliferative cells that contribute to symbiont colonization. 

      Overall the significance of this work is identifying the relationship between symbionts and bacteriocytes and how these host bacteriocytes modulate their gene expression in response to environmental change. It will be interesting to see how similar or different these data are across animal phyla. For instance, the work on symbiosis in cnidarians may converge on similar principles, or there may be independent ways in which organisms have been able to solve these problems. 

      We extend our sincere gratitude to all the reviewers for their positive comments and kind words. We highly value the substantial efforts they made in helping us improve and enhance our manuscript. Additionally, we thank the reviewers for pointing out the limitations of our current study, which will guide us in improving our future research.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      This study system is so interesting and this is a truly unique and exciting dataset. Most of my suggestions are aimed at improving readability and making it more accessible for a broader audience, since I predict many fields will find it interesting. 

      Line 60: which species of mussel? Is this the same one? 

      We appreciate the comments from the reviewer. The reference here is to deep-sea bathymodiolin mussels, which, in most cases, possess enlarged gill filaments that accommodate symbionts.

      Line 237-230: citation of previous findings missing 

      We appreciate the comments from the reviewer. After carefully reviewing these paragraphs, we believe that all the previous findings have now been properly cited.

      Line 256: it might be a good idea to give a brief description of what slingshot analysis is here 

      We appreciate the comments from the reviewer. We have revised the corresponding part of our manuscript to make it clear.

      This part of the manuscript now reads: “We performed Slingshot analysis, which uses a cluster-based minimum spanning tree (MST) and a smoothed principal curve to determine the developmental path of cell clusters. The result shows that the PEBZCs might be the origin of all gill epithelial cells, including the other two proliferation cells (VEPC and DEPC) and bacteriocytes (Supplementary Fig. S6).” Lines 203-207 of the revised manuscript.
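      The minimum spanning tree step mentioned in this passage can be illustrated with a minimal sketch. This is not the Slingshot implementation (Slingshot is an R package and also fits smoothed principal curves, which are omitted here); it only shows how a tree rooted at a chosen cluster can be built over cluster centroids with Prim's algorithm. The plain Euclidean distance, the function name, and the centroid coordinates are assumptions made for illustration and are not taken from the manuscript.

```python
import math


def minimum_spanning_tree(centroids, root):
    """Build a minimum spanning tree over cluster centroids (Prim's algorithm).

    centroids: dict mapping cluster name -> coordinate tuple in a reduced space.
    root: name of the cluster assumed to be the origin of the lineage.
    Returns a list of (parent, child) edges in the order they were added.
    """
    in_tree = {root}
    edges = []
    while len(in_tree) < len(centroids):
        best = None  # (distance, parent, child) of the cheapest outgoing edge
        for a in in_tree:
            for b in centroids:
                if b in in_tree:
                    continue
                d = math.dist(centroids[a], centroids[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, parent, child = best
        in_tree.add(child)
        edges.append((parent, child))
    return edges


# Hypothetical centroid coordinates, for illustration only.
example_centroids = {
    "PEBZC": (0.0, 0.0),
    "VEPC": (1.0, 0.0),
    "DEPC": (0.0, 1.5),
    "bacteriocyte": (2.0, 0.0),
}
example_edges = minimum_spanning_tree(example_centroids, root="PEBZC")
```

      Reading the edges outward from the root gives candidate lineage paths; in the actual analysis these paths are then smoothed into curves rather than used directly.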

      Line 289: Wording is a bit confusing- what is meant by morphological analysis?

      We acknowledge that our wording might be a bit confusing here. We are referring to the TEM ultrastructural analysis. Therefore, we have changed “morphological analysis” to “ultrastructural analysis.” Line 231 of the revised manuscript.

      Line 351-354: how did you calculate distances? How many dimensions were used? 

      We calculated the centroid coordinates for each cell type in each state on the 2-dimensional UMAP plot (Fig. 6A). Then, for each cell type, we determined the Euclidean distance between the centroid coordinates of each pair of states. We have revised the manuscript with this more detailed description. Lines 292-295 of the revised manuscript.
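      The calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the input layout (one tuple of cell type, state, and two UMAP coordinates per cell), and the example coordinates are hypothetical.

```python
import math
from collections import defaultdict


def centroid_shifts(cells):
    """Euclidean distance between per-state centroids for each cell type.

    cells: iterable of (cell_type, state, umap_x, umap_y) tuples.
    Returns {cell_type: {(state_a, state_b): distance}} for every pair of states.
    """
    # Accumulate coordinate sums and cell counts per (cell type, state).
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for cell_type, state, x, y in cells:
        acc = sums[(cell_type, state)]
        acc[0] += x
        acc[1] += y
        acc[2] += 1

    # Convert sums to centroids, grouped by cell type.
    centroids = defaultdict(dict)
    for (cell_type, state), (sx, sy, n) in sums.items():
        centroids[cell_type][state] = (sx / n, sy / n)

    # Distance between the centroids of each pair of states.
    shifts = {}
    for cell_type, by_state in centroids.items():
        states = sorted(by_state)
        shifts[cell_type] = {
            (a, b): math.dist(by_state[a], by_state[b])
            for i, a in enumerate(states)
            for b in states[i + 1:]
        }
    return shifts


# Hypothetical coordinates, for illustration only.
example_cells = [
    ("bacteriocyte", "Fanmao", 0.0, 0.0),
    ("bacteriocyte", "Fanmao", 2.0, 0.0),
    ("bacteriocyte", "Starvation", 1.0, 3.0),
    ("bacteriocyte", "Starvation", 1.0, 5.0),
]
example_shifts = centroid_shifts(example_cells)
```

      A larger distance for a given cell type indicates a larger shift of that population between two states on the UMAP plot; because UMAP distances are not strictly metric-preserving, such values are best read as a relative comparison between cell types.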

      Line 462: identify -> identified 

      We apologize for our mistake and appreciate the reviewer’s kind assistance with proofreading. The typo has been corrected in the new version. Line 396 of the revised manuscript.

      Line 509: what does the size of the dot represent? 

      In this context, the color and intensity of each dot represent a specific gene’s expression level in the single-cell cluster. The dot size is universal and therefore does not convey a specific meaning.

      Fig 3A: What is the blue cluster highlighted? 

      We apologize for our mistake. The label for the teal box was missing. We have corrected our mistake in the revised manuscript.

      Fig 3K: Wording in key is confusing. 

      We have modified our description of Figure 3K in the figure legends. It now reads: “Schematic of water flow agitated by different ciliary cell types. The color of arrowheads corresponds to water flow potentially influenced by specific types of cilia, as indicated by their color code in Figure 3A.” Lines 462-464 of the revised manuscript.

      Fig 5B: which population of mussels was used to take these images? 

      Mussels from the “Fanmao” (methane-rich) site were used to take these images. We have revised our Materials and Methods to make this clear. Lines 602-603 of the revised manuscript.

      Fig 5E,5G,5H: panels not referenced in text 

      We apologize for our mistake and appreciate the reviewer’s thorough reading. This error has been corrected in the new version of the manuscript. Line 233 of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors): 

      Minor comments: 

      Fig. 3A - the teal box in the legend lacks a label 

      We apologize for our mistake. The label for the teal box was missing. We have corrected our mistake in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors): 

      My enthusiasm for the manuscript remains high and I appreciate the authors care in responding to the various reviewer questions and concerns. 

      In regards to the cell proliferation results, I have modified my public review and look forward to your future work in this area. The data for both pHistone H3 and anti PCNA are compelling! 

      One typo I did catch occurs on line 520. I believe you meant to say "outer" not "otter." 

      We apologize for our mistake and appreciate the reviewer’s kind assistance with proofreading. The typo has been corrected in the new version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Dubicka and co-workers on calcification in miliolid foraminifera presents an interesting piece of work. The study uses confocal and electron microscopy to show that the traditional picture of calcification in porcelaneous foraminifera is incorrect.

      Strengths:

      The authors present high-quality images and an original approach to a relatively solid (so I thought) model of calcification.

      Weaknesses:

      There are several major shortcomings. Despite the interesting subject and the wonderful images, the conclusions of this manuscript are simply not supported at all by the results. The fluorescent images may not have any relation to the process of calcification and should therefore not be part of this manuscript. The SEM images, however, do point to an outdated idea of miliolid calcification. I think the manuscript would be much stronger with the focus on the SEM images and with the speculation of the physiological processes greatly reduced.

      We agree that the fluorescence studies presented in the paper are not, by themselves, unequivocal proof of the calcification model used by the studied Miliolida species. However, the fluorescence data combined with the SEM studies, especially the overlap of the elements that show autofluorescence upon excitation at 405 nm (emission 420–480 nm) with the acidic vesicles marked by pH-sensitive LysoGlow84, may hint at ACC-bearing vesicles.

      We will tone down the physiological interpretation based on fluorescence studies in the revised version of the manuscript.

      Nevertheless, we think that our fluorescent live-imaging experiments provide important observations in Miliolida, which are scarce in the existing literature, and are therefore worth presenting, as they might be very helpful for a better understanding of the full calcification model in the future.

      Reviewer #2 (Public Review):

      Summary:

      Dubicka et al. in their paper entitled "Biocalcification in porcelaneous foraminifera" suggest that in contrast to the traditionally claimed two different modes of test calcification by rotaliid and porcelaneous miliolid foraminifera, both groups produce calcareous tests via the intravesicular mineral precursors (Mg-rich amorphous calcium carbonate). These precursors are proposed to be supplied by endocytosed seawater and deposited in situ as mesocrystals formed at the site of new wall formation within the organic matrix. The authors did not observe the calcification of the needles within the transported vesicles, which challenges the previous model of miliolid mineralization. Although the authors argue that these two groups of foraminifera utilize the same calcification mechanism, they also suggest that these calcification pathways evolved independently in the Paleozoic.

      We do not argue that Miliolida and Rotaliida utilize exactly the same calcification mechanism, but rather that both groups use less divergent crystallization pathways, in which mesocrystalline chamber walls are created by accumulating and assembling particles of a pre-formed liquid amorphous mineral phase.

      Strengths:

      The authors document various unknown aspects of calcification of Pseudolachlanella eburnea and elucidate some poorly explained phenomena (e.g., translucent properties of the freshly formed test); however, there are several problematic observations/interpretations which in my opinion should be carefully addressed.

      Weaknesses:

      (1) The authors (line 122) suggest that "characteristic autofluorescence indicates the carbonate content of the vesicles (Fig. S2), which are considered to be Mg-ACCs (amorphous MgCaCO3) (Fig. 2, Movies S4 and S5)". Figure S2 which the authors refer to shows only broken sections of organic sheath at different stages of mineralization. Movie S4 shows that only in a few regions some vesicles exhibit red autofluorescence interpreted as Mg-ACC (S5 is missing but probably the authors were referring to S3). In their previous paper (Dubicka et al 2023: Heliyon), the authors used exactly the same methodology to suggest that these are intracellularly formed Mg-rich amorphous calcium carbonate particles that transform into a stable mineral phase in the rotaliid Amphistegina lessonii. However, in Figure 1D (Dubicka et al 2023) the apparently carbonate-loaded vesicles show the same red autofluorescence as the test, whereas in their current paper, no evidence of autofluorescence of Mg-ACC grains accumulated within the "gel-like" organic matrix is given. The S3 and S4 movies show circulation of various fluorescing components, but no initial phase of test formation is observable (numerous mineral grains embedded within the organic matrix - Figures 3A and B - should be clearly observed also as autofluorescence of the whole layer). Thus the crucial argument supporting the calcification model (Figure 5) is missing.

      It is correct that we did not observe the initial phase of test formation in vivo. Therefore, it is not our crucial argument supporting novel components of the new calcification model. We suspect that vesicles preparing and transporting Mg-ACC are produced well before their docking and deposition into the new wall, because such seawater vesicles were observed between the chamber formation stages (Goleń and Tyszka, 2024, personal communication based on independent experiments on a closely related miliolid taxon). This means that our in vivo experiments most likely represent a long, dynamic stage of vesicle formation via seawater endocytosis and vesicle modification (incl. Mg-ACC formation) that precedes the stage of exocytosis during new chamber formation. Our crucial arguments supporting the calcification model come from the SEM imaging of the specimens fixed during chamber formation, as well as from the transparency of the new chamber wall during its progressive calcification.

      There is no support for the following interpretation (lines 199-203) "The existence of intracellular, vesicular intermediate amorphous phase (Mg-ACC pools), which supply successive doses of carbonate material to shell production, was supported by autofluorescence (excitation at 405 nm; Fig. 2; Movies S3 and S4; see Dubicka et al., 2023) and a high content of Ca and Mg quantified from the area of cytoplasm by SEM-EDS analysis (Fig. S6)."

      We used the 405 nm laser line and multiphoton excitation to detect ACCs. These wavelengths (partly) penetrate the shell and excite ACC autofluorescence. The shell autofluorescence is present as well, but it is not clearly visible in Movie S4 because the fluorescence of the ACCs is stronger. This may be related to the plane/section of the cell that is shown. The laser passes through only a short distance of shell above the ACCs, but to excite the shell CaCO3 around the foraminifer in the same three-dimensional section where the ACCs are shown, the light must pass through a thick CaCO3 area because of the three-dimensional structure of the foraminiferal shell. The laser light intensity is therefore reduced. In the revised version, a movie/image with a reduced threshold is shown.

      Author response image 1.

      Autofluorescence image of studied Miliolida species (exc. 405 nm) showing algal chlorophyll (blue) and CaCO3 (red), both ACC and calcite shell.

      It would be very convenient if it was possible to visualize ACC by illumination with a blacklight, but there are very many organic molecules that have an autofluorescence excited by ~405 nm. One of the examples is NADH (Lee et al., 2015. Kor J Physiol Pharmac 19(4): 373-382), an omnipresent molecule in any cell (couldn't copy the appropriate picture here, but the reference has a figure with the em/exc spectra).

      The paper of Lee et al. 2015 shows that the excitation spectrum of NADH ends close to 400 nm. This means that NADH is not excitable, or only very weakly excitable, at 405 nm, which is the excitation laser line we used. 

      (2) The authors suggest that "no organic matter was detected between the needles of the porcelain structures (Figures 3E; 3E; S4C, and S5A)". Such a suggestion, which is highly unusual considering that biogenic minerals almost by definition contain various organic components, was made based only on FE-SEM observation. The authors should either provide clearcut evidence of the lack of organic matter (unlikely) or may suggest that intense calcium carbonate precipitation within organic matrix gel ultimately results in a decrease of the amount of the organic phase (but not its complete elimination), alike the pure calcium carbonate crystals are separated from the remaining liquid with impurities ("mother liquor"). On the other hand, if (249-250) "organic matrix involved in the biomineralization of foraminiferal shells may contain collagen-like networks", such "laminar" organization of the organic matrix may partly explain the arrangement of carbonate fibers parallel to the surface as observed in Fig. 3E1.

      We agree with the reviewer that biogenic minerals should by definition contain some organic components. We wrote only that "no organic matter was detected between the needles of the porcelain structures”, meaning that we did not detect any organic structures in our FE-SEM observations. We will rephrase this part of the text to avoid further confusion.

      (3) The author's observations indeed do not show the formation of individual skeletal crystallites within intracellular vesicles, however, do not explain either what is the structure of individual skeletal crystallites and how they are formed. Especially, what are the structures observed in polarized light (and interpreted as calcite crystallites) by De Nooijer et al. 2009? The author's explanation of the process (lines 213-216) is not particularly convincing "we suspect that the OM was removed from the test wall and recycled by the cell itself".

      Thank you for this comment. We will do our best to supplement our explanations. We are aware of the structures observed in polarized light by De Nooijer et al. (2009). However, Goleń et al. (2022, Protist; + 2 other citations) showed that organic polymers may also exhibit light polarization. Additional experimental studies are needed to separate these types of polarization. We will try to investigate this issue in our future research.

      (4) The following passage (lines 296-304) which deals with the concept of mesocrystals is not supported by the authors' methodology or observations. The authors state that miliolid needles "assembled with calcite nanoparticles, are unique examples of biogenic mesocrystals (see Cölfen and Antonietti, 2005), forming distinct geometric shapes limited by planar crystalline faces" (later in the same passage the authors say that "mesocrystals are common biogenic components in the skeletons of marine organisms" (are they thus unique or are they common)? It is my suggestion to completely eliminate this concept here until various crystallographic details of the miliolid test formation are well documented.

      Our intention was to express that mesocrystals are common biogenic components in the skeletons of marine organisms, whereas miliolid needles that form distinct geometric shapes limited by planar crystalline faces are unique.

      Reviewer #1 (Recommendations For The Authors):

      Below, I have summarized my main criticisms.

      (1) The movies S1-S4 do not indicate what is described. The videos show indeed seawater (S1), cell membranes (S2), and autofluorescence and acidic vesicles (S3 and S4). The presence of all these intracellular structures is not surprising: any eukaryotic cell will have those. The authors, however, claim that they participate in the process of calcification, which is simply not shown. One of the main arguments seems the presence of 'carbonate pools', in the caption these are even claimed to be 'Mg-ACC pools', but this is by no means revealed by an excitation of 405nm/ emission between 420 and 490 nm. It would be very convenient if it was possible to visualize ACC by illumination with a blacklight, but there are very many organic molecules that have an autofluorescence excited by ~405 nm. One of the examples is NADH (Lee et al., 2015. Kor J Physiol Pharmac 19(4): 373-382), an omnipresent molecule in any cell (couldn't copy the appropriate picture here, but the reference has a figure with the em/exc spectra).

      The paper of Lee et al. (2015) shows that the excitation spectrum of NADH ends close to 400 nm. This means that NADH is not excitable, or only very weakly excitable, at 405 nm, which is the excitation laser line we used.

      The fluorescence by this excitation/emission couple is unlikely to indicate the vesicles in which these foraminifera calcify. Therefore, most of the interpretation of the authors on what happens with the calcitic needles is not based on results but remains pure speculation.

      Autofluorescence upon excitation at 405 nm (emission 420–480 nm) is typical of CaCO3, both biocalcite and amorphous calcium carbonate, which was proven by laboratory synthesis of amorphous calcium carbonate (Dubicka et al., in preparation).

      (2) The results mention 'granules', which are the supposed Mg-ACC-containing vesicles, but the movies simply don't show any granules. Only fluorescence. Again, the results show a lot of vesicles with autofluorescence, but these are not necessarily related to calcification. Proof could be supplied by showing that the same fluorescent vesicles are 'used up' when the specimens under observation are making a new chamber, but until that is done, the fate of all these vesicles remains uncertain and once more, may not be involved in calcification at all.

      We suspect that the vesicles preparing and transporting Mg-ACC are produced well before their docking and deposition into the new wall, because such seawater vesicles were observed between the chamber formation stages (Goleń and Tyszka, 2024, personal communication based on independent experiments on a closely related miliolid taxon). This means that our in vivo experiments most likely represent a long, dynamic stage of vesicle formation via seawater endocytosis and vesicle modification (incl. Mg-ACC formation) that precedes the stage of exocytosis during new chamber formation. Our crucial arguments supporting the calcification model come from the SEM imaging of the specimens fixed during chamber formation, as well as from the transparency of the new chamber wall during its progressive calcification.

      (3) The Methods are unclear. How long were the foraminifers kept before being placed under the microscope? Were they fed with anything? This is important since the chlorophyll should not be from any food source. I didn't know that this foraminiferal species has photosynthetic symbionts: genera like Quinqueloculina don't. Is there any reference for this? Normally, I wouldn't care that much, but the authors find the presence of (facultative) symbionts important (lines 305-336). I am a bit suspicious about this since the only evidence for the presence of photosynthetic symbionts is because of the autofluorescence. As the authors said, commonly these miliolid species are regarded as symbiont-barren, so additional proof for these symbionts is necessary.

      We agree that additional proof is needed for the presence of photosynthetic symbionts. We rephrased the manuscript accordingly.

      (4) It is also unclear (Methods) at what stage the miliolids were photographed (Figure 3). How did chamber formation proceed, what was the timing of the photographs, etc. These pictures are to me the most interesting finding of this study, but need to be described much better.

      All individuals of living foraminifera were fixed at the overall stage of chamber formation. However, every individual presents a complete set of successive steps (substages) of chamber wall calcification fixed at once. Fig. 3A and B present nearly the most proximal (youngest) part of the new chamber, with a thick wall of calcite nanograins within a gel-like organic matrix. Fig. 3C and D present a slightly more distal (intermediate) part of the calcified chamber. Fig. 3E shows the most distal part of the new chamber. This part is anchored to the older, underlying solid calcified chamber (not shown in this figure). All these steps are synchronous, but they represent gradual, successive stages of calcification. The main text and Figs 4 and 5 explain this phenomenon in detail.

      There are many small issues with the text too. These include:

      Line 28/29: in many other groups, calcification is thought to be polyphyletic (e.g. sponges: Chombard et al., 1997. Biol Bull 193: 359-367).

      Corrected

      Line 29/30: there may be even more 'types of shells'. The first author has shown in earlier papers that nodosarids have a unique shell architecture. Spirillinids also seem to have their own way of calcification. It is unclear what is meant here by 'two contrasting models'.

      To date, only two models of foraminiferal calcification are known. Lagenida biocalcification has not been studied.

      Line 33: 'Both groups'? This paper only shows calcification in miliolids.

      However, we refer to a previous study.

      Line 42: Perhaps, but there is no data on the pseudopodial network in this manuscript.

      We refer to the studies of Angell, 1980.

      Line 43: Likely, but that is not what this manuscript is showing.

      Line 42-44: The authors should make a choice and be clear. The point of this paper is that miliolids and rotalids calcify in ways that are actually not as different as they seemed previously. Still, they are said to have different 'chamber formation modes'. If they are calcifying in a similar way (which I think is not necessarily supported by the results), isn't calcification in these groups like variations on the same theme? How does this relate to the independent origins of calcification within these two groups?

      Our intention is to show that Miliolida and Rotaliida utilize less divergent calcification pathways, following the recently discovered biomineralization principles.

      Line 49-51: is this a well-established distinction? If so, please add a reference. If not: what is fundamentally different between B and C? Does only the size of the intracellular vesicle matter?

      Rephrased

      Line 60: please include a reference for the intracellular calcification by coccolithophores.

      Added

      Line 67: this is wrong. It is the alignment of the needles at the surface that makes them all reflect light in the same way and gives the shells a porcelaneous appearance. A close-up of the miliolid's shell surface shows this arrangement. Underneath this layer, the orientation of the needles is more random.

      We referred to the papers of Johan Hohenegger.

      Line 114: how else?

      Line 114-116: I don't see the relevance here. If seawater is taken up, the vesicle containing this seawater has to have a membrane around it. By definition. The text here ('These vesicles') suggests that Calcein and FM1-43 were combined (which they easily could have), but the methods describe that they are used successively.

      Yes, we used two dyes separately.

      Lines 122-130: I think the interpretation of this autofluorescence signal is wrong. Even if it was true, these lines belong to the Discussion.

      This paragraph has been moved to the Discussion.

      Line 138: What are 'mobile clusters'? I don't see a relation between the location of the symbionts and the other vesicles (Figure 2).

      Line 147-148: How can an SEM image show the absence of organic matter?

      We meant the absence of the gel-like organic matter (OM) visible in the previous stages of chamber formation.

      Line 148: Should be 'Figs. 3E; 3E1; S4C'.

      Corrected

      Lines 143-150: this can be merged with the following paragraph.

      Done

      Lines 151-169: why is there no indication of the time? Figures 3 and 4 link the pictures in time to show the development of the growing chamber wall. However, neither here nor in the methods, is there any recording of the time after the beginning of chamber formation. Now, the images are linked (Figure 4) as if they were taken at regular intervals, but this is not documented.

      Lines 170-184: this should go to the Discussion.

      Done

      Line 193-195: this is likely, but not visible in Figure 1.

      It was visible by optical microscopy and described by Angell, 1980

      Line 199-201: I don't understand this: the fluorescent vesicles were not observed during chamber formation so any link between the SEM and CLSM scans remains pure speculation.

      Line 203-204: needed for what?

      For better documentation of Miliolid ACC-bearing granules

      Line 220: is this shown in any of the images? 

      Angell, 1980

      Line 230: It sounds nice, but I don't think a 'paradigm shift' is appropriate here. However interesting and important foraminiferal biomineralization is, the authors show that the crystals of miliolids are likely formed differently than previously thought. If this is a 'paradigm shift', then most scientific findings are.

      In our opinion this is definitely a shift of paradigm

      Line 231: I don't think anyone suggested miliolids and coccolithophores share 'the same' pathway. They are shown (cocco's) and thought (miliolids) to secrete their calcite intracellularly.

      Changed to similar, intracellular

      Line 258: References should only be to peer-reviewed studies.

      Line 430: Burgers'

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      Please separate clearly the results (observations) from the discussion (interpretations): various interpretational/commentary phrases should be removed from the Results section to Discussion e.g., lines 124-130, 131-135.

      Interpretations have been separated from the results, as suggested by the reviewer.

      [line 49] " living cells have evolved three major skeleton crystallization pathways". I would rather say "organisms" not "cells" as the coordination of the calcification process in multicellular organisms clearly involves processes that are beyond the individual cell activity.

      Corrected

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Original comment: There is no explanation for how this work could be a breakthrough in simulating gregarious feeding as is stated in the manuscript.

      Reviewer response: I think I understand where the authors are trying to take this next step. If the authors were to follow up on this study with the proposed implementation of inhalant/exhalant velocity profiles (or, preferably, velocity/pressure fields), then that study would be a breakthrough in simulating such gregarious feeding. Based on what has been done within the present study, I think the term "breakthrough" is instead overly emphatic. An additional note on this. The authors are correct that incorporating additional models could be used to simulate a population (as has been successfully done for several Ediacaran taxa despite computational limitations), but it's not the only way. The authors might explore using periodic boundary conditions on the external faces of the flow domain. This could require only a single Olivooid model to assess gregarious impacts - see the abundant literature of modeling flow through solar array fields.

      We thank Reviewer 1 for the suggestion. Modeling gregarious feeding via periodic boundary conditions is certainly a practical approach when computational resources are limited, and modeling flow through solar array fields is an inspiring case. However, to realistically simulate gregarious feeding behavior on an uneven seabed with an irregular spatial distribution of organisms, periodic boundary conditions alone may not be sufficient (see Author response image 1 for a simple example). We will continue to explore ways of simulating large-scale gregarious feeding.

      Author response image 1.

      An example of modeling gregarious feeding behavior on an uneven seabed.

      Original comment: The claim that olivooid-type feeding was most likely a prerequisite transitional form to jet-propelled swimming needs much more support or needs to be tailored to olivooids. This suggests that such behavior is absent (or must be convergent) before olivooids, which is at odds with the increasing quantities of pelagic life (whose modes of swimming are admittedly unconstrained) documented from Cambrian and Neoproterozoic deposits. Even among just medusozoans, ancestral state reconstruction suggests that they would have been swimming during the Neoproterozoic (Kayal et al., 2018; BMC Evolutionary Biology) with no knowledge of the mechanics due to absent preservation. Author response: Thanks for your suggestions. Yes, we agree with you that the ancestral swimming medusae may appear before the early Cambrian, even at the Neoproterozoic deposits. However, discussions on the affinities of Ediacaran cnidarians are severely limited because of the lack of information concerning their soft anatomy. So, it is hard to detect the mechanics due to absent preservation. Olivooids found from the basal Cambrian Kuanchuanpu Formation can be reasonably considered as cnidarians based on their radial symmetry, external features, and especially the internal anatomies (Bengtson and Yue 1997; Dong et al. 2013; 2016; Han et al. 2013; 2016; Liu et al. 2014; Wang et al. 2017; 2020; 2022). The valid simulation experiment here was based on the soft tissue preserved in olivooids.

      Reviewer response: This response does not sufficiently address my earlier comment. While the authors are correct that individual Ediacaran affinities are an area of active research and that Olivooids can reasonably be considered cnidarians, this doesn't address the actual critique in my comment. Most (not all) Ediacaran soft-bodied fossils are considered to have been benthic, but pelagic cnidarian life is widely acknowledged to at least be present during later White Sea and Nama assemblages (and earlier depending on molecular clock interpretations). The authors have certainly provided support for the mechanics of this type of feeding being co-opted for eventual jet propulsion swimming in Olivooids. They have not provided sufficient justifications within the manuscript for this to be broadened beyond this group.

      Thank you for your sincere commentary. We of course agree that swimming cnidarians may have emerged before the lowermost Cambrian Fortunian Stage. See lines 16-129: “Ediacaran fossil assemblages with complex ecosystems consist of exceptionally preserved soft-bodied eukaryotes of enigmatic morphology, which their affinities are mostly unresolved (Tarhan et al., 2018, Integrative and Comparative Biology, 58 (4), 688–702; Evans et al., 2022, PNAS, 11(46), e220747511).” Olivooids undoubtedly belong to the cnidarians, as characterized by their external and internal biological structures. Limited by the fossil record, we can only speculate on the transition of ancestral cnidarians from a benthic to a swimming lifestyle on the basis of well-preserved fossils, e.g. olivooids. The transition may have required processes such as increasing body size, thickening of the mesoglea, and degeneration of the periderm. These processes may also have evolved independently or in combination. Moreover, the ecological behaviors of the ancestral cnidarians may have evolved independently at different stages from the Ediacaran to the Cambrian. We therefore cannot provide further justification beyond olivooids.

      Original comment: L446: two layers of hexahedral elements is a very low number for meshing boundary layer flow

      Reviewer response: As the authors point out in the main text, these organisms are small (millimeters in scale) and certainly lived within the boundary layer range of the ocean. While the boundary layer is not the main point, it still needs to be accurately resolved as it should certainly affect the flow further towards the far field at this scale. I'm not suggesting the authors need to perfectly resolve the boundary layer or focus on using turbulence models more tailored to boundary layer flows (such as k-w), but the flow field still needs sufficient realism for a boundary bounded flow. The authors really should consider quantitatively assessing the number of hexahedral elements within their mesh refinement study.

      To address this concern, we ran four additional simulations based on mesh4 within our mesh refinement study to assess the number of hexahedral elements (five layers and eight layers of hexahedral elements, each with different thicknesses of the boundary layer mesh, controlled by a thickness adjustment factor). The results have been added to Table supplement 2. As the results show, the number of layers of hexahedral elements does not appear to significantly influence the result, but the thickness of the boundary layer mesh can influence the maximum flow velocity of the contraction phase. Nevertheless, the results of all the simulations were generally consistent, as shown in Author response image 2. A description of these results was added to the section “Mesh sensitivity analysis”.

      Author response image 2.

      Results of mesh refinement study of different boundary layer mesh parameters.
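
      A comparison of this kind can be quantified by expressing each boundary-layer configuration as a relative deviation from a reference configuration. The following is only a minimal sketch under stated assumptions: the function names, the 5% tolerance, and the numbers are hypothetical placeholders for illustration and are not values or methods from the study.

```python
from typing import Dict


def relative_deviation(results: Dict[str, float], reference: str) -> Dict[str, float]:
    """Deviation of each configuration from the reference, as a fraction of the reference value."""
    ref = results[reference]
    if ref == 0:
        raise ValueError("reference value must be non-zero")
    return {name: (value - ref) / ref for name, value in results.items()}


def is_mesh_independent(results: Dict[str, float], reference: str,
                        tolerance: float = 0.05) -> bool:
    """True if every configuration lies within `tolerance` of the reference.

    The default tolerance of 5% is an arbitrary, assumed threshold.
    """
    deviations = relative_deviation(results, reference)
    return all(abs(d) <= tolerance for d in deviations.values())


# Hypothetical placeholder values (e.g. a normalized maximum flow velocity);
# these are NOT results from the study.
example = {
    "2 layers": 1.00,
    "5 layers": 1.02,
    "8 layers": 0.99,
}

if __name__ == "__main__":
    for name, dev in relative_deviation(example, "2 layers").items():
        print(f"{name}: {dev:+.1%}")
    print("within tolerance:", is_mesh_independent(example, "2 layers"))
```

      In such a check, the choice of reference configuration and tolerance is a judgment call that depends on which output quantity matters for the conclusions.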

    1. Author response:

      The following is the authors’ response to the original reviews.

      The points raised led us to critically rethink our approach, our results, and our conclusions. Furthermore, they gave us the chance to elaborate on some critical aspects that were mentioned. With the help of the reviewers, we made some clarifications in the point-by-point responses and implemented them in the manuscript. Furthermore, we modified the figures as suggested:

      - The colors in Figure 1C, D, G and H have been adapted as suggested

      - We added a Figure2-figure supplement 1, which strengthens our conclusion in Figure 2

      - As asked by reviewer #1 (weaknesses #3), we added the data about neutrophil numbers in the different organs (Figure 6-figure supplement 3C).

      Reviewer #1 (Public Review):

      Summary:

      - Extracellular ATP represents a danger-associated molecular pattern associated with tissue damage and can also act in an autocrine fashion in macrophages to promote proinflammatory responses, as observed in a previous paper by the authors in abdominal sepsis. The present study addresses an important aspect possibly conditioning the outcome of sepsis, namely the release of ATP by bacteria. The authors show that sepsis-associated bacteria do in fact release ATP in a growth-dependent and strain-specific manner. However, whether this bacterially derived ATP plays a role in the pathogenesis of abdominal sepsis has not been determined. To address this question, a number of mutant strains of E. coli have been used, first to correlate bacterial ATP release with growth and then with outer membrane integrity and bacterial death. By using E. coli transformants expressing the ATP-degrading enzyme apyrase in the periplasmic space, the paper nicely shows that abdominal sepsis by these transformants results in significantly improved survival. This effect was associated with a reduction of peritoneal macrophages and CX3CR1+ monocytes, and an increase in neutrophils. To extrapolate the function of bacterial ATP from the systemic response to microorganisms, the authors exploited bacterial OMVs either loaded or not with ATP to investigate the systemic effects devoid of living microorganisms. This approach showed that ATP-loaded OMVs induced degranulation of neutrophils after lysosomal uptake, suggesting that this mechanism could contribute to sepsis severity.

      Strengths:

      - A strong part of the study is the analysis of E. coli mutants to address different aspects of bacterial release of ATP that could be relevant during systemic dissemination of bacteria in the host.

      We want to thank the reviewer for recognizing this important aspect of our experimental approach.

      Weaknesses:

      - As pointed out in the limitations of the study, whether ATP-loaded OMVs provide a mechanistic proof of the pathogenetic role of bacteria-derived ATP independently of live microorganisms in sepsis is interesting but not definitively convincing. It could be useful to see whether degranulation of neutrophils is differentially induced by apyrase-expressing vs control E. coli transformants.

      We thank the reviewer for raising several important points. In our study, we assessed local and systemic effects of released bacterial ATP. The consequences of local bacterial ATP release were assessed using an apyrase-expressing E. coli transformant. Locally, bacterial ATP resulted in a decrease in neutrophil numbers and we hypothesize that directly released bacterial ATP either leads to neutrophil death (e.g. via P2X7 receptor (Proietti et al., 2019)) or interferes with the recruitment of neutrophils (e.g. via P2Y receptors (Junger, 2011)).

      The systemic consequences were assessed using ATP-loaded and empty OMV. We have shown that degranulation is induced by OMV-derived bacterial ATP. ATP-containing OMV are engulfed by neutrophils, reach their endolysosomal compartment and might activate purinergic receptors, which then leads to aberrant degranulation. This concept, which needs to be explored in future studies, is fundamentally different from classical purinergic signaling via bacterial ATP released directly into the extracellular space.

      It is possible that neutrophil degranulation is also modulated by directly released bacterial ATP. We agree that this should be assessed in future studies. In addition, the local role of OMV-derived bacterial ATP should be assessed, and the relative importance of directly released vs. OMV-mediated bacterial ATP should be dissected locally. Based on our measurements (Figure 4-figure supplement 1A and Figure 5C), we estimate that the effect of OMV-derived bacterial ATP might be much smaller than the effects of directly released bacterial ATP. Thus, direct ATP release might predominate locally. However, we fully agree that this has to be investigated in a future study to reconcile the different aspects of bacterial ATP signaling. A paragraph will be added to the manuscript, in which we discuss this particular issue.

      - Also, the increase of neutrophils in bacterial ATP-depleted abdominal sepsis, which has better outcomes than "ATP-proficient" sepsis, seems difficult to correlate to the hypothesized tissue damage induced by ATP delivered via non-infectious OMVs.

      We fully acknowledge the mentioned discrepancy. What we propose is that bacterial ATP exhibits different functions that are dependent on the release mechanism (see above). Locally, in the peritoneal cavity, neutrophil numbers are decreased by directly released bacterial ATP. Remotely, ATP is delivered via OMV and impacts on neutrophil function. We agree that, in particular, in the peritoneal cavity, both effects may play a role. However, the impact of directly released bacterial ATP seems to be dominant (see above).

      We propose that neutrophils are decreased locally because of directly released bacterial ATP, which prevents efficient infection control and, therefore, impairs sepsis survival. In addition, these fewer neutrophils might even be dysregulated by the engulfment of bacterial ATP delivered via OMV, which leads to an upregulated and possibly aberrant degranulation process worsening local and remote tissue damage. We agree that in addition to neutrophil numbers, the function of local neutrophils should be assessed with and without the influence of OMV-delivered bacterial ATP. This could be done by RNA sequencing of primary neutrophils from the peritoneal cavity or neutrophil cell lines as well as degranulation assays.

      - Are the neutrophils counts affected by ATP delivered via OMVs?

      This is difficult to show in the peritoneal cavity, where both directly released bacterial ATP and OMV-derived bacterial ATP are present. We did, however, assess such a putative difference for the systemic organs and the blood, where we did not find any differences in neutrophil numbers.

      Author response image 1.

      - A comparison of cytokine profiles in the abdominal fluids of E. coli and OMV treated animals could be helpful in defining the different responses induced by OMV-delivered vs bacterial-released ATP. The analyses performed on OMV treated versus E. coli infected mice are not closely related and difficult to combine when trying to draw a hypothesis for bacterial ATP in sepsis.

      We fully agree that there are several open questions that remain to be elucidated, in particular, to differentiate the local role of directly released versus OMV-delivered bacterial ATP. In this study, we laid the foundation for future in vivo research to examine the specific role of bacterial ATP in sepsis. Such future research avenues might be to investigate the local effects of OMV-delivered bacterial ATP, and how neutrophil migration, apoptosis and degranulation are altered. We agree that exploration of the local secretory immune response and cytokine profiles is relevant to understanding the different mechanisms by which bacterial ATP alters sepsis. However, such experiments should ideally be performed in systems where the source and the delivery of ATP can be modulated locally.

      - Also it was not clear why lung neutrophils were used for the RNAseq data generation and analysis.

      Thank you for this remark. We have chosen primary lung neutrophils for four reasons:

      (1) Isolation of primary lung neutrophils allowed us to assess an in vivo response that would not have been possible with cell lines.

      (2) The lung and the respiratory system are among the clinically most important organs affected during sepsis, and their involvement is a significant cause of mortality.

      (3) We show in Figure 6C that specifically in the lung, OMV are engulfed by neutrophils, which shows the relevance of the lung also in our study context.

      (4) And finally, lung neutrophils were chosen to examine specifically distant and not local effects.

      Reviewer #2 (Public Review):

      Summary:

      - In their manuscript "Released Bacterial ATP Shapes Local and Systemic Inflammation during Abdominal Sepsis", Daniel Spari et al. explored the dual role of ATP in exacerbating sepsis, revealing that ATP from both host and bacteria significantly impacts immune responses and disease progression.

      Strengths:

      - The study meticulously examines the complex relationship between ATP release and bacterial growth, membrane integrity, and how bacterial ATP potentially dampens inflammatory responses, thereby impairing survival in sepsis models. Additionally, this compelling paper implies a concept that bacterial OMVs act as vehicles for the systemic distribution of ATP, influencing neutrophil activity and exacerbating sepsis severity.

      We thank the reviewer for mentioning these key points and supporting the relevance of our study.

      Weaknesses:

      (1) The researchers extracted and cultivated abdominal fluid on LB agar plates, then randomly picked 25 colonies for analysis. However, they did not conduct 16S rRNA gene amplicon sequencing on the fluid itself. It is worth noting that the bacterial species present may vary depending on the individual patients. It would be beneficial if the authors could specify whether they've verified the existence of unculturable species capable of secreting high levels of Extracellular ATP.

      Most septic complications are caused by a limited spectrum of bacteria, belonging mainly either to the Firmicutes or the Proteobacteria phyla, including E. coli, K. pneumoniae, S. aureus or E. faecalis (Diekema et al., 2019; Mureșan et al., 2018). We validated this well-documented evidence by randomly assessing 25 colonies. For the planned experiments, it was crucial to work with culturable bacteria; otherwise, ATP measurements, the modulation of ATP generation or loading of OMV would not have been possible. Using such culturable bacteria allowed us to describe mechanisms of ATP release.

      We fully agree that hard-to-culture or unculturable bacteria might contribute significantly to septic complications. This, however, would need to be explored in future studies using extensive culturing methods (Cheng et al., 2022).

      (2) Do mice lacking commensal bacteria show a lack of extracellular ATP following cecal ligation puncture?

      ATP is typically secreted by many cells of the host in active and passive manners in the case of any injury, including cecal ligation and puncture (Burnstock, 2016; Dosch et al., 2018; Eltzschig et al., 2012; Idzko et al., 2014). We hypothesize that bacterial ATP is a potential priming agent at early stages of sepsis, and indeed, at such early time points, a comparison of peritoneal ATP levels between germ-free and colonized mice could support our hypothesis. Future studies addressing this question must, however, correct for the different immune responses between germ-free and colonized mice. This is of utmost importance, especially for the cecal ligation and puncture model, since the cecum of germ-free mice is extremely large, making such experiments hard to control.

      (3) The authors isolated various bacteria from abdominal fluid, encompassing both Gram-negative and Gram-positive types. Nevertheless, their emphasis appeared to be primarily on the Gram-negative E. coli. It would be beneficial to ascertain whether the mechanisms of Extracellular ATP release differ between Gram-positive and Gram-negative bacteria. This is particularly relevant given that the Gram-positive bacterium E. faecalis, also isolated from the abdominal fluid, is recognized for its propensity to release substantial amounts of Extracellular ATP.

      We fully agree with this comment. In this paper, we used E. coli as our model organism to determine the principles of sepsis-associated bacterial ATP release and therefore focused on gram-negative bacteria. In addition to the direct, growth-dependent release, we found a relevant impact of OMV-delivered bacterial ATP. For this latter purpose, a gram-negative strain, in which OMV generation has been well described (Schwechheimer & Kuehn, 2015), was chosen. Recently, gram-positive bacteria have been shown to secrete ATP and OMV as well (Briaud & Carroll, 2020; Hironaka et al., 2013; Iwase et al., 2010). Given the fundamental differences in the structure of the cell wall of gram-positive bacteria and the mechanisms of OMV generation and release, future studies are required to assess the relevance of directly released and OMV-delivered ATP in gram-positive bacteria.

      (4) The authors observed changes in the levels of LPM, SPM, and neutrophils in vivo. However, it remains uncertain whether the proliferation or migration of these cells is modulated or inhibited by ATP receptors like P2Y receptors. This aspect requires further investigation to establish a convincing connection.

      We fully agree with this comment. The decrease in LPM and the consequential predominance of SPM have been well described after inflammatory stimuli in the context of the macrophage disappearance reaction (Ghosn et al., 2010). Also, it has been shown that purinergic signaling modulates infiltration of neutrophils and can lead to cell death as a consequence of P2Y and P2X receptor activation (Junger, 2011; Proietti et al., 2019). In our study, we propose that intracellular purinergic receptors contribute to neutrophil function during sepsis. Having introduced the general principles and fundamentals of bacterial ATP with our studies, we fully agree that additional experiments need to address downstream purinergic receptor activation. That, however, would go beyond the scope of our study.

      (5) Additionally, is it possible that the observed in vivo changes could be triggered by bacterial components other than Extracellular ATP? In this research field, a comprehensive collection of inhibitors is available, so it is desirable to utilize them to demonstrate clearer results.

      This question is of utmost importance and defined the choice of our model and experimental approach. When we started the project, we used two different E. coli mutants that release low (ompC) and high (eaeH) amounts of ATP. However, the limitation of this approach is that these are different bacteria, which may also differ in the components they secrete or the surface proteins they express. We, therefore, decided against that approach. With the approach we finally used (same bacterium, just with and without ATP), we aimed to minimize the influence of non-ATP bacterial components.

      (6) Have the authors considered the role of host-derived Extracellular ATP in the context of inflammation?

      Yes, the role of host-derived extracellular ATP in inflammation and sepsis is well established, although with contradictory results (Csóka et al., 2015; Ledderose et al., 2016). These conflicting data were the rationale for testing the relevance of bacterial ATP. We suggest that bacterial ATP is essential in the early phase of sepsis, when bacteria invade the sterile compartment and before an efficient host response, including the eukaryotic release of ATP, is established.

      (7) The authors mention that Extracellular ATP is rapidly hydrolyzed by ectonucleotidases in vivo. Are the changes of immune cells within the peritoneal cavity caused by Extracellular ATP released from bacterial death or by OMVs?

      This is a relevant question that was also asked by reviewer #1, and we answered it in detail above (weaknesses comment #1 and #2). From our ATP measurements (Figure 4-figure supplement 1A and Figure 5C), we conclude that locally, directly released (extracellular) bacterial ATP predominates over OMV-derived bacterial ATP. Furthermore, directly released and OMV-derived bacterial ATP (contained within OMV, engulfed and transported to the endolysosomal compartment) act through different mechanisms; extracellular ATP in particular has been described to lead to apoptosis via P2X7 signaling.

      (8) In the manuscript, the sample size (n) for the data consistently remains at 2. I would suggest expanding the sample size to enhance the robustness and rigor of the results.

      Two biological replicates (independent cultures) were used only for the bacterial cultures in Figure 1, Figure 2, and Figure 3; these gave similar results with very small standard deviations, indicating robustness. In the in vitro experiments in Figure 5, we used a sample size of 6 (three biological replicates measured in technical duplicates) because we saw larger deviations in our measurements. For the in vivo experiments, we always used 5 or more animals in at least two independent experiments.

      Reviewer #2 (Recommendations For The Authors):

      (9) Line 37: 11 million sepsis-related deaths were reported "in" 2017.

      The passage has been corrected as suggested.

      (10) By the way, the similar colors used in Figure 1C and G are too chaotic, making it difficult to distinguish.

      We agree, the colors have been adapted.

      Author response image 2.

      (11) All "in vivo" and "in vitro" should be italicized.

      We italicized all of them.

      (12) The title of Figure 4 is confusing: "Impairs sepsis outcome in vivo?" Could you make it more specific?

      We agree, the title has been rephrased:

      “Bacterial ATP reduces neutrophil counts and reduces survival in a mouse model of abdominal sepsis.”

      (13) Line 314-316: The sentence "Potentially, despite the lack of a transporter, ATP may similarly to eukaryotic cells leak (Yegutkin et al., 2006) across the inner membrane into the periplasmic space that lacks the enzymes for ATP generation." sounds odd.

      This passage was reformulated in the manuscript.

      “Despite the lack of a transporter, ATP may leak across the inner membrane into the periplasmic space. Such leakage may be similar to baseline leakage in eukaryotic cells (Yegutkin et al., 2006).”

      (14) The numerical notation in the paper is odd: sometimes it uses a prime symbol as a superscript (such as line 504), and sometimes it does not (such as line 421). Should it be standardized to "3,200" and "150,000"?

      Thank you for this remark. The numbers have been standardized throughout the manuscript.

      (15) Line "0.4 mm EP cuvettes" should be "0.4 cm EP cuvettes"

      The specified passage has been corrected as suggested.

      References

      Briaud, P., & Carroll, R. K. (2020). Extracellular Vesicle Biogenesis and Functions in Gram-Positive Bacteria. Infection and Immunity, 88(12), 10.1128/iai.00433-20. https://doi.org/10.1128/iai.00433-20

      Burnstock, G. (2016). P2X ion channel receptors and inflammation. Purinergic Signalling, 12(1), 59–67. https://doi.org/10.1007/s11302-015-9493-0

      Cheng, A. G., Ho, P.-Y., Aranda-Díaz, A., Jain, S., Yu, F. B., Meng, X., Wang, M., Iakiviak, M., Nagashima, K., Zhao, A., Murugkar, P., Patil, A., Atabakhsh, K., Weakley, A., Yan, J., Brumbaugh, A. R., Higginbottom, S., Dimas, A., Shiver, A. L., … Fischbach, M. A. (2022). Design, construction, and in vivo augmentation of a complex gut microbiome. Cell, 185(19), 3617-3636.e19. https://doi.org/10.1016/j.cell.2022.08.003

      Csóka, B., Németh, Z. H., Törő, G., Idzko, M., Zech, A., Koscsó, B., Spolarics, Z., Antonioli, L., Cseri, K., Erdélyi, K., Pacher, P., & Haskó, G. (2015). Extracellular ATP protects against sepsis through macrophage P2X7 purinergic receptors by enhancing intracellular bacterial killing. The FASEB Journal, 29(9), 3626–3637. https://doi.org/10.1096/fj.15-272450

      Diekema, D. J., Hsueh, P.-R., Mendes, R. E., Pfaller, M. A., Rolston, K. V., Sader, H. S., & Jones, R. N. (2019). The Microbiology of Bloodstream Infection: 20-Year Trends from the SENTRY Antimicrobial Surveillance Program. Antimicrobial Agents and Chemotherapy, 63(7), e00355-19. https://doi.org/10.1128/AAC.00355-19

      Dosch, M., Gerber, J., Jebbawi, F., & Beldi, G. (2018). Mechanisms of ATP Release by Inflammatory Cells. International Journal of Molecular Sciences, 19(4), 1222. https://doi.org/10.3390/ijms19041222

      Eltzschig, H. K., Sitkovsky, M. V., & Robson, S. C. (2012). Purinergic Signaling during Inflammation. New England Journal of Medicine, 367(24), 2322–2333. https://doi.org/10.1056/NEJMra1205750

      Ghosn, E. E. B., Cassado, A. A., Govoni, G. R., Fukuhara, T., Yang, Y., Monack, D. M., Bortoluci, K. R., Almeida, S. R., Herzenberg, L. A., & Herzenberg, L. A. (2010). Two physically, functionally, and developmentally distinct peritoneal macrophage subsets. Proceedings of the National Academy of Sciences, 107(6), 2568–2573. https://doi.org/10.1073/pnas.0915000107

      Hironaka, I., Iwase, T., Sugimoto, S., Okuda, K., Tajima, A., Yanaga, K., & Mizunoe, Y. (2013). Glucose Triggers ATP Secretion from Bacteria in a Growth-Phase-Dependent Manner. Applied and Environmental Microbiology, 79(7), 2328–2335. https://doi.org/10.1128/AEM.03871-12

      Idzko, M., Ferrari, D., & Eltzschig, H. K. (2014). Nucleotide signalling during inflammation. Nature, 509(7500), 310–317. https://doi.org/10.1038/nature13085

      Iwase, T., Shinji, H., Tajima, A., Sato, F., Tamura, T., Iwamoto, T., Yoneda, M., & Mizunoe, Y. (2010). Isolation and Identification of ATP-Secreting Bacteria from Mice and Humans. Journal of Clinical Microbiology, 48(5), 1949–1951. https://doi.org/10.1128/JCM.01941-09

      Junger, W. G. (2011). Immune cell regulation by autocrine purinergic signalling. Nature Reviews Immunology, 11(3), 201–212. https://doi.org/10.1038/nri2938

      Ledderose, C., Bao, Y., Kondo, Y., Fakhari, M., Slubowski, C., Zhang, J., & Junger, W. G. (2016). Purinergic Signaling and the Immune Response in Sepsis: A Review. Clinical Therapeutics, 38(5), 1054–1065. https://doi.org/10.1016/j.clinthera.2016.04.002

      Mureșan, M. G., Balmoș, I. A., Badea, I., & Santini, A. (2018). Abdominal Sepsis: An Update. The Journal of Critical Care Medicine, 4(4), 120–125. https://doi.org/10.2478/jccm-2018-0023

      Proietti, M., Perruzza, L., Scribano, D., Pellegrini, G., D’Antuono, R., Strati, F., Raffaelli, M., Gonzalez, S. F., Thelen, M., Hardt, W.-D., Slack, E., Nicoletti, M., & Grassi, F. (2019). ATP released by intestinal bacteria limits the generation of protective IgA against enteropathogens. Nature Communications, 10(1), Article 1. https://doi.org/10.1038/s41467-018-08156-z

      Schwechheimer, C., & Kuehn, M. J. (2015). Outer-membrane vesicles from Gram-negative bacteria: Biogenesis and functions. Nature Reviews Microbiology, 13(10), 605–619. https://doi.org/10.1038/nrmicro3525

    1. Author response:

      Reviewer #1 (Public Review):

      This study excellently complements the previous one by unveiling the properties of NPRL2 in augmenting the effect of immune checkpoint inhibitors such as pembrolizumab in KRAS mutant lung cancer models.

      The following points should be clarified:

      (1) In KRAS mutant cell lines with LKB1 co-mutations or deletions, such as A549 cells, does treatment with NPRL2 not increase the efficacy of immunotherapy? Is this correct? Similarly, does the delivery of NPRL2 only potentiate the effect of immunotherapy in KRAS mutant cell lines without associated LKB1 mutations?

      NPRL2, when used as a single-agent immunotherapy, induces robust antitumor activity in immunotherapy-resistant (aPD1R) KRAS mutant models, such as A549 tumors (KRASmt/LKB1mt/aPD1R) and LLC2 (KRASmt/aPD1R), where immunotherapy is ineffective regardless of LKB1 co-mutation or deletion status. The antitumor effect of NPRL2 combined with aPD1 immunotherapy was not significantly different from NPRL2 alone in immunotherapy-resistant models but was significantly greater than immunotherapy alone. However, a synergistic antitumor effect was observed with NPRL2 and aPD1 immunotherapy in KRAS wild-type and immunotherapy-moderately-responsive models, such as H1299 (KRASwt/aPD1S).

      (2) Do the authors analyze by western blot if NPRL2 influences or restores STING and LKB1 in the A549 cell line that lacks LKB1 and STING?

      NPRL2 induces antitumor immunity in KRAS-mutant, aPD1-resistant models regardless of LKB1 co-mutations or deletions. However, it would be interesting to examine the effect of NPRL2 on the STING pathway in the LKB1-deleted A549 cell line.

      (3) Mechanistically, is there any explanation as to why NPRL2 delivery increases the efficacy of immunotherapy? Is there any effect on FUS or MYC?

      NPRL2 is a multifunctional tumor suppressor gene that is downregulated or absent in many cancers. NPRL2 has been shown to induce apoptosis, inhibit cell proliferation, and cause cell cycle arrest in various cancer types. Compelling evidence highlights the critical role of NPRL2 in causing DNA damage and double-strand breaks, which can trigger dendritic cell (DC) activation, antigen presentation, and priming of tumor-specific CD8+ T cells in the tumor microenvironment (TME). Our data indicate that NPRL2 treatment is associated with the induction of DC activation and maturation.

      The cellular mechanism of NPRL2 suggests that NPRL2-mediated antitumor immunity depends on the presence of CD4+ T cells, CD8+ T cells, and macrophages. Interestingly, the expression of FUS1, another tumor suppressor gene, was mostly absent or severely downregulated in most non-small cell lung cancers (NSCLC) and was unaffected by NPRL2 treatment. While MYC expression was not assessed in this study, it remains an area of interest for future research.

      (4) Is there any way to carry out a clinical study of systematically delivering NPRL2 in KRAS lung cancer patients?

      In this preclinical study, a clinical-grade DOTAP-NPRL2 formulation was prepared, utilizing NPRL2 encapsulated within nanovesicles for delivery. Based on the promising preclinical data, a phase I clinical trial will be initiated to evaluate the safety and efficacy of this formulation.

      Reviewer #2 (Public Review):

      Summary:

      In “NPRL2 gene therapy induces effective antitumor immunity in KRAS/STK11 mutant anti-PD1 resistant metastatic non-small cell lung cancer (NSCLC) in a humanized mouse model”, Meraz et al. investigated the antitumor immune responses to NPRL2 gene therapy in aPD1R/KRAS/STK11mt NSCLC in a humanized mouse model and found that NPRL2 gene therapy induces antitumor activity in KRAS/STK11mt/aPD1R tumors through DC-mediated antigen presentation and cytotoxic immune cell activation.

      Strengths:

      The novelty of the study.

      Weaknesses:

      (1) The inconsistent effect of NPRL2 combined with pembrolizumab. Figure 2I-K showed a similar tumor intensity in the NPRL2 group and the combination group. However, NPRL2 combined with pembrolizumab was synergistic in the KRASwt/aPD1S H1299 tumors in Figure 4.

      NPRL2, as a single-agent immunogen therapy, induces robust antitumor activity both in immunotherapy-resistant (aPD1R) KRAS mutant models, such as A549 (KRASmt/LKB1mt/aPD1R) and LLC2 (KRASmt/aPD1R) tumors, where immunotherapy was ineffective, and in the immunotherapy-sensitive H1299 model (KRASwt/aPD1S), where immunotherapy was of limited efficacy. A synergistic antitumor effect of the NPRL2 and pembrolizumab combination was found only in moderately immunotherapy-responsive models, not in immunotherapy-resistant models, in which PD-1/PD-L1 signaling is impaired (Figure 1A).

      (2) The authors stated that NPRL2 combined with pembrolizumab was not synergistic in the KRAS/STK11mt/aPD1R tumors but was synergistic in the KRASwt/aPD1S H1299 tumors. How was the synergistic effect defined in the study? More details need to be provided here.

      Our biostatistician used generalized linear regression models to study tumor growth over time. Two-way ANOVA with the interaction of treatment group and time point was performed to compare the changes in tumor intensity from baseline between each pair of treatment groups at each time point. The nonparametric Mann-Whitney U test was applied to compare the treatment groups. Differences with P < 0.05, P < 0.01, and P < 0.001 were considered statistically significant. When the antitumor effect of the NPRL2 and pembrolizumab combination was statistically significant compared with both single-agent effects, synergy was confirmed using the method of Huang et al.

      Huang L, Wang J, Fang B, Meric-Bernstam F, Roth JA, Ha MJ. CombPDX: a unified statistical framework for evaluating drug synergism in patient-derived xenografts. Sci Rep 12(1):12984, 7/2022. e-Pub 7/2022. PMCID: PMC9338066.
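
      As an illustration of the nonparametric comparison mentioned in the response above, the following is a minimal sketch of a two-sided Mann-Whitney U test with an exact p-value obtained by enumerating group assignments. It is not the authors' analysis code: the function names (`rank_with_ties`, `mann_whitney_u`, `exact_p_value`) and the example numbers are hypothetical, and the exhaustive enumeration is only practical for small groups such as those in a typical animal experiment.

```python
from itertools import combinations


def rank_with_ties(values):
    """Return 1-based ranks for `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U1, U2), the Mann-Whitney U statistics for samples x and y."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1


def exact_p_value(x, y):
    """Two-sided exact p-value: the share of all group assignments whose
    smaller U statistic is at most the observed one."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))

    def smaller_u(indices):
        u1 = sum(ranks[i] for i in indices) - n1 * (n1 + 1) / 2
        return min(u1, n1 * n2 - u1)

    observed = smaller_u(range(n1))
    total = extreme = 0
    for indices in combinations(range(n1 + n2), n1):
        total += 1
        if smaller_u(indices) <= observed + 1e-12:
            extreme += 1
    return extreme / total


if __name__ == "__main__":
    # Hypothetical tumor-intensity changes, invented purely for illustration.
    treated = [1.0, 2.0, 3.0, 4.0, 5.0]
    control = [6.0, 7.0, 8.0, 9.0, 10.0]
    print(mann_whitney_u(treated, control), exact_p_value(treated, control))
```

      With five animals per group and complete separation of the two groups, only 2 of the 252 possible assignments are as extreme as the observed one, so the smallest attainable two-sided p-value is about 0.008. This is one reason why group size matters when thresholds such as P < 0.001 are reported.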

      (3) Nearly all of the work was performed pre-clinically. Validation in the clinical setting would provide stronger evidence for the conclusion.

      In this preclinical study, a clinical-grade DOTAP-NPRL2 formulation was prepared, utilizing NPRL2 encapsulated within nanovesicles for delivery. Based on the promising preclinical data, a phase I clinical trial will be initiated to evaluate the safety and efficacy of this formulation.

      (4) Figure 5 and Figure 6 have the same legend. These 2 figures could be merged as a new one.

      Agreed.

      (5) Figure 5B & C: n=9 in Figure 5B. However, the number shown in Figure 5C was fewer than 9.

      Figure 5C shows n=7-9 mice/group. We will revise accordingly.

      Reviewer #3 (Public Review):

      Summary:

      NPRL2/TUSC4 is a tumor suppressor gene whose expression is reduced in many cancers including NSCLC. This study presents a novel finding on NPRL2 gene therapy, which induces antitumor activity on aPD1-resistant tumors. Since KRAS/STK11 mutant tumors were reported to benefit less from ICIs, this study has potential clinical application value.

      Strengths:

      This work uncovers the advantage of NPRL2 gene therapy by using humanized models and multiple cell lines. Moreover, via immune cell depletion studies, the mechanism of NPRL2 gene therapy has been narrowed to dendritic cells and CD8+ T cells.

      Weaknesses:

      A major concern would be the lack of systematic and logical rigor. This work did not present a link between apoptosis and antigen presentation induced by NPRL2 restoration. There is no evidence proving that the PI3K/AKT/mTOR signaling pathway is related to antigen presentation, which is the major reason for the NPRL2-induced antitumor response. Therefore, the two parts may not support each other logically.

      Thank you for your review and comments. We agree that future studies are necessary to establish a direct link between apoptosis and antigen presentation induced by NPRL2 restoration, as well as between NPRL2-mediated downregulation of PI3K/AKT/mTOR signaling and antigen presentation. Nevertheless, NPRL2 restoration directly induced apoptosis in several cell lines (Figure 1C and Figure 8Q), and NPRL2 treatment significantly increased the number of antigen-presenting DCs in the tumor microenvironment. Similarly, NPRL2 restoration downregulated the PI3K/AKT/mTOR pathway, which was associated with increased antitumor immunity.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Gating of Kv10 channels is unique because it involves coupling between non-domain swapped voltage sensing domains, a domain-swapped cytoplasmic ring assembly formed by the N- and C-termini, and the pore domain. Recent structural data suggests that activation of the voltage sensing domain relieves a steric hindrance to pore opening, but the contribution of the cytoplasmic domain to gating is still not well understood. This aspect is of particular importance because proteins like calmodulin interact with the cytoplasmic domain to regulate channel activity. The effects of calmodulin (CaM) in WT and mutant channels with disrupted cytoplasmic gating ring assemblies are contradictory, resulting in inhibition or activation, respectively. The underlying mechanism for these discrepancies is not understood. In the present manuscript, Reham Abdelaziz and collaborators use electrophysiology, biochemistry and mathematical modeling to describe how mutations and deletions that disrupt inter-subunit interactions at the cytoplasmic gating ring assembly affect Kv10.1 channel gating and modulation by CaM. In the revised manuscript, additional information is provided to allow readers to identify within the Kv10.1 channel structure the location of E600R, one of the key channel mutants analyzed in this study. However, the mechanistic role of the cytoplasmic domains that this study focuses on, as well as the location of the ΔPASCap deletion and other perturbations investigated in the study remain difficult to visualize without additional graphical information. This can make it challenging for readers to connect the findings presented in the study with a structural mechanism of channel function.

      The authors focused mainly on two structural perturbations that disrupt interactions within the cytoplasmic domain, the E600R mutant and the ΔPASCap deletion. By expressing mutants in oocytes and recording currents using two-electrode voltage clamp (TEVC), they found that both ΔPASCap and E600R mutants have biphasic conductance-voltage (G-V) relations and exhibit activation and deactivation kinetics with multiple voltage-dependent components. Importantly, the mutant-specific component in the G-V relations is observed at negative voltages where WT channels remain closed. The authors argue that the biphasic behavior in the G-V relations is unlikely to result from two different populations of channels in the oocytes, because they found that the relative amplitude between the two components in the G-V relations was highly reproducible across individual oocytes that otherwise tend to show high variability in expression levels. Instead, the G-V relations for all mutant channels could be well described by an equation that considers two open states O1 and O2, and a transition between them; O1 appeared to be unaffected by any of the structural manipulations tested (i.e. E600R, ΔPASCap, and other deletions) whereas the parameters for O2 and the transition between the two open states were different between constructs. The O1 state is not observed in WT channels and is hypothesized to be associated with voltage sensor activation. O2 represents the open state that is normally observed in WT channels and is speculated to be associated with conformational changes within the cytoplasmic gating ring that follow voltage sensor activation, which could explain why the mutations and deletions disrupting cytoplasmic interactions affect primarily O2.
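
      To make the idea of a biphasic G-V relation concrete, the sketch below builds a toy curve from Boltzmann (sigmoid) terms: one for entry into O1, one for the voltage range over which O1 is abandoned, and one for entry into O2. This is a hypothetical illustration and is not the equation fitted in the manuscript, which is not reproduced here. The function names, the functional form, and every parameter value (amplitudes, midpoints, and slope factors) are assumptions chosen only to produce a curve with two components.

```python
import math


def boltzmann(v, v_half, slope):
    """Sigmoid activation at voltage v (mV) with midpoint v_half and slope factor (mV)."""
    return 1.0 / (1.0 + math.exp(-(v - v_half) / slope))


def biphasic_gv(v, g1=1.0, g2=0.8, v1=-80.0, k1=10.0,
                vt=-20.0, kt=10.0, v2=40.0, k2=15.0):
    """Toy conductance-voltage relation with two open components.

    All parameters are arbitrary illustrative values, not fitted quantities.
    """
    enter_o1 = boltzmann(v, v1, k1)   # O1 becomes available at negative voltages
    leave_o1 = boltzmann(v, vt, kt)   # O1 is abandoned at intermediate voltages
    enter_o2 = boltzmann(v, v2, k2)   # O2 is reached at strong depolarizations
    return g1 * enter_o1 * (1.0 - leave_o1) + g2 * enter_o2


if __name__ == "__main__":
    for voltage in range(-120, 121, 20):
        print(f"{voltage:+5d} mV  G = {biphasic_gv(voltage):.3f}")
```

      In this toy curve the conductance rises at negative voltages, dips at intermediate voltages, and rises again at strong depolarizations. A single population of channels with two open states can therefore produce two G-V components without invoking two separate channel populations.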

      Severing the covalent link between the voltage sensor and pore reduced O1 occupancy in one of the deletion constructs. Although this observation is consistent with the hypothesis that voltage-sensor activation drives entry into O1, this result is not conclusive. Structural as well as functional data has established that the coupling of the voltage sensor and pore does not entirely rely on the S4-S5 covalent linker between the sensor and the pore, and thus the severed construct could still retain coupling through other mechanisms, which is consistent with the prominent voltage dependence that is observed. If both states O1 and O2 require voltage sensor activation, it is unclear why the severed construct would affect state O1 primarily, as suggested in the manuscript, as opposed to decreasing occupancy of both open states. In line with this argument, the presence of Mg2+ in the extracellular solution affected both O1 and O2. This finding suggests that entry into both O1 and O2 requires voltage-sensor activation because Mg2+ ions are known to stabilize the voltage sensor in its most deactivated conformations. 

      We agree with the reviewer that access to both states requires a conformational change in the voltage sensor. This was stated in our revised article: “In contrast, to enter O2, all subunits must complete both voltage sensor transitions and the collective gating ring transition.” We interpret the two gating steps as sequential; the effective rotation of the intracellular ring would happen only once the sensor is in its fully activated position.

      We also agree that the S4-S5 segment cannot be the only interaction mechanism, as we demonstrated in our earlier work (Lörinczi et al., 2015; Tomczak et al., 2017).  

      Activation towards and closure from O1 is slow, whereas channels close rapidly from O2. A rapid alternating pulse protocol was used to take advantage of the difference in activation and deactivation kinetics between the two open components in the mutants and thus drive an increasing number of channels towards state O1. Currents activated by the alternating protocol reached larger amplitudes than those elicited by a long depolarization to the same voltage. This finding is interpreted as an indication that O1 has a larger macroscopic conductance than O2. In the revised manuscript, the authors performed single-channel recordings to determine why O1 and O2 have different macroscopic conductance. The results show that at voltages where the state O1 predominates, channels exhibited longer open times and overall higher open probability, whereas at more depolarized voltages where occupancy of O2 increases, channels exhibited more flickery gating behavior and decreased open probability. These results are informative but not conclusive because additional details about how experiments were conducted, and group data analysis are missing. Importantly, results showing inhibition of single ΔPASCap channels by a Kv10-specific inhibitor are mentioned but not shown or quantitated - these data are essential to establish that the new O1 conductance indeed represents Kv10 channel activity.

      We observed the activity of a channel compatible with Kv10.1 ΔPAS-Cap (long openings at low-moderate potentials, very short flickery activity at strong depolarizations) in 12 patches from oocytes obtained from different frog operations over a period of two and a half months once the experimental conditions could be established. As stated in the text, we did not proceed to generate amplitude histograms because we could not resolve clear single-channel events at strong depolarizations. Astemizole abolished the activity and (remarkably) strongly reduced the noise in traces at strong depolarizations, which we interpret as partially caused by flicker openings.

      Author response image 1.

      We include two example recordings of Astemizole application (100 µM) on two different patches. Both recordings are performed at -60 mV (to decrease the likelihood that the channel visits O2) with 100 mM internal and 60 mM external K+. In both cases, the traces in Astemizole are presented in red.

      It is shown that conditioning pulses to very negative voltages result in mutant channel currents that are larger and activate more slowly than those elicited at the same voltage but starting from less negative conditioning pulses. In voltage-activated curves, O1 occupancy is shown to be favored by increasingly negative conditioning voltages. This is interpreted as indicating that O1 is primarily accessed from deeply closed states in which voltage sensors are in their most deactivated position. Consistently, a mutation that destabilizes these deactivated states is shown to largely suppress the first component in voltage-activation curves for both ΔPASCap and E600R channels.

      The authors then address the role of the hidden O1 state in channel regulation by calmodulin. Stimulating calcium entry into oocytes with ionomycin and thapsigargin, assumed to enhance CaM-dependent modulation, resulted in preferential potentiation of the first component in ΔPASCap and E600R channels. This potentiation was attenuated by including an additional mutation that disfavors deeply closed states. Together, these results are interpreted as an indication that calcium-CaM preferentially stabilizes deeply closed states from which O1 can be readily accessed in mutant channels, thus favoring current activation. In WT channels lacking a conducting O1 state, CaM stabilizes deeply closed states and is therefore inhibitory. It is found that the potentiation of ΔPASCap and E600R by CaM is more strongly attenuated by mutations in the channel that are assumed to disrupt interaction with the C-terminal lobe of CaM than mutations assumed to affect interaction with the N-terminal lobe. These results are intriguing but difficult to interpret in mechanistic terms. The strong effect that calcium-CaM had on the occupancy of the O1 state in the mutants raises the possibility that O1 can be only observed in channels that are constitutively associated with CaM. To address this, a biochemical pull-down assay was carried out to establish that only a small fraction of channels are associated with CaM under baseline conditions. These CaM experiments are potentially very interesting and could have wide physiological relevance. However, the approach utilized to activate CaM is indirect and could result in additional nonspecific effects on the oocytes that could affect the results.

      Finally, a mathematical model is proposed consisting of two layers involving two activation steps for the voltage sensor, and one conformational change in the cytoplasmic gating ring - completion of both sets of conformational changes is required to access state O2, but accessing state O1 only requires completion of the first voltage-sensor activation step in the four subunits. The model qualitatively reproduces most major findings on the mutants. Although the model used is highly symmetric and appears simple, the mathematical form used for the rate constants in the model adds a layer of complexity to the model that makes mechanistic interpretations difficult. In addition, many transitions that from a mechanistic standpoint should not depend on voltage were assigned a voltage dependence in the model. These limitations diminish the overall usefulness of the model which is prominently presented in the manuscript. The most important mechanistic assumptions in the model are not addressed experimentally, such as the proposition that entry into O1 depends on the opening of the transmembrane pore gate, whereas entry into O2 involves gating ring transitions - it is unclear why O2 would require further gating ring transitions to conduct ions given that the gating ring can already support permeation by O1 without any additional conformational changes.

      In essence, we agree with the reviewer; we already have addressed these points in our revised article:

      Regarding the voltage dependence, we write: “the κ/λ transition could reasonably be expected to be voltage independent because we related it to ring reconfiguration, a process that should occur as a consequence of a prior VSD transition. We have made some attempts to treat this transition as voltage independent but state-specific with upper-layer bias for states on the right and lower-layer bias for states on the left. This is in principle possible, as can already be gleaned from the similar voltage ranges of the left-right transition (α/β) and the κL/λ transition. However, this approach leads to a much larger number of free, less well constrained kinetic parameters and drastically complicated the parameter search.” As you can see, we also formulated a strategy to free the model of the potentially spurious voltage dependence and (in bold here) explained why we did not follow this route in this study.

      Regarding the need for gating ring transitions after O1, we wrote, “Thus, the underlying gating events can be separated into two steps: The first gating step involves only the voltage sensor without engaging the ring and leads to a pre-open state, which is non-conducting in the WT but conducting in our mutants. The second gating event operates at higher depolarizations, involves a change in the ring, and leads to an open state both in WT and in the mutants.”

      We interpret your statements to mean that you expect the conducting state to remain available once O1 is reached. However, the experimental evidence argues against the pore remaining available regardless of further gating steps beyond O1. The description of model construction is informative here: “... we could exclude many possible [sites at which O1 connects to closed states] because the attachment site must be sufficiently far away from the conventional open state [O2]. Otherwise, the transition from "O1 preferred" to "O2 preferred" via a few closed intermediate states is very gradual and never produces the biphasic GV curves [that we observed].”

      In other words, voltage-dependent gating steps beyond the state that offers access to O1 appear to close the pore after it has opened. That might occur because only then (for states in which at least one voltage sensor has exceeded the intermediate position) is the ring fixed in a particular state until all sensors have completed activation. In the WT, closing the pore in deactivated states might rely on an interaction that is absent in the mutant because, at least in HERG: “the interaction between the PAS domain and the C-terminus is more stable in closed than in open KV11.1 (HERG) channels, and a single chain antibody binding to the interface between PAS domain and CNBHD can access its epitope in open but not in closed channels, strongly supporting a change in conformation of the ring during gating.”

      Reviewer #3 (Public Review):

      In the present manuscript, Abdelaziz and colleagues interrogate the gating mechanisms of Kv10.1, a voltage-gated K+ channel important in cell cycle and cancer physiology. At the molecular level, Kv10.1 is regulated by voltage and Ca-CaM. Cryo-EM structures of Kv10.1 and of other members of the KCNH family (Kv11 and Kv12) show channels that lack a structured S4-S5 linker and therefore adopt a non-domain-swapped architecture in the transmembrane region. However, the cytoplasmic N- and C-terminal domains interact in a domain-swapped manner, forming a gating ring. The N-terminal domain (PAS domain) of one subunit is located close to the intracellular side of the voltage sensor domain and interacts with the C-terminal domain (CNBHD) of the neighboring subunit. Mutations in the intracellular domains have a profound effect on channel gating. The complex network of interactions between the voltage sensor and the intracellular domains makes the PAS domain particularly interesting to study as a candidate mediator of the coupling between the voltage sensor domains and the intracellular gating ring.

      The coupling between the voltage-sensor domain and the gating ring is not fully understood and the authors aim to shed light into the details of this mechanism. In order to do that, they use well established techniques such as site-directed mutagenesis, electrophysiology, biochemistry and mathematical modeling. In the present work, the authors propose a two open state model that arises from functional experiments after introducing a deletion on the PAS domain (ΔPAS Cap) or a point mutation (E600R) in the CNBHD domain. The authors measure a bi-phasic G-V curve with these mutations and assign each phase as two different open states, one of them not visible on the WT and only unveiled after introducing the mutations.

      The hypothesis proposed by the authors could change the current paradigm in the current understanding for Kv10.1 and it is quite extraordinary; therefore, it requires extraordinary evidence to support it.

      STRENGTHS: The authors use adequate techniques such as electrophysiology and site-directed mutagenesis to address the gating changes introduced by the molecular manipulations. They also use appropriate mathematical modeling to build a Markov model and identify the mechanism behind the gating changes.

      WEAKNESSES: The results presented by the authors do not fully support their conclusions since they could have alternative explanations. The authors base their primary hypothesis on the bi-phasic behavior of a calculated G-V curve that does not match the tail behavior. The experimental conditions used in the present manuscript introduce uncertainties, weakening their conclusions and complicating the interpretation of the results. Therefore, their experimental conditions need to be revisited.

      We respectfully disagree. We think that your suggestions for alternative explanations are addressed in the current version of the article. We will rebut them once more below, but we feel the need to point out that our arguments are already laid out in the revised article.

      I have some concerns related to the following points:

      (1) Biphasic gating behavior

      The authors use the TEVC technique in oocytes extracted surgically from Xenopus laevis frogs. The method is well established and is adequate to address ion channel behavior. The experiments are performed in chloride-based solutions, which present a handicap when measuring outward rectifying currents at very depolarizing potentials due to the presence of calcium-activated chloride channels expressed endogenously in the oocytes; these channels will open and rectify chloride intracellularly, adding to the outward rectifying traces during the test pulse. The authors calculate their G-V curves from the test pulse steady-state current instead of using the tail currents. The conductance measurements are normally taken from the 'tail current' because tails are measured at a fixed voltage, hence maintaining the driving force constant.

      We respectfully disagree. In contrast to other channels, like HERG, a common practice for Kv10 is not to use tail currents. It has long been known that in this channel, tail currents and test-pulse steady-state currents can appear to be at odds because the channels deactivate extremely rapidly, at the border of temporal resolution of the measurements and with intricate waveforms. This complicates the estimation of the instantaneous tail current. Therefore, the outward current is commonly used to estimate conductance (Terlau et al., 1996; Schönherr et al., 1999; Schönherr et al., 2002; Whicher and MacKinnon, 2019), while the latter authors also use the extreme of the tail for some mutants.

      Due to their activation at very negative voltage, the reversal potential in our mutants can be measured directly; we are, therefore, more confident with this approach. Nevertheless, we have determined the initial tail current in some experiments. The behavior of these is very similar to the average that we present in Figure 1. The biphasic behavior is unequivocally present.
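
      The conversion from test-pulse current to conductance discussed here can be sketched as follows. This is a minimal illustration, assuming an ohmic chord conductance G = I / (V - E_rev); the function names, the cutoff near the reversal potential, and the numerical values are hypothetical and are not taken from the study.

```python
def chord_conductance(current_uA, voltage_mV, e_rev_mV, min_drive_mV=5.0):
    """Return chord conductance in mS (uA / mV), or None near the reversal potential."""
    drive_mV = voltage_mV - e_rev_mV
    if abs(drive_mV) < min_drive_mV:
        return None  # driving force too small for a reliable estimate
    return current_uA / drive_mV


def normalized_gv(currents_uA, voltages_mV, e_rev_mV):
    """Normalize chord conductances to the largest value in the series."""
    g = [chord_conductance(i, v, e_rev_mV) for i, v in zip(currents_uA, voltages_mV)]
    g_max = max(x for x in g if x is not None)
    return [None if x is None else x / g_max for x in g]


# Hypothetical ohmic example: constant conductance of 0.1 mS, E_rev = -90 mV.
voltages = [-60.0, 0.0, 60.0]   # test-pulse voltages (mV)
currents = [3.0, 9.0, 15.0]     # steady-state currents (uA)
gv = normalized_gv(currents, voltages, e_rev_mV=-90.0)
```

      Because the driving force differs at every test voltage, this approach depends on a reliable reversal potential, whereas a tail-current measurement holds the driving force fixed.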

      Author response image 2.

      Calculating the conductance from the traces should not be a problem, however, in the present manuscript, the traces and the tail currents do not agree. 

      The referee’s observation is perfectly in line with the long-standing experience of several labs working with KV10: tail current amplitudes in KV10 appear to be out of proportion for the WT open state (O2). Importantly, this is due to the rapid closure, which is not present in O1. As a consequence, the initial amplitudes of tail currents from O1 are easier to estimate correctly, and they are much more obvious in the graphs. Taken together, these differences between O1 and O2 explain the misconception the reviewer describes next.

      The tail traces shown in Fig1E do not show an increasing current amplitude in the voltage range from +50mV to +120mV, they seem to have reached a 'saturation state', suggesting that the traces from the test pulse contain an inward chloride current contamination. 

      As stated in the text and indicated in Author response image 3, the tail currents in Figure 1E increase in amplitude between +50 and +120 mV, as can be seen in the examples below from different experiments (+50 is presented in black, +120 in red). As stated above, the increase is not as evident as in traces from other mutants because the predominance of O2 also implies a much faster deactivation.

      Author response image 3. 

      We are aware that Ca2+-activated Cl- currents can represent a problem when interpreting electrophysiological data in oocytes. In fact, we show in Supplement 1 to Figure 8 that this can be the case during the Ca2+-CaM experiments, where the increase in Ca2+ would certainly augment Cl- contribution to the outward current. This is why we performed these experiments in Cl--free solutions. As we show in Figure 8, the biphasic behavior was also present in those experiments. 

      Importantly, Cl--free bath solutions would not correct contamination during the tail, since this would correspond to Cl- exiting the oocyte. Yet, if there were contamination of the outward currents by Cl-, one would expect it to increase with larger depolarizations, as the typical Ca2+-activated Cl- current in oocytes does. As the reviewer states, this does not seem to be the case.

      In addition, this second component identified by the authors as a second open state appears after +50mV and seems to never saturate. The normalization to the maximum current level during the test pulse, exaggerates this second component on the calculated G-V curve. 

      We agree that this second component continues to increase; the reviewer brought this up in the first review, and we have already addressed this in our reply and in the discussion of the revised version: “This flicker block might also offer an explanation for a feature of the mutant channels, that is not explained in the current model version: the continued increase in current amplitude, hundreds of milliseconds into a strong depolarization (Supp. 4 to Fig. 9). If the relative stability of O2 and C2 continued to change throughout depolarization, such a current creep-up could be reproduced. However, this would require either the introduction of further layers of On ↔Cn states, or a non-Markovian modification of the model’s time evolution.” With non-Markovian, we mean a Langevin-type diffusive process. 

      It's worth noticing that the ΔPASCap mutant experiments on Fig 5 in Mes based solutions do not show that second component on the G-V.

      For the readers of this conversation, we would like to clarify that the reviewer likely refers to experiments shown in Fig. 5 of the initial submission but shown in Fig. 6 of the revised version (“Hyperpolarization promotes access to a large conductance, slowly activating open state.” Fig. 5 deals with single channels). We agree that these data look different, but this is because the voltage protocols are completely different (compare Fig. 6A (fixed test pulse, varied pre-pulse) and Fig. 2A (varied test pulse, fixed pre-pulse)). Therefore, no biphasic behavior is expected.

      Because these results are the foundation for their two open state hypotheses, I will strongly suggest the authors to repeat all their Chloride-based experiments in Mes-based solutions to eliminate the undesired chloride contribution to the mutants current and clarify the contribution of the mutations to the Kv10.1 gating.

      In summary, we respectfully disagree with all concerns raised in point (1). Our detailed arguments rebutting them are given above, but there is a more high-level concern about this entire exchange: the referee casts doubt on observations that are not new. Several labs have reported for a group of mutant KCNH channels: non-monotonic voltage dependence of activation (see, e.g., Fig. 6D in Zhao et al., 2017), multi-phasic tail currents (see, e.g., Fig. 4A in Whicher and MacKinnon, 2019, in CHO cells where Cl- contamination is not a concern), and activation by high [Ca2+]i (Lörinczi et al., 2016). Our study replicates those observations and hypothesizes that the existence of an additional conducting state can alone explain all previously unexplained observations. We highlight the potency of this hypothesis with a Markov model that qualitatively reproduces all phenomena. We not only factually disagree with the individual points raised, but we also think that they don't touch on the core of our contribution.

      (2) Two step gating mechanism.

      The authors interpret the results obtained with the ΔPASCap and the E600R as two step gating mechanisms containing two open states (O1 and O2) and assign them to the voltage sensor movement and gating ring rotation respectively. It is not clear, however how the authors assign the two open states.

      The results show how the first component is conserved amongst mutations; however, the second one is not. The authors attribute the second component, hence the second open state, to the movement of the gating ring. This scenario seems unlikely since there is a clear voltage dependence of the second component that will suggest an implication of a voltage-sensing current.

      We do not suggest that the gating ring motion is not voltage dependent. We would like to point out that voltage dependence can be conveyed by voltage sensor coupling to the ring; this is the widely accepted theory of how the ring can be involved. Should the reviewer mean it in a narrow sense, that the model should be constructed such that all voltage-dependent steps occur before and independently of ring reconfiguration and that only then an additional step that reflects the (voltage-independent) reconfiguration solely, we would like to point the reviewer to the article, where we write: “the κ/λ transition could reasonably be expected to be voltage independent because we related it to ring reconfiguration, a process that should occur as a consequence of a prior VSD transition. We have made some attempts to treat this transition as voltage independent but state-specific with upper-layer bias for states on the right and lower-layer bias for states on the left. This is in principle possible, as can already be gleaned from the similar voltage ranges of the left-right transition (α/β) and the κL/λ transition. However, this approach leads to a much larger number of free, less well constrained kinetic parameters and drastically complicated the parameter search. ” As you can see, we also formulated a strategy to free the model from the potentially spurious voltage dependence and (in bold here) explained why we did not follow this route in this study. 

      The split channel experiment is interesting but needs more explanation. I assume the authors expressed the 2 parts of the split channel (1-341 and 342-end), however Tomczak et al showed in 2017 how the split presents a constitutively activated function with inward currents that are not visible here, this point needs clarification.

      As stated in the panel heading, the figure legend, and the main text, we did not use 1-341 and 342-end as done in Tomczak et al. Instead, “we compared the behavior of ∆2-10 and ∆2-10.L341Split,”. Evidently, the additional deletion (2-10) causes a shift in activation that explains the difference you point out. However, as we do not compare L341Split and ∆2-10.L341Split but ∆2-10 and ∆2-10.L341Split, our conclusion remains that “As predicted, compared to ∆2-10, ∆2-10.L341Split showed a significant reduction in the first component of the biphasic GV (Fig. 2C, D).” Remarkably, the behavior of the ∆3-9 L341Split described in Whicher and MacKinnon, 2019 (Figure 5) matches that of our ∆2-10 L341Split, which we think reinforces our case.

      Moreover, the authors assume that the mutations introduced uncover a new open state; however, the traces presented for the mutations suggest that other explanations are possible. Other gating mechanisms, like inactivation from the closed state, can be introduced by the mutations. The traces presented for ΔPASCap, but especially E600R, present clear 'hooked tails', a direct indicator of a population of inactive channels during the test pulse that recover from inactivation upon repolarization (Tristani-Firouzi M, Sanguinetti MC. J Physiol. 1998).

      There is a possibility that we are debating nomenclature here. In response to the suggestion that all our observations could be explained by inactivation, we attempted a disambiguation of terms in the reply and the article. As the argument is brought up again without reference to our clarification attempts, we will try to be more explicit here:

      If, starting from deeply deactivated states, an open state is reached first, and then, following further activation steps, closed states are reached, this might be termed “inactivation”. In such a reading, our model features many inactivated states. The shortest version of such a model is C-O-I. It is, for instance, used by Raman and Bean (2001; DOI: 10.1016/S0006-3495(01)76052-3) to explain NaV gating in Purkinje neurons. If “inactivation” is meant in the sense that a gating transition exists, which is orthogonal to an activation/deactivation axis, and that after this orthogonal transition, an open state cannot be reached anymore, then all of the upper floor in our model is inactivated with respect to the open state O1. Finally, the state C2 is an inactivated state to O2. In this view, “inactivation” explains the observed phenomena.

      However, we must disagree if the referee means that a parsimonious explanation exists in which a single conducting state is the only source for all observed currents.   

      There is a high-level reason: we found a single assumption that explains three different phenomena, while the inactivation hypothesis with one conducting state cannot explain one of them (the increase of the first component under raised CaM). But there is also a low-level reason: the tails in Tristani-Firouzi and Sanguinetti 1998 are fundamentally different from what we report herein in that they lack a third component. Thus, those tails are consistent with recovery from inactivation through a single open state, while a three-component tail is not. In the framework of a Markov model, the time constants of transitions from and to a given state (say O2), cannot change unless the voltage changes. During the tail current, the voltage does not change, yet we observe: 

      i) a rapid decrease with a time constant of at most a few milliseconds (Fig 9 S2, 1-> 2),

      ii) a slow increase in current, peaking after approximately 25 milliseconds, and

      iii) a relaxation to zero current with a time constant of >50 ms.

      According to the reviewer’s suggestion, these processes on three timescales should all be explained by depopulating and repopulating the same open state while all rates are constant. There might well be a complicated multi-level state diagram with a single open state with different variants (such as open and open-inactivated) that could produce triphasic tails with these properties if the system had not reached a steady-state distribution at the end of the test pulse. It cannot, however, achieve this from an equilibrated system, and it certainly cannot at the same time produce “biphasic activation” and “activation by CaM”.
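
      The timescale argument can be illustrated with the shortest single-open-state scheme, I <-> O <-> C, at constant voltage. A Markov scheme with N states and constant rates relaxes as a sum of at most N-1 exponentials, so a three-state scheme can produce a "hooked" tail with one turning point, but not a decrease followed by an increase and a second decrease. The sketch below is a minimal illustration under that assumption; the rate constants and initial occupancies are hypothetical, and the forward-Euler integration is chosen only for brevity.

```python
def simulate_coi_tail(p0, k_io, k_oi, k_oc, k_co, dt_ms=0.01, t_end_ms=300.0):
    """Forward-Euler integration of I <-> O <-> C with constant rates (per ms).

    p0 is (P_I, P_O, P_C) at the start of the tail; returns the open probability trace.
    """
    p_i, p_o, p_c = p0
    trace = [p_o]
    for _ in range(int(round(t_end_ms / dt_ms))):
        flux_io = k_io * p_i - k_oi * p_o  # net flux I -> O
        flux_oc = k_oc * p_o - k_co * p_c  # net flux O -> C
        p_i -= flux_io * dt_ms
        p_o += (flux_io - flux_oc) * dt_ms
        p_c += flux_oc * dt_ms
        trace.append(p_o)
    return trace


def count_turning_points(trace, tol=1e-12):
    """Count sign changes in the successive differences of a trace."""
    signs = [(b - a) > 0 for a, b in zip(trace, trace[1:]) if abs(b - a) > tol]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


# Hypothetical tail: most channels start inactivated, recovery is faster than closing.
open_prob = simulate_coi_tail(p0=(0.95, 0.05, 0.0),
                              k_io=0.05, k_oi=0.0, k_oc=0.02, k_co=0.0)
turning_points = count_turning_points(open_prob)
```

      With these placeholder rates the open probability rises and then decays, giving a single turning point; reproducing the three phases listed above within one open state would require additional states and a non-equilibrated starting distribution.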

      The results presented by the authors can be alternatively explained with a change in the equilibrium between the close to inactivated/recovery from inactivation to the open state. 

      Again, we disagree. The model construction explains in detail that the transition from the first to the second phase is not gradual. Shifting equilibria cannot reproduce this. We have extensively tested that idea and can exclude this possibility.

      Finally, the authors state that they do not detect "cumulative inactivation after repeated depolarization", but that is considering inactivation only from the open state and ignoring the possibility of closed-state inactivation or that, like in hERG, the channel inactivates faster than it activates (Smith PL, Yellen G. J Gen Physiol. 2002).

      We respectfully disagree. We explicitly model an open state that inactivates faster (O2->C2) than it activates. Once more, this is stated in the revised article, which we point to for details. Again, this alternative mechanism does not have the potential to explain all three effects. As discussed above about the chloride contamination concerns, this inactivation hypothesis was mentioned in the first review round and, therefore, addressed in our reply and the revised article. We also explained that “inactivation” has no specific meaning in Markov models. In the absence of O1, all transitions towards the lower layer are effectively “inactivation from closed states”, because they make access to the only remaining open state less likely. But this is semantics. What is relevant is that no network of states around a single open state can reproduce the three effects in a more parsimonious way than the assumption of the second open state does.

      (3) Single channel conductance.

      The single channels experiments are a great way to assess the different conductance of single channel openings, unfortunately the authors cannot measure accurately different conductances for the two proposed open states. The Markov Model built by the authors, disagrees with their interpretation of the experimental results assigning the exact same conductance to the two modeled open states. To interpret the mutant data, it is needed to add data with the WT for comparison and in presence of specific blockers. 

      We respectfully disagree. As previously shown, the conductance of the flickering wild-type open state is very difficult to resolve. Our recordings do not show that the two states have different single-channel conductances, and therefore the model assumes identical single-channel conductance.

      The important point is that the single-channel recordings clearly show two different gating modes associated with the voltage ranges in which we predict the two open states. One has a smaller macroscopic current due to rapid flickering (aka “inactivation”). These recordings are another proof of the existence of two open states because the two gating modes occur.  Wild-type data can be found in Bauer and Schwarz, (2001, doi:10.1007/s00232-001-0031-3) or Pardo et al., (1998, doi:10.1083/jcb.143.3.767) for comparison.

      We appreciate the effort editors and reviewers invested in assessing the revised manuscript. Yet, we think that the demanded revision of experimental conditions and quantification methods contradicts the commonly accepted practice for KV10 channels. Some of the reviewer comments are skeptical about the biphasic behavior, which is an established and replicated finding for many mutants and by many researchers. The alternative explanations for these disbelieved findings are either “semantics” or cannot quantitatively explain the measurements. Therefore, only the demand for more explanations and unprecedented resolution in single-channel recordings remains. We share these sentiments.

      ———— The following is the authors’ response to the original reviews.

      (1) The authors must show that the second open state is not just an artifact of endogenous activity but represents the activity of the same EAG channels. I suggest that the authors repeat these experiments in Mes-based solutions. 

      (2) Along the same lines, it is necessary to show that these currents can be blocked using known EAG channel blockers such as astemizole. Ultimately, it will be important to demonstrate using single-channel analysis that these do represent two distinct open states separated by a closed state. 

      We have addressed these concerns using several approaches. The most substantial change is the addition of single-channel recordings on ΔPASCap. In those experiments, we could provide evidence of the two types of events in the same patch, and the presence of an outward current at -60 mV, 50 mV below the equilibrium potential for chloride. The channels were never detected in uninjected oocytes, and Astemizole silenced the activity in patches containing multiple channels. These observations, together with the maintenance of the biphasic behavior that we interpret as evidence of the presence of O1 in methanesulfonate-based solutions, strongly suggest that both O1 and O2 arise from the expression of KV10.1 mutants.
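
      The driving-force argument behind the -60 mV observation can be made explicit with a short sketch. It assumes the usual convention I = g (V - E_rev), with positive values denoting outward current. The chloride equilibrium potential of -10 mV follows from the statement above (-60 mV is 50 mV below it); the K+ reversal potential of -90 mV is a hypothetical placeholder, as is the function name.

```python
def current_direction(voltage_mV, e_rev_mV):
    """Direction of an ohmic current I = g * (V - E_rev) for any positive conductance g."""
    drive_mV = voltage_mV - e_rev_mV
    if drive_mV > 0:
        return "outward"
    if drive_mV < 0:
        return "inward"
    return "zero"


E_CL_MV = -10.0  # implied by the text: -60 mV lies 50 mV below E_Cl
E_K_MV = -90.0   # hypothetical placeholder, not a value from the study

chloride_at_minus_60 = current_direction(-60.0, E_CL_MV)
potassium_at_minus_60 = current_direction(-60.0, E_K_MV)
```

      Under these assumptions a chloride conductance could only contribute inward current at -60 mV, so an outward current at that potential points to a K+-selective pathway.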

      (3) Currents should be measured by increasing the pulse lengths as needed in order to obtain the true steady-state G-V curves. 

      We agree that the endpoint of activation is ill-defined in the cases where a steady-state is not reached. This does indeed hamper quantitative statements about the relative amplitude of the two components. However, while the overall shape does change, its position (voltage dependence) would not be affected by this shortcoming. The data, therefore, supports the claim of the “existence of mutant-specific O1 and its equal voltage dependence across mutants.”

      (4) A more clear and thorough description should be provided for how the observations with the mutant channels apply to the behavior of WT channels. How exactly does state O1 relate to WT behavior, and how exactly do the parameters of the mathematical model differ between WT and mutants? How can this be interpreted at a structural level? What could be the structural mechanism through which ΔPASCap and E600R enable conduction through O1? It seems contradictory that O1 would be associated exclusively with voltage-sensor activation and not gating ring transitions, and yet the mutations that enable cation access through O1 localize at the gating ring - this needs to be better clarified. 

      We have undertaken a thorough rewriting of all sections to clarify the structural correlates that may explain the behavior of the mutants. In brief, we propose that when all four voltage sensors move towards the extracellular side, the intracellular ring maintains the permeation path closed until it rotates. If the ring is altered, this “lock” is incompetent, and permeation can be detected (page 34). By fixing the position of the ring, calmodulin would preclude permeation in the WT and promote the population of O1 in the mutants.

      (5) Rather than the t80% risetime, exponential fits should be performed to assess the kinetics of activation. 

      We agree that the assessment of kinetics by a t80% is not ideal. We originally refrained from exponential fits because they introduce other issues when used for processes that are not truly exponential (as is the case here). We had planned to perform exponential fits in this revised version, but because the activation process is not exponential, the time constants we could provide would not be accurate, and the result would remain qualitative as it is now. In the experiments where we did perform the fits (Fig. 3), the values obtained support the statement made. 
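
      For readers unfamiliar with the measure, a t80% rise time can be computed without assuming any functional form for the activation time course. The sketch below is illustrative only: it assumes that t80% is the time at which the current first reaches 80% of its value at the end of the pulse, uses linear interpolation between samples, and relies on a synthetic trace. For a mono-exponential rise the result equals tau * ln(5), which is the only case in which t80% maps directly onto a time constant.

```python
import math


def rise_time(times_ms, currents, fraction=0.8):
    """Time at which a rising trace first reaches `fraction` of its final value.

    Uses linear interpolation between neighboring samples; returns None if the
    level is never crossed from below.
    """
    target = fraction * currents[-1]
    for k in range(1, len(currents)):
        lo, hi = currents[k - 1], currents[k]
        if lo < target <= hi:
            step = times_ms[k] - times_ms[k - 1]
            return times_ms[k - 1] + (target - lo) / (hi - lo) * step
    return None


# Synthetic mono-exponential activation with tau = 10 ms, sampled every 0.1 ms.
tau_ms = 10.0
times = [0.1 * k for k in range(2001)]
trace = [1.0 - math.exp(-t / tau_ms) for t in times]
t80 = rise_time(times, trace)
```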

      (6) It is argued based on the G-V relations in Figure 2A that none of the mutations or deletions introduced have a major effect on state O1 properties, but rather affect state O2. However, the occupancy of state O2 is undetermined because activation curves do not reach saturation. It would be interesting to explore the fitting parameters on Fig.2B further to test whether the data on Fig 2A can indeed only be described by fits in which the parameters for O1 remain unchanged between constructs. 

      We agree that the absolute occupancy of O2 cannot be properly determined if a steady state is not reached. This is, however, a feature of the channel. During very long depolarizations in WT, the current visually appears to reach a plateau, but a closer look reveals that the current keeps increasing after very long depolarizations (up to 10 seconds; see, e.g., Fig. 1B in Garg et al., 2013, Mol Pharmacol 83, 805-813. DOI: 10.1124/mol.112.084384). Interestingly, although the model presented here does not account for this behavior, we propose changes in the model that could. “If the relative stability of O2 and C2 continued to change throughout the depolarization such a current creep-up could be reproduced. However, this would require either the introduction of further layers of On↔Cn states or a non-Markovian modification of the model’s evolution.” Page 34.

      (7) The authors interpret the results obtained with the mutants DPASCAP and E600R -tested before by Lorinczi et al. 2016, to disrupt the interactions between the PASCap and cNBHD domains- as a two-step gating mechanism with two open states. All the results obtained with the E600R mutant and DPASCap could also be explained by inactivation/recovery from inactivation behavior and a change in the equilibrium between the closed/inactivated states and open states. Moreover, the small tails between +90 to +120 mV suggest channels accumulate in an inactive state (Fig 1E). It is not convincing that the two open-state model is the mechanism underlying the mutant's behavior.

      We respectfully disagree with the notion that a single open state can provide a plausible explanation for "All the results obtained with the E600R mutant and DPASCap". We think that our new single channel results settle the question, but even without this direct evidence, a quantitative assessment of the triphasic tail currents all but excludes the possibility of a single open state. We agree that it is, in principle, possible to obtain some form of a multiphasic tail with a single open state using the scheme suggested in this comment: at the end of the test pulse, a large fraction of the channels must be accumulated in inactive states, and a few are in the open state. The hyperpolarization to -100mV then induces a rapid depopulation of the open state, followed by slower replenishments from the inactive state. Exactly this process occurs in our model, when C2 empties through O2 (Supp. 5 to Fig 9, E600R model variant). However, this alone is highly unlikely to quantitatively explain the measured tail currents, because of the drastically different time scales of the initial current decay (submillisecond to at most a few milliseconds lifetime) and the much slower transient increase in current (several tens of milliseconds) and the final decay with time constants of >100 ms (see for instance data in Fig. 1 E for E600R +50 to +120mV test pulse). To sustain the substantial magnitude of slowly decaying current by slow replenishment of an open state with a lifetime of 1 ms requires vast amounts of inactivated channels. A rough estimation based on the current integral of the initial decay and the current integral of the slowly decaying current suggests that at the end of the test pulse, the ratio inactivated/open channels would have to be 500 to 1500 for this mechanism to quantitatively explain the observed tail currents. 
      To put this in perspective: This would suggest that without inactivation all the expressed channels in an oocyte would provide 6 mA current during the +100 mV test pulse. While theoretically possible, we consider this a less likely explanation than a second open state.
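
      The logic of this rough estimate can be sketched as follows. The sketch assumes that each channel, whether initially open or recovering from an inactive state, passes through the open state once for the same mean lifetime, so that the ratio of the charge carried by the slow tail component to the charge carried by the initial decay approximates the ratio of inactivated to open channels. Each component is approximated as a single exponential, whose integral is amplitude times time constant. All amplitudes, time constants, and the test-pulse current are hypothetical placeholders, not measurements from the study.

```python
def exponential_charge_nC(amplitude_uA, tau_ms):
    """Integral of a decaying exponential: amplitude * tau (uA * ms = nC)."""
    return amplitude_uA * tau_ms


def inactivated_to_open_ratio(q_slow_nC, q_fast_nC):
    """Channels replenishing the open state per channel open at the end of the pulse."""
    return q_slow_nC / q_fast_nC


def current_without_inactivation_uA(test_pulse_current_uA, ratio):
    """Current expected if inactivated channels conducted like the open ones."""
    return test_pulse_current_uA * (1.0 + ratio)


# Hypothetical placeholder values.
q_fast = exponential_charge_nC(amplitude_uA=2.0, tau_ms=1.0)     # initial decay
q_slow = exponential_charge_nC(amplitude_uA=10.0, tau_ms=100.0)  # slow component
ratio = inactivated_to_open_ratio(q_slow, q_fast)

# A test-pulse current of a few microamperes scales to milliamperes at ratios near 1000.
scaled_current_uA = current_without_inactivation_uA(6.0, 1000.0)
```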

      (8) Different models should be evaluated to establish whether the results in Figure 4 can also be explained by a model in which states O1 and O2 have the same conductance. It would be desirable if the conductance of both states were experimentally determined - noise analysis could be applied to estimate the conductance of both states. 

      In the modified model, O1 and O2 have the same single-channel conductance. The small conductance combined with the fast flickering did not allow an accurate determination, but we can state that there is no evidence that the single-channel conductance of the states is different.

      (9) Although not included, it looks like the model predicts some "conventional inactivation". This can be appreciated in Fig 8, and in the traces at -60mV. Interestingly, the traces obtained in the absence of Cl- also undergo slow inactivation, or 'conventional inactivation' as referred to by the authors. Please revise the following statement "Conventional inactivation was never detected in any mutants after repeated or prolonged depolarization. In the absence of inactivation, the pre-pulse dependent current increase at +40 mV could be related to changes in the relative occupancy of the open states".

      We have carefully edited the manuscript to address this concern. The use of the term inactivation admittedly represents a challenge. We agree that the state that results from the flickering block (C2) could be defined as “inactivated” because it is preceded by an open state. Yet, in that case, the intermediate states that the channel travels between O1 and O2 would also be sensu stricto “inactivated”, but only in the mutants. We have made this clear on page 17.

      Recommendations for improving the writing and presentation.

      (1) Methods section: Please state the reversal potential calculated for the solution used. It looks like the authors used an Instantaneous I-V curve method to calculate the reversal potential; if that's correct, please show the I-V and the traces together with the protocol used. 

      We have provided the calculated reversal potentials for excised patches. We cannot predict the reversal potential in whole oocytes because we have no control over the intracellular solution. The reversal potential was determined in the mutants through the current at the end of the stimulus because the mutants produced measurable inward currents. The differences in reversal potential were not significant among mutants.

      Pulse protocols have been added to the figures.

      (2) Figure 1 suggestion: Combine the two panels in panel D and move the F panel up so the figure gets aligned in the lower end.

      Thank you, this has been done.

      (3) Please clarify the rationale for using the E600R-specific mutant. I assume it is based on the Lörinczi et al. 2016 effect and how this is similar to the DPASCap phenotype, or is it due to the impact of this mutation in the interactions between the N-term and the cNBHD?

      We have explained the rationale for the use of E600R explicitly on page 6.

      (4) Fig S1A is not present in the current version of the manuscript. Include a cartoon as well as a structural figure clearly depicting the perturbations introduced by E600R, ΔPASCap, and the other deletions that are tested. Additional structural information supporting the discussion would also be helpful to establish clearer mechanistic links between the experimental observations described here and the observed conformational changes between states in Kv10 channel structures. 

      We have corrected this omission, thank you for pointing it out.

      (5) It would be informative to see the traces corresponding to the I-V shown in Fig 7 A and B at the same indicated time points (0, 60, 150, and 300s). Did the authors monitor the Ca2+ signal rise after the I&T treatment to see if it coincides with the peak in the 60s? 

      In Figure 7 (now Figure 8) we used voltage ramps instead of discrete I-V protocols because of the long time required for recording the latter. This is stated on page 19. Ca2+ was monitored through Cl- current after ionomycin/thapsigargin. The duration of the Ca2+ increase was reproducible among oocytes and in good agreement with the changes observed in the biphasic behavior of the mutants (Supplement 1 to Figure 8).

      (6) Fig 4. Please state in the legend what the different color traces correspond to in E600R and ΔPASCap. Is there a reason to change the interpulse on ΔPASCap to -20mV and not allow this mutant to close? Please state. How do the authors decide the 10 ms interval for the experiments in Fig 2?

      Thank you for pointing this out; we have added the description. We have explained why we use a different protocol for ΔPASCap and the reason for using a 10 ms interval (we believe the referee means Figure 4) on page 12.

      (7) Fig. 5. The pre-pulse is supposed to be 5 s, but the time scale doesn't correspond with a pre-pulse of 5 s before the test pulse to +40mV. Has the pre-pulse been trimmed for representation purposes? If so, please state.

      The pre-pulse was 5s, but as the reviewer correctly supposed, the trace is trimmed to keep the +40 mV stimulus visible. This has now been clearly stated in the legend.

      (8) The mutant L322H is located within the S4 helix according to the Kv10.1 structure (PDB 5K7L), not in the 'S3-S4 linker'; please correct. 

      This has been done, thank you.

      The introduction of this mutant should also shift the voltage dependence toward more hyperpolarizing potentials (around 30mV, according to Schoenherr et al. 1999). It looks like that shift is present within the first component of the G-V. Still, since the max amplitude from the second component could be contaminated by endogenous Cl- currents, this effect is minimized. Repeating these experiments in the no Cl- solutions will help clarify this point and see the effect of the ΔPASCap and E600R in the background of a mutation that accelerates the transitions between the closed states (see Major comment 1). Did the authors record L322H alone for control purposes?

      We have decided not to measure L322H alone or repeat the measurements in Cl--free solutions because we do not see a way to use the quantitative assessment of the voltage dependence of L322H and the L322H-variants of the eag domain mutants. Like in our answer to main point 3, we base our arguments not on the precise voltage dependence of the second component but on the shape of the G-V curves instead, specifically the consistent appearance of the first component and the local conductance minimum between the first and second components. After the introduction of L322H the first component is essentially absent.

      We think that the measurements of the L322H mutants cannot be interpreted as a hyperpolarizing shift in the first component. The peak of the first conductance component occurs around -20 mV in ΔPASCap and E600R (Fig. 7 C, D). After a -30mV shift, in L322H+ΔPASCap and L322H+E600R, this first peak would still be detected within the voltage range in our experiments, but it is not. A contamination of the second component would have little impact on this observation, which is why we refrain from the suggested measurements.

      (9) The authors differentiate between an O1 vs. O2 state with different conductances, and maybe I missed it, but there's no quantitative distinction between the components; how are they different?

      Please see the response to the main comments 1 and 2. This has been addressed in single-channel recordings.

      (10) Please state the voltage protocols, holding voltages, and solutions (K+ concentration and Cl- presence/absence) used for each experiment in the figure legends, so that the experiments are easier to interpret.

      Thank you, this has been done.

      (11) The authors state on page 7 that "with further depolarizations, the conductance initially declined to rise again in response to strong depolarizations. This finding matches the changes in amplitude of the tail currents, which, therefore, probably reflect a true change in conductance" However, the tails in the strong voltage range (+50 to +120 mV) for the E600R mutant argue against this result. Please review.

      The increase in the amplitude of the tail current is also present in E600R, but the relative increase is smaller. We have decided against rescaling these traces because the Figure is already rather complex. We indicated this fact with a smaller arrow and clarified it in the text (page 8).

      (12) The authors mention that the threshold of activation for the WT is around -20mV; however, the foot of the G-V is more around -30 or -40mV. Please revise. 

      Thank you. We have done this. 

      (13) The authors state on page 9 that the "second component occurs at progressively more depolarized potentials for increasingly larger N-terminal deletions". However, the E600R mutant, which keeps the N-terminus intact, has a shift as pronounced as the ΔPASCap and larger than the Δ2-10. How do the authors interpret this result?

      We have corrected this statement on page 10: “…the second component occurs at progressively more depolarized potentials for increasingly larger N-terminal deletions and when the structure of the ring is altered through disruption of the interaction between N- and C-termini (E600R)”.

      (14) The equation defined to fit the G-Vs, can also be used to describe the WT currents. If the O1 is conserved and present in the WT, this equation should also fit the WT data properly. The 1-W component shown could also be interpreted as an inactivating component that, in the WT, shifts the voltage-dependence of activation towards depolarizing potentials and is not visible. Still, the mutants do show it as if the transition from closed-inactivated states is controlled by interactions in the gating ring, and disturbing them does affect the transitions to the open state. 

      Out of the two open states in the mutant, O2 is the one that shares properties with the WT (e.g. it is inaccessible during Ca2+-CaM binding) while O1 is the open state with the voltage dependence that is conserved across the mutants. We, therefore, believe that this question is based on a mix-up of the two open states. We appreciate the core of the question: does the pattern in the mutants’ G-V curves find a continuation in the WT channel? 

      Firstly, the component that is conserved among mutants does not lead to current in the WT because the corresponding open state (O1) is not observed in WT. However, the gating event represented by this component should also occur in WT and, given its apparent insensitivity to eag domain mutations, this gating step should occur in WT with the same voltage dependence as in all the mutants. This means that this first component sets a hard boundary for the most hyperpolarized G-V curve we can expect in the WT, based on our mutant measurements. Secondly, the second component shows a regular progression across mutants: The more intact the eag domain is, the more hyperpolarized the Vhalf values of the transition term (1-W) and of O2 activation. In Δ2-10, the transition term already almost coincides with O1 activation (estimated Vhalf values of -33.57 and -33.47 mV). A further shift of (1-W) in the WT is implausible because, if O1 activation is coupled to the earliest VSD displacement, the transition should not occur before O1 activation. Still, the second component might shift to more hyperpolarized values in the WT, depending on the impact of amino acids 2 to 10 on the second VSD transition.

      In summary, in WT the G-V should not be more hyperpolarized than the first component of the mutants, and the (1-W)-component probably corresponds to the Δ2-10 (1-W)-component. In WT the second component should be no more depolarized than the second component of Δ2-10. The WT G-V (Fig. 1B) meets all these predictions derived from the pattern in the mutant G-Vs: When we use Eq. 4 to fit the WT G-V with A1=0 (O1 is not present in WT) and the parameters of the transition term (1-W) fixed to the values attained in Δ2-10, we obtain a fit for the O2 component with Vhalf=+21mV. This value nicely falls into the succession of Vhalf values for Δeag, ΔPASCap, and Δ2-10 (+103mV, +80mV, +52mV) and, at the same time, it is not more hyperpolarized than the conserved first component (Vhalf -34mV). Our measurements therefore support that the O2 component in the mutants corresponds to the single open state in the WT.
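
      The two ordering predictions in this summary can be checked directly against the Vhalf values quoted above. The snippet below is only an illustrative sketch: the variable and function names are hypothetical, the values are the rounded ones given in this response, and it does not reproduce the Eq. 4 fit itself.

```python
# Vhalf values (mV) as quoted in the response; names are illustrative only.
vhalf_o2_mutants = {"Δeag": 103.0, "ΔPASCap": 80.0, "Δ2-10": 52.0}
vhalf_o2_wt = 21.0             # O2 component of the WT fit with A1 = 0
vhalf_first_component = -34.0  # conserved first component (O1 activation)


def check_wt_predictions(mutants, wt, first):
    """Return (monotonic, bounded) for the two predictions in the text."""
    # Mutants are listed from least to most intact eag domain; WT comes last.
    ordered = list(mutants.values()) + [wt]
    # Prediction 1: Vhalf of O2 decreases as the eag domain becomes more intact.
    monotonic = all(a > b for a, b in zip(ordered, ordered[1:]))
    # Prediction 2: the WT G-V is not more hyperpolarized than the first component.
    bounded = wt >= first
    return monotonic, bounded


monotonic, bounded = check_wt_predictions(
    vhalf_o2_mutants, vhalf_o2_wt, vhalf_first_component
)
print(monotonic, bounded)
```

      With the quoted values both checks hold, which is the sense in which +21mV "falls into the succession" while staying above the conserved first component.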

      (15) Page 15, the authors state that 'The changes in amplitude and kinetics in response to rising intracellular Ca2+ support our hypothesis that Ca-CaM stabilized O1, possibly by driving the channels to deep closed states (Fig 5 and 6)' (pg 15). This statement seems contradictory; I can't quite follow the rationale since Ca2+ potentiates the current (Fig 7), and the addition of the L322H mutant in Fig 7 makes the shift of the first component to negative potentials visible.

      Please check the rationale for this section. 

      We have explained this more explicitly in the discussion (page 32). “Because access to O1 occurs from deep closed states, this could be explained by an increased occupancy of such deactivated states in response to CaM binding. This appears to be the case since CaM induces a biphasic behavior in the mutant channels that show reduced access to deep closed states; thus, L322H mutants behave like the parental variants in the presence of Ca2+-CaM. This implies a mechanistic explanation for the effect of Ca2+-CaM on WT since favoring entry into deep closed states would result in a decrease in current amplitude in the absence of (a permeable) O1”.

      Also, Figs 5 and 6 seem miscited here. 

      Thank you, we have corrected this.

      (16) For Figure 5, it would be helpful if each of the current traces corresponding to a particular voltage had a different color. That way, it will be easier to see how the initial holding voltage modulates current. 

      We have considered this suggestion, and we agree that it would make it easier to follow. Yet, since we have identified the mutants with different colors, it would be inconsistent if we used another color palette for this Figure. Supplement 3 to Figure 9 shows the differences in a clearer way.

      (17) Add zero-current levels to all current traces.

      We have done this.

      (18) The mathematical model should be described better. Particularly, the states from which O1 can be accessed should be described more clearly, as well as whether the model considers any direct connectivity between states O1 and O2. The origin of the voltage-dependence for transitions that do not involve voltage-sensor movements should be discussed. Also, the separation of kappa into kappa-l and kappa-r should be described.

      We have extensively rewritten the description of the mathematical model to address these concerns.

      (19) Page 4, "reveals a pre-open state in which the transmembrane regions of the channel are compatible with ion permeation, but is still a nonconducting state". Also, page 27, "renders a hydrophobic constriction wider than 8 Å, enough to allow K+ flow, but still corresponds to a non-conducting state". These sentences are confusing - how can the regions be compatible with ion permeation, and still not be conducting? Is cation conductance precluded by a change in the filter, or elsewhere? How is it established that it represents a non-conducting state? 

      We have rephrased to clarify this apparent inconsistency. Page 4: “(…) in which the transmembrane regions of the channel are compatible with ion permeation (the permeation path is dilated, like in open states) but the intracellular gate is still in the same conformation as in closed states (Zhang et al., 2023).” Page 31: “The presence of an intact intracellular ring would preclude ionic flow in the WT, and its alteration would explain the permeability of this state in the mutants.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      fMRI was used to address an important aspect of human cognition - the capacity for structured representations and symbolic processing - in a cross-species comparison with non-human primates (macaques); the experimental design probed implicit symbolic processing through reversal of learned stimulus pairs. The authors present solid evidence in humans that helps elucidate the role of brain networks in symbolic processing, however the evidence from macaques was incomplete (e.g., sample size constraints, potential and hard-to-quantify differences in attention allocation, motivation, and lived experience between species).

      Thank you very much for your assessment. We would like to address the potential issues that you raise point-by-point below.

      We agree that for macaque monkey physiology, sample size is always a constraint, for both financial and ethical reasons. We addressed this concern by combining the results from two different labs, which allowed us to test 4 animals in total, twice as many as is common practice in the field of primate physiology. (We discuss this now on lines 473-478.)

      Interspecies differences in motivation, attention allocation, task strategies etc. could also be limiting factors. Note that we did address the potential lack of attention allocation directly in Experiment 2 using implicit reward association, which was successful as evidenced by the activation of attentional control areas in the prefrontal cortex. We cannot guarantee that the strategies that the two species deploy are identical, but we tentatively suggest that this might be a less important factor in the present study than in other interspecies comparisons that use explicit behavioral reports. In the current study, we directly measured surprise responses in the brain in the absence of any explicit instructions in either species, which allowed us to  measure the spontaneous reversal of learned associations, which is a very basic element of symbolic representation. Our reasoning is that such spontaneous responses should be less dependent on attention allocation and task strategies. (We discuss this now in more detail on lines 478-485.)

      Finally, lived experience could be a major factor. Indeed, obvious differences include a lifetime of open-field experiences and education in our human adult subjects, which was not available to the monkey subjects, and includes a strong bias towards explicit learning of symbolic systems (e.g. words, letters, digits, etc). However, we have previously shown that 5-month-old human infants spontaneously generalize learning to the reversed pairs after a short learning in the lab using EEG (Kabdebon et al, PNAS, 2019). This indicates that also with very limited experience, humans spontaneously reverse learned associations. (We discuss this now in more detail on lines 478-485.) It could be very interesting to investigate whether spontaneous reversal could be present in infant macaque monkeys, as there might be a critical period for this effect. Although neurophysiology in awake infant monkeys is highly challenging, it would be very relevant for future work. (We discuss this in more detail on lines 493-498.)

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Kerkoerle and colleagues present a very interesting comparative fMRI study in humans and monkeys, assessing neural responses to surprise reactions at the reversal of a previously learned association. The implicit nature of this task, assessing how this information is represented without requiring explicit decision-making, is an elegant design. The paper reports that both humans and monkeys show neural responses across a range of areas when presented with incongruous stimulus pairs. Monkeys also show a surprise response when the stimuli are presented in a reversed direction. However, humans show no such surprise response based on this reversal, suggesting that they encode the relationship reversibly and bidirectionally, unlike the monkeys. This has been suggested as a hallmark of symbolic representation, that might be absent in nonhuman animals. 

      I find this experiment and the results quite compelling, and the data do support the hypothesis that humans are somewhat unique in their tendency to form reversible, symbolic associations. I think that an important strength of the results is that the critical finding is the presence of an interaction between congruity and canonicity in macaques, which does not appear in humans. These results go a long way to allay concerns I have about the comparison of many human participants to a very small number of macaques. 

      We thank the reviewer for the positive assessment. We also very much appreciate the point about the interaction effect in macaque monkeys – indeed, we do not report just a negative finding. 

      I understand the impossibility of testing 30+ macaques in an fMRI experiment. However, I think it is important to note that differences necessarily arise in the analysis of such datasets. The authors report that they use '...identical training, stimuli, and whole-brain fMRI measures'. However, the monkeys (in experiment 1) actually required 10 times more training. 

      We agree that this description was imprecise. We have changed it to “identical training stimuli” (line 151); indeed, the movies used for training were strictly identical. Furthermore, please note that we do report the fMRI results after the same training duration. In experiment 1, after 3 days of training, the monkeys did not show any significant results, even in the canonical direction. However, in experiment 2, with increased attention and motivation, a significant effect was observed on the first day of scanning after training, as was found in human subjects (see Figure 4 and Table 3).

      More importantly, while the fMRI measures are the same, group analysis over 30+ individuals is inherently different from comparing only 2 macaques (including smoothing and averaging away individual differences that might be more present in the monkeys, due to the much smaller sample size). 

      Thank you for understanding that a limited sample size is intrinsic to macaque monkey physiology. We also agree that data analysis in humans and monkeys is necessarily different. As suggested by the reviewer, we added an analysis to address this; see the corresponding reply to the ‘Recommendations for the authors’ section below.

      Despite this, the results do appear to show that macaques show the predicted interaction effect (even despite the sample size), while humans do not. I think this is quite convincing, although had the results turned out differently (for example an effect in humans that was absent in macaques), I think this difference in sample size would be considerably more concerning. 

      Thank you for noting this. Indeed, the interaction effect is crucial, and the task design was explicitly made to test this precise prediction, described in our manuscript as the “reversibility hypothesis”. The congruity effect in the learned direction served as a control for learning, while the corresponding congruity effect in the reversed direction tested for spontaneous reversal. The reversibility hypothesis stipulates that in humans there should not be a difference between the learned and the reversed direction, while there should be for monkeys. We already wrote about that in the result section of the original manuscript and now also describe this more explicitly in the introduction and beginning of the result section.

      I would also note that while I agree with the authors' conclusions, it is notable to me that the congruity effect observed in humans (red vs blue lines in Fig. 2B) appears to be far more pronounced than any effect observed in the macaques (Fig. 3C-3). Again, this does not challenge the core finding of this paper but does suggest methodological or possibly motivational/attentional differences between the humans and the monkeys (or, for example, that the monkeys had learned the associations less strongly and clearly than the humans). 

      As also explained in response to the eLife assessment above, we expanded the “limitations” section of the discussion, with a deeper description of the possible methodological differences between the two species (see lines 478-485).

      With the same worry in mind, we did increase the attention and motivation of monkeys in experiment 2, and indeed obtained a greater activation to the canonical pairs and their violation, notably in the prefrontal cortex, but crucially still without reversibility.

      In the end, we believe that the striking interspecies difference in size and extent of the violation effect, even for purely canonical stimuli, is an important part of our findings and points to a more efficient species-specific learning system that our experiment tentatively relates to a symbolic competence.

      This is a strong paper with elegant methods and makes a worthwhile contribution to our understanding of the neural systems supporting symbolic representations in humans, as opposed to other animals. 

      We again thank the reviewer for the positive review.

      Reviewer #2 (Public Review): 

      In their article titled "Brain mechanisms of reversible symbolic reference: a potential singularity of the human brain", van Kerkoerle et al address the timely question of whether non-human primates (rhesus macaques) possess the ability for reverse symbolic inference as observed in humans. Through an fMRI experiment in both humans and monkeys, they analyzed the BOLD signal in both species while observing audio-visual and visual-visual stimuli pairs that had been previously learned in a particular direction. Remarkably, the findings pertaining to humans revealed that a broad brain network exhibited increased activity in response to surprises occurring in both the learned and reverse directions. Conversely, in monkeys, the study uncovered that the brain activity within sensory areas only responded to the learned direction but failed to exhibit any discernible response to the reverse direction. These compelling results indicate that the capacity for reversible symbolic inference may be unique to humans.

      In general, the manuscript is skillfully crafted and highly accessible to readers. The experimental design exhibits originality, and the analyses are tailored to effectively address the central question at hand.

      Although the first experiment raised a number of methodological inquiries, the subsequent second experiment thoroughly addresses these concerns and effectively replicates the initial findings, thereby significantly strengthening the overall study. Overall, this article is already of high quality and brings new insight into human cognition. 

      We sincerely thank the reviewer for the positive comments. 

      I identified three weaknesses in the manuscript: 

      - One major issue in the study is the absence of significant results in monkeys. Indeed, authors draw conclusions regarding the lack of significant difference in activity related to surprise in the multidemand network (MDN) in the reverse congruent versus reverse incongruent conditions. Although the results are convincing (especially with the significant interaction between congruency and canonicity), the article could be improved by including additional analyses in a priori ROI for the MDN in monkeys (as well as in humans, for comparison). 

      First, we disagree with the statement about “absence of significant results in monkeys”. We do report a significant interaction which, as noted by the referee, is a crucial positive finding.

      Second, we performed the suggested analysis for experiment 2, using the bilateral ROIs of the putative monkey MDN from previous literature (Mitchell et al., 2016), which are based on the human study by Fedorenko et al. (PNAS, 2013).

      Author response table 1.

      Congruity effect for monkeys in Experiment 2 within the ROIs of the MDN (n=3). Significance was assessed with one-sided one-sample t-tests.

      As can be seen, none of the regions within the monkey MDN showed an FDR-corrected significant difference or interaction. Although the absence of a canonical congruity effect makes it difficult to draw strong conclusions, it did approach significance at an uncorrected level in the lateral frontal posterior region, similar to  the large prefrontal effect we report in Figures 4 and 5. Furthermore, for the reversed congruity effect there was never even a trend at the uncorrected level, and the crucial interaction of canonicity and congruity again approached significance in the lateral prefrontal cortex.  
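
      To illustrate the distinction drawn above between an uncorrected trend and an FDR-corrected result, here is a minimal sketch. It assumes the Benjamini-Hochberg procedure, which this response does not specify, and the p-values are hypothetical placeholders, not the values from Author response table 1.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical uncorrected p-values for seven ROIs (assumed for illustration).
example_p = [0.03, 0.20, 0.45, 0.08, 0.60, 0.75, 0.90]
q = benjamini_hochberg(example_p)
significant = [qi < 0.05 for qi in q]
print([round(qi, 3) for qi in q], significant)
```

      In this made-up example the smallest p-value (0.03) falls below 0.05 before correction but not after it, which is the pattern described as "approaching significance at an uncorrected level".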

      We also performed an ANOVA in the human participants of the VV experiment on the average betas across the 7 different fronto-parietal ROIs as used by Mitchell et al to define their equivalent to the monkey brain (Fig 1a, right in Mitchell et al. 2016) with congruity, canonicity and hemisphere (except for the anterior cingulate which is a bilateral ROI) as within-subject factors. We confirmed the results presented in the manuscript (Figure 4C) with notably no significant interaction between congruity and canonicity in any of these ROIs (all F-values (except insula) <1). A significant main effect of congruity was observed in the posterior middle frontal gyrus (MFG) and inferior precentral sulcus at the FDR-corrected level. Analyses restricted to the canonical trials found a congruity effect in these two regions plus the anterior insula and anterior cingulate/presupplementary motor area, whereas no ROIs were significant at an FDR-corrected level for reverse trials. There was a trend in the middle MFG and inferior precentral region for reversed trials. Crucially, there was not even a trend for the interaction between congruity and canonicity at the uncorrected level. The difference in the effect size between the canonical and reversed direction can therefore be explained by the larger statistical power due to the larger number of congruent trials (70%, versus 10% for the other trial conditions), not by a significant difference between the canonical and the reversed direction.

      Author response table 2.

      Congruity effect for humans in Experiment 2 within the ROIs of the MDN (n=23).

      These results support our contention that the type of learning of the stimulus pairs was very different in the two species. We thank the reviewer for suggesting these relevant additional analyses.

      - While the authors acknowledge in the discussion that the number of monkeys included in the study is considerably lower compared to humans, it would be informative to know the variability of the results among human participants. 

      We agree that this is an interesting question, although it is also very open-ended. For instance, we could report each subject’s individual whole-brain results, but this would take too much space (and the interested reader will be able to compute them from the data that we make available as part of this publication). As a step in this direction, we provide below a figure showing the individual congruity effects, separately for each experiment and for each ROI of Table 5, and for each of the 52 participants for whom an fMRI localizer was available:

      Author response image 1.

      Difference in mean betas between congruent and incongruent conditions in a-priori linguistic and mathematical ROIs (see definition and analyses in Table 5) in both experiments (experiment 1 = AV, left panel; experiment 2 = VV, right panel). Dots correspond to participants (red: canonical trials, green: reversed trials). The boxplot notch is located at the median and the lower and upper box hinges at the 25th and 75th centiles. Whiskers extend to 1.5 inter-quartile ranges on either side of the hinges. ROIs are ranked by the median of the Incongruent-Congruent difference across canonical and reversed order, within a given experiment. For purposes of comparison between the two experiments, we have underlined with colors the top-five common ROIs between the two experiments. N.s.: non-significant congruity effect (p>0.05)
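
      For readers who want to recompute the boxplot elements described in this legend from the shared data, the following is a minimal sketch. It assumes the common Tukey-style convention in which whiskers end at the most extreme data point within 1.5 inter-quartile ranges of each hinge, and the input values are hypothetical, not participant data.

```python
import statistics


def boxplot_summary(values):
    """Median, hinges (25th/75th centiles) and whisker ends for one box."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the limits.
    lower_whisker = min(v for v in values if v >= lower_limit)
    upper_whisker = max(v for v in values if v <= upper_limit)
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return {
        "median": median,
        "hinges": (q1, q3),
        "whiskers": (lower_whisker, upper_whisker),
        "outliers": outliers,
    }


# Hypothetical Incongruent-Congruent beta differences for nine participants.
example = [-1.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 9.0]
summary = boxplot_summary(example)
print(summary)
```

      Plotting libraries differ slightly in how they interpolate centiles, so hinges computed this way may not match the published figure to the last decimal.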

      Several regions show a rather consistent difference across subjects (see, for instance, the posterior STS in experiment 1, left panel). Overall, only 3 of the 52 participants did not show any beta greater than 2 for canonical or reversed trials in any ROI. The consistency is quite striking, given the limited number of test trials (in total only 16 incongruent trials per direction per participant), and the fact that these ROIs were selected for their responses to spoken or written sentences, as part of a subsidiary task quite different from the main task.

      - Some details are missing in the methods.  

      Thank you for these comments, we reply to them point-by-point below.

      Reviewer #3 (Public Review): 

      This study investigates the hypothesis that humans (but not non-human primates) spontaneously learn reversible temporal associations (i.e., learning a B-A association after only being exposed to A-B sequences), which the authors consider to be a foundational property of symbolic cognition. To do so, they expose humans and macaques to 2-item sequences (in a visual-auditory experiment, pairs of images and spoken nonwords, and in a visual-visual experiment, pairs of images and abstract geometric shapes) in a fixed temporal order, then measure the brain response during a test phase to congruent vs. incongruent pairs (relative to the trained associations) in canonical vs. reversed order (relative to the presentation order used in training). The advantage of neuroimaging for this question is that it removes the need for a behavioral test, which non-human primates can fail for reasons unrelated to the cognitive construct being investigated. In humans, the researchers find statistically indistinguishable incongruity effects in both directions (supporting a spontaneous reversible association), whereas in monkeys they only find incongruity effects in the canonical direction (supporting an association but a lack of spontaneous reversal). Although the precise pattern of activation varies by experiment type (visual-auditory vs. visual-visual) in both species, the authors point out that some of the regions involved are also those that are most anatomically different between humans and other primates. The authors interpret their finding to support the hypothesis that reversible associations, and by extension symbolic cognition, are uniquely human.

      This study is a valuable complement to prior behavioral work on this question. However, I have some concerns about methods and framing. 

      We thank the reviewer for the careful summary of the manuscript, and the positive comments.

      Methods - Design issues: 

      The authors originally planned to use the same training/testing protocol for both species but the monkeys did not learn anything, so they dramatically increased the amount of training and evaluation. By my calculation from the methods section, humans were trained on 96 trials and tested on 176, whereas the monkeys got an additional 3,840 training trials and 1,408 testing trials. The authors are explicit that they continued training the monkeys until they got a congruity effect. On the one hand, it is commendable that they are honest about this in their write-up, given that this detail could easily be framed as deliberate after the fact. On the other hand, it is still a form of p-hacking, given that it's critical for their result that the monkeys learn the canonical association (otherwise, the critical comparison to the non-canonical association is meaningless). 

      Thank you for this comment. 

      Indeed, for experiment 1, the amount of training and testing was not equal for the humans and monkeys, as also mentioned by reviewer 2. We now describe in more detail how many training and imaging days we used for each experiment and each species, as well as the number of blocks per day and the number of trials per block (see lines 572-577). We also added information on the amount of training received to all of the table legends.

      We are sorry for giving the impression that we trained until the monkeys learned this. This was not the case. Based on previous literature, we actually anticipated that the short training would not be sufficient, and therefore planned additional training in advance. Specifically, Meyer & Olson (2011) had observed pair learning in the inferior temporal cortex of macaque monkeys after 816 exposures per pair. This is similar to the additional training we gave, about 80 blocks with 12 trials per pair per block. This is now explained in more detail (lines 577-580).

      Furthermore, we strongly disagree with the pejorative term p-hacking. The aim of the experiment was not to show a congruency effect in the canonical direction in monkeys, but to track and compare their behavior in the same paradigm as that of humans for the reverse direction. It would have been unwise to stop after human-identical training and only show that humans learn better, which is a given. Instead, we looked at brain activations at both times, at the end of human-identical training and when the monkeys had learned the pairs in the canonical direction. 

      Finally, in experiment 2, monkeys were tested after the same 3 days of training as humans. We wrote: “Using this design, we obtained significant canonical congruity effects in monkeys on the first imaging day after the initial training (24 trials per pair), indicating that the animals had learned the associations” (lines 252-253).

      (2) Between-species comparisons are challenging. In addition to having differences in their DNA, human participants have spent many years living in a very different culture than that of NHPs, including years of formal education. As a result, attributing the observed differences to biology is challenging. One approach that has been adopted in some past studies is to examine either young children or adults from cultures that don't have formal educational structures. This is not the approach the authors take. This major confound needs to minimally be explicitly acknowledged up front. 

      Thank you for raising this important point. We already had a section on “limitations” in the manuscript, which we have now extended (lines 478-485). Indeed, this study follows a previous EEG study in 5-month-old infants, in which we showed that, after learning associations between labels and categories, infants spontaneously generalize learning to the reversed pairs after a short learning period in the lab (Kabdebon et al, PNAS, 2019). We also cited preliminary results from the same paradigm as in the current study, but using EEG in 4-month-old infants (Ekramnia and Dehaene-Lambertz, 2019), where we replicated the results obtained by Kabdebon et al. 2019 showing that preverbal infants spontaneously generalize learning to the reversed pairs. 

      Functional MRI in awake infants remains a challenge at this age (but see our own work, Dehaene-Lambertz et al, Science, 2002), especially because the experimental design yields only a few trials in the conditions of interest (10%) and thus requires a long experimental duration that exceeds infants’ quietness and attentional capacities in the noisy MRI environment. (We discuss this on lines 493-496.)

      (3) Humans have big advantages in processing and discriminating spoken stimuli and associating them with visual stimuli (after all, this is what words are in spoken human languages). Experiment 2 ameliorates these concerns to some degree, but still, it is difficult to attribute the failure of NHPs to show reversible associations in Experiment 1 to cognitive differences rather than the relative importance of sound string to meaning associations in the human vs. NHP experiences. 

      As the reviewer wrote, we deliberately performed Experiment 2 with visual shapes to control for various factors that might have explained the monkeys' failure in Experiment 1. 

      (4) More minor: The localizer task (math sentences vs. other sentences) makes sense for math but seems to make less sense for language: why would a language region respond more to sentences that don't describe math vs. ones that do? 

      The referee is correct: our use of the word “reciprocally” was improper (although see Amalric and Dehaene, 2016 for significant differences in both directions when non-mathematical sentences concern specific knowledge). We changed the formulation to clarify this as follows: “In these ROIs, we recovered the subject-specific coordinates of each participant’s 10% best voxels in the following comparisons: sentences vs rest for the 6 language ROIs; reading vs listening for the VWFA; and numerical vs non-numerical sentences for the 8 mathematical ROIs.” (lines 678-680).

      Methods - Analysis issues: 

      (5) The analyses appear to "double dip" by using the same data to define the clusters and to statistically test the average cluster activation (Kriegeskorte et al., 2009). The resulting effect sizes are therefore likely inflated, and the p-values are anticonservative. 

      It is not clear to us which result the reviewer is referring to. In Tables 1-4, we report the values that we found significant in the whole-brain analysis; we do not report additional statistical tests for these data. For Table 5, the subject-specific voxels were identified through a separate localizer experiment, which was designed to pinpoint the precise activation areas for each subject in the domains of oral and written language processing and math. Subsequently, we compared the activation at these voxel locations across different conditions of the main experiment. Thus, the two datasets were distinct, and there was no double dipping. Under both interpretations of the comment, we therefore disagree with the reviewer.

      Framing: 

      (6) The framing ("Brain mechanisms of reversible symbolic reference: A potential singularity of the human brain") is bigger than the finding (monkeys don't spontaneously reverse a temporal association but humans do). The title and discussion are full of buzzy terms ("brain mechanisms", "symbolic", and "singularity") that are only connected to the experiments by a debatable chain of assumptions. 

      First, this study shows relatively little about brain "mechanisms" of reversible symbolic associations, which implies insights into how these associations are learned, recognized, and represented. But we're only given standard fMRI analyses that are quite inconsistent across similar experimental paradigms, with purely suggestive connections between these spatial patterns and prior work on comparative brain anatomy. 

      We agree with the referee that the term “mechanism” is ambiguous and, for systems neuroscientists, may suggest more than we are able to do here with functional MRI. We changed the title to “Brain areas for reversible symbolic reference, a potential singularity of the human brain”. This title better describes our specific contribution: mapping out the areas involved in reversibility in humans, and showing that they do not seem to respond similarly in macaque monkeys.

      Second, it's not clear what the relationship is between symbolic cognition and a propensity to spontaneously reverse a temporal association. Certainly, if there are inter-species differences in learning preferences this is important to know about, but why is this construed as a difference in the presence or absence of symbols? Because the associations aren't used in any downstream computation, there is not even any way for participants to know which is the sign and which is the signified: these are merely labels imposed by the researchers on a sequential task. 

      As explained in the introduction, the reversibility test addressed a very minimal core property of symbolic reference. There cannot be a symbol if its attachment doesn’t operate in both directions. Thus, this property is necessary – but we agree that it is not sufficient. Indeed, more tests are needed to establish whether and how the learned symbols are used in further downstream compositional tasks (as discussed in our recent TICS papers, Dehaene et al. 2022). We added a sentence in the introduction to acknowledge this fact:

      “Such reversibility is a core and necessary property of symbols, although we readily acknowledge that it is not sufficient, since genuine symbols present additional referential and compositional properties that will not be tested in the present work.” (lines 89-92).

      Third, the word "singularity" is both problematically ambiguous and not well supported by the results. "Singularity" is a highly loaded word that the authors are simply using to mean "that which is uniquely human". Rather than picking a term with diverse technical meanings across fields and then trying to restrict the definition, it would be better to use a different term. Furthermore, even under the stated definition, this study performed a single pairwise comparison between humans and one other species (macaques), so it is a stretch to then conclude (or insinuate) that the "singularity" has been found (see also pt. 2 above). 

      We have published an extensive review including a description of our use of the term “singularity” (Dehaene et al., TICS 2022). Here is a short excerpt: “Humans are different even in domains such as drawing and geometry that do not involve communicative language. We refer to this observation using the term “human cognitive singularity”, the word singularity being used here in its standard meaning (the condition of being singular) as well as its mathematical sense (a point of sudden change). Hominization was certainly a singularity in biological evolution, so much so that it opened up a new geological age (the Anthropocene). Even if evolution works by small continuous change (and sometimes it doesn’t [4]), it led to a drastic cognitive change in humans.”

      We find the referee’s use of the pejorative term “insinuate” quite inappropriate. From the title on, we are quite nuanced and refer only to a “potential singularity”. Furthermore, as noted above, we explicitly mention in the discussion the limitations of our study, and in particular the fact that only a single non-human species was tested (see lines 486-493). We are working hard to get chimpanzee data, but this is remarkably difficult for us, and we hope that our paper will incite other groups to collect more evidence on this point.

      (7) Related to pt. 6, there is circularity in the framing whereby the authors say they are setting out to find out what is uniquely human, hypothesizing that the uniquely human thing is symbols, and then selecting a defining trait of symbols (spontaneous reversible association) *because* it seems to be uniquely human (see e.g., "Several studies previously found behavioral evidence for a uniquely human ability to spontaneously reverse a learned association (Imai et al., 2021; Kojima, 1984; Lipkens et al., 1988; Medam et al., 2016; Sidman et al., 1982), and such reversibility was therefore proposed as a defining feature of symbol representation reference (Deacon, 1998; Kabdebon and DehaeneLambertz, 2019; Nieder, 2009).", line 335). They can't have it both ways. Either "symbol" is an independently motivated construct whose presence can be independently tested in humans and other species, or it is by fiat synonymous with the "singularity". This circularity can be broken by a more modest framing that focuses on the core research question (e.g., "What is uniquely human? One possibility is spontaneous reversal of temporal associations.") and then connects (speculatively) to the bigger conceptual landscape in the discussion ("Spontaneous reversal of temporal associations may be a core ability underlying the acquisition of mental symbols").

      We fail to understand the putative circularity that the referee sees in our introduction. We urge him/her to re-read it, and hope that, with the changes that we introduced, it does boil down to his/her summary, i.e. “What is uniquely human? One possibility is spontaneous reversal of temporal associations."

      Reviewer #1 (Recommendations For The Authors): 

      In general, the manuscript was very clear, easy to read, and compelling. I would recommend the authors carefully check the text for consistency and minor typos. For example: 

      The sample size for the monkeys kept changing throughout the paper. E.g., Experiment 1: n = 2 (line 149); n = 3 (line 205).  

      Thank you for catching this error; we have corrected it. The number of animals was indeed 2 for experiment 1, and 3 for experiment 2. (Animals JD and YS participated in experiment 1 and JD, JC and DN in experiment 2. So only JD participated in both experiments.)

      Similarly, the number of stimulus pairs is reported inconsistently (4 on line 149, 5 pairs later in the paper). 

      We’re sorry that this was unclear. We used 5 sets of 4 audio-visual pairs each. We now clarify this, on line 157 and on lines 514-516.

      At least one case of p>0.0001, rather than p < 0.0001 (I assume). 

      Thank you once again; we have now corrected this.

      Reviewer #2 (Recommendations For The Authors): 

      One major issue in the study is the absence of significant results in monkeys. Indeed, the authors draw conclusions regarding the lack of significant difference in activity related to surprise in the multidemand network (MDN) in the reverse congruent versus reverse incongruent conditions. Although the results are convincing (especially with the significant interaction between congruency and canonicity), the article could be improved by including additional analyses in a priori ROI for the MDN in monkeys (as well as in humans, for comparison). In other words: what are the statistics for the MDN regarding congruity, canonicity, and interaction in both species? Since the authors have already performed this type of analysis for language and Math ROIs (table 5), it should be relatively easy for them to extend it to the MDN. Demonstrating that results in monkeys are far from significant could further convince the reader. 

      Furthermore, while the authors acknowledge in the discussion that the number of monkeys included in the study is considerably lower compared to humans, it would be informative to know the variability of the results among human participants. Specifically, it would be valuable to describe the proportion of human participants in which the effects of congruency, canonicity, and their interaction are significant. Additionally, stating the variability of the F-values for each effect would provide reassurance to the reader regarding the distinctiveness of humans in comparison to monkeys. Low variability in the results would serve to mitigate concerns that the observed disparity is merely a consequence of testing a unique subset of monkeys, which may differ from the general population. Indeed, this would be a greater support to the notion that the dissimilarity stems from a genuine distinction between the two species. 

      We responded to both of these points above.

      In terms of methods, details are missing: 

      - How many trials of each condition are there exactly? (10% of 44 trials is 4.4) : 

      We wrote: “In both humans and monkeys, each block started with 4 trials in the learned direction (congruent canonical trials), one trial for each of the 4 pairs (2 O-L and 2 L-O pairs). The rest of the block consisted of 40 trials in which 70% of trials were identical to the training; 10% were incongruent pairs but the direction (O-L or L-O) was correct (incongruent canonical trials), thus testing whether the association was learned; 10% were congruent pairs but the direction within the pairs was reversed relative to the learned pairs (congruent reversed trials) and 10% were incongruent pairs in reverse (incongruent reversed trials).”(See lines 596-600.)

      Thus, each block comprised 4 initial trials, 28 canonical congruent trials, 4 canonical incongruent, 4 reverse congruent and 4 reverse incongruent trials, i.e. 4+28+3x4=44 trials.
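
      The block composition quoted above can be tallied with a short sketch. This is only an illustration of the arithmetic, not code from the study; the function and key names are hypothetical, and it assumes the percentages apply to the 40 trials that follow the 4 initial ones.

```python
# Tally the trial types in one test block from the proportions quoted above.
# All names here are illustrative; they do not come from the manuscript.

def block_composition(n_initial=4, n_mixed=40):
    """Return the number of trials per condition in one block."""
    # Percentages of the mixed part of the block (assumed to sum to 100).
    shares = {
        "congruent_canonical": 70,    # identical to training
        "incongruent_canonical": 10,  # wrong pairing, learned direction
        "congruent_reversed": 10,     # right pairing, reversed direction
        "incongruent_reversed": 10,   # wrong pairing, reversed direction
    }
    counts = {name: n_mixed * pct // 100 for name, pct in shares.items()}
    # The block opens with one canonical congruent trial per pair.
    counts["initial_congruent_canonical"] = n_initial
    return counts


counts = block_composition()
total = sum(counts.values())
print(counts)
print("trials per block:", total)
```

      Under these assumptions the three rare conditions each contribute a whole number of trials per block, which addresses the concern about fractional trial counts.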

      - How long is one trial? 

      As written in the method section: “In each trial, the first stimulus (label or object) was presented during 700ms, followed by an inter-stimulus-interval of 100ms then the second stimulus during 700ms. The pairs were separated by a variable inter-trial-interval of 3-5 seconds” i.e. 700+100+700=1500 ms, plus 3 to 4.75 seconds of blank between the trials (see lines 531-533).

      - How are the stimulus presentations jittered? 

      See: “The pairs were separated by a variable inter-trial-interval randomly chosen among eight different durations between 3 and 4.75 seconds (step=250 ms). The series of 8 intervals was randomized again each time it was completed.” (lines 533-535).
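
      The jittering scheme described above (eight intervals, reshuffled each time the series is exhausted) could be sketched as follows. This is a minimal illustration under stated assumptions, not the presentation code used in the study: the names are hypothetical, the seed is arbitrary, and it assumes one interval follows every trial of a 44-trial block.

```python
import random

# Eight inter-trial intervals from 3000 to 4750 ms in 250 ms steps.
ITI_MS = [3000 + 250 * i for i in range(8)]

# One pair: first stimulus, inter-stimulus interval, second stimulus.
PAIR_MS = 700 + 100 + 700


def iti_sequence(n_trials, rng):
    """Return n_trials intervals, reshuffling the series of 8 each time it is used up."""
    sequence = []
    while len(sequence) < n_trials:
        series = ITI_MS[:]   # fresh copy of the eight durations
        rng.shuffle(series)  # new random order for this pass
        sequence.extend(series)
    return sequence[:n_trials]


rng = random.Random(0)  # arbitrary seed, for reproducibility only
itis = iti_sequence(44, rng)
block_ms = sum(PAIR_MS + iti for iti in itis)
print(itis[:8])
print("approximate block duration (s):", block_ms / 1000)
```

      Drawing without replacement from the series keeps the eight durations balanced within every run of eight trials, while the reshuffle keeps the order unpredictable.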

      - What is the statistical power achieved for humans? And for monkeys? 

      We know of no standard way to define power for fMRI experiments. Power will depend on so many parameters, including the fMRI signal-to-noise ratio, the attention of the subject, the areas being considered, the type of analysis (whole-brain versus ROIs), etc.

      - Videos are mentioned in the methods, is it the image and sound? It is not clear. 

      We’re sorry that it was unclear. Videos were only used for the training of the human subjects. We have now corrected this in the method section (lines 552-554).

      Reviewer #3 (Recommendations For The Authors): 

      The main recommendations are to adjust the framing (making it less bold and more connected to the empirical evidence) and to ensure independence in the statistical analyses of the fMRI data. 

      See our replies to the reviewer’s comments on “Framing” above. In particular, we changed the title of the paper from “Brain mechanisms of reversible symbolic reference” to “Brain areas for reversible symbolic reference”.

      References cited in this response

      Dehaene, S., Al Roumi, F., Lakretz, Y., Planton, S., & Sablé-Meyer, M. (2022). Symbols and mental programs: A hypothesis about human singularity. Trends in Cognitive Sciences, 26(9), 751‑766. https://doi.org/10.1016/j.tics.2022.06.010.

      Dehaene-Lambertz, Ghislaine, Stanislas Dehaene, and Lucie Hertz-Pannier. Functional Neuroimaging of Speech Perception in Infants. Science 298, no. 5600 (2002): 2013-15. https://doi.org/10.1126/science.1077066.

      Ekramnia M, Dehaene-Lambertz G. 2019. Investigating bidirectionality of associations in young infants as an approach to the symbolic system. Presented at the CogSci. p. 3449.

      Fedorenko E, Duncan J, Kanwisher N (2013) Broad domain generality in focal regions of frontal and parietal cortex. Proc Natl Acad Sci U S A 110:16616-16621.

      Kabdebon, Claire, and Ghislaine Dehaene-Lambertz. “Symbolic Labeling in 5-Month-Old Human Infants”. Proceedings of the National Academy of Sciences 116, no. 12 (2019): 5805-10. https://doi.org/10.1073/pnas.1809144116.

      Mitchell, D. J., Bell, A. H., Buckley, M. J., Mitchell, A. S., Sallet, J., & Duncan, J. (2016). A Putative Multiple-Demand System in the Macaque Brain. Journal of Neuroscience, 36(33), 8574‑8585. https://doi.org/10.1523/JNEUROSCI.0810-16.2016

    1. Author response:

      We thank the reviewers for their thorough comments on our manuscript. We appreciate their recognition of the strengths in our study, including addressing the significant problem of neonatal sepsis in preterm infants using a preterm piglet model, the robustness of our multi-omics dataset, and our multi-pronged approach to examining the physiological changes under different glucose management regimens.

      This document addresses our initial responses to the main concerns of the 3 reviewers. We will provide more detailed responses to their comments and revise the manuscript at a later date.

      In response to Reviewer #1, we acknowledge the concern about high blood glucose levels in the control group. This work is a follow-up to our previous work (Muk et al, JCI Insight 2022), where we explored different PN glucose regimens. Taken together, our experiments suggest a linear relationship between glucose provision and infection severity, indicating that increased glucose may heighten mortality risk, while radical reduction could reduce mortality due to sepsis but cause hypoglycemia and brain damage. As for the discrepancy in survival rates between Figures 1B and 6B, this is due to a shortened follow-up time in the follow-up experiment. This was done to minimize animal suffering because relevant differences in immune responses were detectable within 12 hours in the primary experiment. As for the relationship between bacterial burdens and glucose, we agree that lower bacterial density in piglets receiving the reduced glucose PN may result from slower bacterial growth. However, we analyzed the relationship between bacterial burdens and mortality and found that they did not correlate within each of the treatment groups. This finding inspired us to further explore the relationship between bacterial burdens and infection responses in our model, which has resulted in our recent preprint: Wu et al. Regulation of host metabolism and defense strategies to survive neonatal infection. BioRxiv 2024.02.23.581534; doi: https://doi.org/10.1101/2024.02.23.581534

      For Reviewer #2, the distinction between early (EOS) and late onset sepsis (LOS) in the time cut-off makes sense clinically because they are likely to be caused by different organisms and origins (EOS with maternal origin and LOS with postnatal origin) and therefore require different empirical antibiotic regimens. However, it is also important to acknowledge that the pathophysiology of “sepsis” may be similar despite timing and pathogen and depends on the degree of immune activation. Therefore, even though the infection in our model is initiated on the first day after birth, the organism that we use, Staphylococcus epidermidis (the most common bacterium detected in LOS), makes it a better model for LOS. As for neutrophil-specific transcripts, we only collected the whole-blood transcriptome during the experiments, which reflects the transcriptomic profile of all the leucocytes. Since we did not do single-cell RNA sequencing during the experiment, there is no possibility of isolating the neutrophil transcriptome at this time. As for the question of a “safe glucose infusion rate”, there likely is none, as the immune responses to glucose intake do not seem binary but increase with glucose intake. Our reduced glucose PN was chosen as it corresponded with the low end of recommended guidelines for PN glucose intake. However, the reduced glucose intervention still resulted in significant morbidity and a 25% mortality within 22 hours. There is therefore still vast room for improvement, but even though further reduction in PN glucose intake would probably provide further protection, it would entail dangerous hypoglycemia. The findings in this paper have prompted us to explore several alternative strategies to both reduce infection-related mortality and maintain glucose homeostasis. Thus, the optimal PN for infected newborns would probably differ from standard PN in all macronutrient compartments and will require much more preclinical and clinical research.

      For Reviewer #3, we acknowledge the variability in data collected from animals at euthanasia. These endpoints represent snapshots of the animals' states at euthanasia, which is a clear limitation of our method. Therefore, we do not know what metabolic processes precede the development of lethal sepsis, although the increases in plasma lactate suggest a higher rate of glycolysis in animals on high glucose PN. However, we believe the data still heavily imply a causal relationship between energy metabolic processes, especially glycolytic breakdown of glucose, and the pro-inflammatory responses leading to sepsis. In our recent preprint mentioned above we further explored the metabolic responses in pigs that succumbed to sepsis, compared to those that survived and found that survival was strongly associated with increases in mitochondrial metabolism and reduction in glycolysis.

      We hope these clarifications and our commitment to further research address your concerns satisfactorily. Thank you for your valuable feedback.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      “but an obvious influencing factor that the authors could investigate in their own data set is the retinal input. In Fig1b, the authors even show these data in the form of gaze and pupil size. In these example data, by eye, it looks like the pupil size is positively correlated with the run speed. This would of course have large consequences on the activity in V1, but the authors do not do anything with these data. The study would improve substantially if the authors would correlate their run speed traces with other factors that they have recorded too, such as pupil size and gaze.”

      Absolutely. We have added a first level of eye movement (and pupil size) analyses to the revised manuscript, resulting in an additional figure. In short, we found that eye movements are unlikely to play a significant role in our primary results, as the patterns of eye movements differed only slightly between running and stationary periods, and the measured impacts of such eye movements were also quantitatively much smaller than the primary effect sizes.

      We also note that, in analyzing the eye movements, we found that pupil size was larger during running than during stationary periods. This is suggestive evidence that running is correlated with increases in arousal. Although more work will be needed to calibrate and quantify how much this factor affects neural responses (and perhaps to dissociate it from running per se), the simple analysis we present suggests that the large differences we observe are unlikely to be explained by a difference between how arousal and running are correlated in the monkey versus the mouse. Instead, it appears that both species have at least qualitatively similar relations between pupil size (a standard proxy for arousal) and running.

      On this issue, we have added extensive discussion of the relevant recent work by Talluri et al. (2023) who attempted a similar cross-species analysis that considered spontaneous body movements and their effect on cortical activity (as well as the possibility that eye movements are a critical mediator in these modulations). Due to delays in revising our manuscript, we regret that our earlier submission had not cited this work, but we now do our best to highlight its importance and the synergy between these two papers. The full citation is listed below:

      Talluri BC, Kang I, Lazere A, Quinn KR, Kaliss N, Yates JL, Butts DA, Nienborg H. Activity in primate visual cortex is minimally driven by spontaneous movements. Nat Neurosci. 2023 Nov;26(11):1953-1959. doi: 10.1038/s41593-023-01459-5.

      There is a finer level of analysis that we hope to do in the future along these lines. It would rely on detailed characterization of each receptive field, building an image-computable model linking those receptive fields to the neural activity, and doing so at a finer time grain that links individual eye movements and changes in the spike train within a stimulus presentation (as opposed to working at the level of spike counts per stimulus presentation). Because these steps need to be accomplished together— and each requires substantial additional work and would go beyond the first-order findings we report in this work— we hope to report on such finer analyses in a standalone paper later. We are working on being able to do this in both marmoset and mouse.

      More generally, we want to emphatically agree that what is missing from this paper is the “why?”! We have done our best to show that a fair comparison reveals quantitatively different phenomena in marmoset and mouse. In the revised discussion, we lay out many lines of work that we hope will gain traction on this deeper mechanistic point. There’s a lot to do, and several of the possibilities are already current topics of exploration in our ongoing work.

      “Looking at the raster plot, however, shows that this strong positive correlation must be due entirely to the lower half of the neurons significantly increasing their firing rate as the mouse starts to run; in fact, the upper 25% or so of the neurons show exactly the opposite (strong suppression of the neurons as the mouse starts running). It would be more balanced if this heterogeneity in the response is at least mentioned somewhere in the text.”

      We are also intrigued by the heterogeneity of effects at the single neuron level. That is why the next section of the paper is dedicated to analyzing effects on a cell-by-cell basis. The fractions of neurons showing either increases or decreases are described separately, to get at this very issue.

      Reviewer 2 (Public Review):

      “For example, it is known that the locomotion gain modulation varies with layer in the mouse visual cortex, with neurons in the infragranular layers expressing a diversity of modulations (Erisken et al. 2014 Current Biology). However, for the marmoset dataset, it was not reported from which cortical layer the neurons are from, leaving this point unanswered.”

      Reviewer 2 called for more consideration of details that have been addressed in the mouse literature, such as the cortical layer of the cells, and related aspects of circuitry. We have greatly re-worked the Discussion to address several of these issues. In short, the manuscript’s set of data were collected without strong traction on layers or cell types, and it will be quite interesting to get a better handle on this using both refinements to our recording procedures as well as new techniques that are now possible in the marmoset for future studies.

      “In this regard, it is worth noting that the authors report an interesting difference between the foveal and peripheral parts of the visual cortex in marmoset. It will be interesting to investigate these differences in more detail in future studies. Likewise, while running might be an important behavioral state for mice, other behavioral states might be more relevant for marmosets and do modulate the activity of the primate visual cortex more profoundly. Future work could leverage the opportunities that the marmoset model system offers to reveal new insights about behavioral-related modulation in the primate brain.”

      Same page! We have expanded the discussion to better emphasize these points and are already deep in follow-up experiments to explore the foveal and peripheral representations.

      Reviewer 3 (Public Review):

      “However, the authors did not take full advantage of the quantity and diversity of the marmoset visual cortex recordings in their analyses. They mention recording and analyzing the activity of peripheral V1 neurons but mainly present results involving foveal V1 neurons. Foveal neurons, with their small receptive fields strongly affected by precise eye position, would seem to be less likely to be comparable to rodent data. If the authors have a reason for not doing so, they should provide an explanation.”

      We agree, and hope the reviewer finds our overall reply, detailed response to Reviewer 1 (who raised a similar issue), and corresponding updates to the manuscript appropriate for this stage of understanding.

      “Given that the marmosets are motivated to run with liquid rewards, the authors should provide more context as to how this may or may not affect marmoset V1 activity. Additionally, the lack of consideration of eye movements or position presents a major absence for the marmoset results, and fails to take advantage of one of the key differences between primate and rodent visual systems - the marmosets have a fovea, and make eye movements that fixate in various locations on the screen during the task.”

      In addition to the response above, we have made edits to the manuscript to speak to issues of arousal and eye movements (also detailed in previous responses). Given the modest decrease in activity we see, the usual concerns about potential increases in neural activity related to eye movements (which we quantify in the revision) and other issues related to motivation are hard to specifically relate to existing literature. But in the revised Discussion we talk more about how future work can/should dissociate these factors, as has been done in the mouse literature.

      “Finally, the model provides a strong basis for comparison at the level of neuronal populations, but some methodological choices are insufficiently described and may have an impact on interpreting the claims.”

      We have also clarified the shared-gain model’s description, which we agree needed additional detail and clarification.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The current study provided a follow-up analysis using published datasets focused on the individual variability of both the distraction effect (size and direction) and the attribute integration style, as well as the association between the two. The authors tried to answer the question of whether the multiplicative attribute integration style concurs with a more pronounced and positively oriented distraction effect.

      Strengths:

      The analysis extensively examined the impacts of various factors on decision accuracy, with a particular focus on using two-option trials as control trials, following the approach established by Cao & Tsetsos (2022). The statistical significance results were clearly reported.

      The authors meticulously conducted supplementary examinations, incorporating the additional term HV+LV into GLM3. Furthermore, they replaced the utility function from the expected value model with values from the composite model.

      We thank the reviewer for the positive response and are pleased that the reviewer found our report interesting.

      Reviewer #1 Comment 1

      Weaknesses:

      There are several weaknesses in terms of theoretical arguments and statistical analyses.

      First, the manuscript suggests in the abstract and at the beginning of the introduction that the study reconciled the "different claims" about "whether distraction effect operates at the level of options' component attributes rather than at the level of their overall value" (see line 13-14), but the analysis conducted was not for that purpose. Integrating choice attributes in either an additive or multiplicative way only reflects individual differences in combining attributes into the overall value. The authors seemed to assume that the multiplicative way generated the overall value ("Individuals who tended to use a multiplicative approach, and hence focused on overall value", line 20-21), but such implicit assumption is at odds with the statement in line 77-79 that people may use a simpler additive rule to combine attributes, which means overall value can come from the additive rule.

      We thank the reviewer for the comment. We have made adjustments to the manuscript to ensure that the message delivered within this manuscript is consistent. Within this manuscript, our primary focus is on the different methods of value integration in which the overall value is computed (i.e., additive, multiplicative, or both), rather than the interaction at the individual level of attributes. However, we do not exclude the possibility that the distractor effect may occur at multiple levels. Nevertheless, in light of the reviewer’s comment, we agree that we should focus the argument on whether distractors facilitate or impair decision making and downplay the separate argument about the level at which distractor effects operate. We have now revised the abstract:

      “It is widely agreed that people make irrational decisions in the presence of irrelevant distractor options. However, there is little consensus on whether decision making is facilitated or impaired by the presence of a highly rewarding distractor or whether the distraction effect operates at the level of options’ component attributes rather than at the level of their overall value. To reconcile different claims, we argue that it is important to incorporate consideration of the diversity of people’s ways of decision making. We focus on a recent debate over whether people combine choice attributes in an additive or multiplicative way. Employing a multi-laboratory dataset investigating the same decision making paradigm, we demonstrated that people used a mix of both approaches and the extent to which approach was used varied across individuals. Critically, we identified that this variability was correlated with the effect of the distractor on decision making. Individuals who tended to use a multiplicative approach to compute value, showed a positive distractor effect. In contrast, in individuals who tended to use an additive approach, a negative distractor effect (divisive normalisation) was prominent. These findings suggest that the distractor effect is related to how value is constructed, which in turn may be influenced by task and subject specificities. Our work concurs with recent behavioural and neuroscience findings that multiple distractor effects co-exist.” (Lines 12-26)

      Furthermore, we acknowledge that the current description of the additive rule could be interpreted in several ways. The current additive utility model is described as:

      $$U = \eta X + (1 - \eta) P$$

      where $U$ is the option’s utility, $X$ is the reward magnitude, $P$ is the probability, and $\eta$ is the magnitude/probability weighing ratio. If we perform a comparison between values according to this model (i.e., HV against LV), we would arrive at the following comparison:

      $$\eta X_{HV} + (1 - \eta) P_{HV} > \eta X_{LV} + (1 - \eta) P_{LV} \qquad (1)$$

      If we rearrange (1), we will arrive at:

      $$\eta (X_{HV} - X_{LV}) + (1 - \eta)(P_{HV} - P_{LV}) > 0 \qquad (2)$$

      While equations (1) and (2) are mathematically equivalent, equation (1) illustrates the interpretation where the comparison of the utilities occurs after value integration and the formation of an overall value. On the other hand, equation (2) can be broadly interpreted as the comparison of individual attributes in the absence of an overall value estimate for each option. Nonetheless, while we do not exclude the possibility that the distractor effect may occur at multiple levels, we have made modifications to the main manuscript to employ more consistently a terminology referring to different methods of value estimation, while recognizing that our empirical results are compatible with both interpretations.
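      The equivalence of the two readings can be checked numerically. The following is a minimal sketch, assuming the additive utility takes the weighted-sum form eta * X + (1 - eta) * P; the function names and the example values are hypothetical and chosen only for illustration.

```python
def additive_utility(magnitude, probability, eta):
    """Weighted sum of magnitude and probability (assumed form: eta*X + (1-eta)*P)."""
    return eta * magnitude + (1.0 - eta) * probability


def compare_after_integration(hv, lv, eta):
    """Reading (1): form an overall utility for each option, then subtract."""
    return additive_utility(*hv, eta) - additive_utility(*lv, eta)


def compare_by_attribute(hv, lv, eta):
    """Reading (2): take within-attribute differences, then weight them."""
    (x_hv, p_hv), (x_lv, p_lv) = hv, lv
    return eta * (x_hv - x_lv) + (1.0 - eta) * (p_hv - p_lv)


# Hypothetical options, each given as (magnitude, probability) rescaled to [0, 1].
hv = (0.75, 0.5)
lv = (0.25, 0.625)
eta = 0.4

integrated = compare_after_integration(hv, lv, eta)
attribute_wise = compare_by_attribute(hv, lv, eta)
```

      Both functions return the same decision variable, which is why choice data fitted with this model alone cannot distinguish the two mechanistic interpretations.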

      Reviewer #1 Comment 2

      The second weakness is sort of related but is more about the lack of coherent conceptual understanding of the "additive rule", or "distractor effect operates at the attribute level". In an assertive tone (lines 77-80), the manuscript suggests that a weighted sum integration procedure of implementing an "additive rule" is equal to assuming that people compare pairs of attributes separately, without integration. But they are mechanistically distinct. The additive rule (implemented using the weighted sum rule to combine probability and magnitude within each option and then applying the softmax function) assumes value exists before comparing options. In contrast, if people compare pairs of attributes separately, preference forms based on the within-attribute comparisons. Mathematically these two might be equivalent only if no extra mechanisms (such as inhibition, fluctuating attention, evidence accumulation, etc) are included in the within-attribute comparison process, which is hardly true in the three-option decision.

      We thank the reviewer for the comment. As described in our response to Reviewer #1 Comment 1, we acknowledge that there may be multiple possible interpretations of the additive rule. We also agree with the reviewer that additional mechanisms may be involved in three- or even two-option decisions, but these would require additional studies to tease apart. Another motivation for the approach used here, which does not explicitly model the extra mechanisms the reviewer refers to, was our intention to address and integrate findings from previous studies using the same dataset [i.e. (Cao & Tsetsos, 2022; Chau et al., 2020)]. Lastly, regardless of the mechanistic interpretation, our results show a systematic difference in the process of value estimation. Modifications to the manuscript text have been made consistent with our motivation (please refer to the reply and the textual changes proposed in response to the reviewer’s previous comment: Reviewer #1 Comment 1).

      Reviewer #1 Comment 3

      Could the authors comment on the generalizability of the current result? The reward magnitude and probability information are displayed using rectangular bars of different colors and orientations. Would that bias subjects to choose an additive rule instead of the multiplicative rule? Also, could the conclusion be extended to other decision contexts such as quality and price, whether a multiplicative rule is hard to formulate?

      We thank the reviewer for the comment. We agree with the observation that the stimulus space, with colour linearly correlated with magnitude, and orientation linearly correlated with probability, may bias subjects towards an additive rule. But that’s indeed the point: in order to maximise reward, subjects should have focused on the outcome space without being driven by the stimulus space. In practice, people are more or less successful in such an endeavour. Nevertheless, we argue that the specific choice of visual stimuli we used is no more biased towards an additive rule than any other. In fact, as long as two or more pieces of information are provided for each option, as opposed to a single cue whose value was previously learned, there will always be a bias towards an additive heuristic (a linear combination), regardless of whether the cues are shapes, colours, graphs, numbers, or words.

      As the reviewer suggested, the dataset analyzed in the current manuscript suggests that the participants were leaning towards the additive rule. Although there was a general tendency towards using the additive rule when choosing between the rectangular bars, we can still observe a spread of individuals using either, or both, additive and multiplicative rules, suggesting that there was indeed diversity in participants’ decision making strategies in our data.

      In previous studies, it was observed that human and non-human individuals used a mix of multiplicative and additive rules when they were tested on experimental paradigms different from ours (Bongioanni et al., 2021; Farashahi et al., 2019; Scholl et al., 2014). It was also observed that positive and negative distractor effects can be both present in the same data set when human and non-human individuals made decisions about food and social partner (Chang et al., 2019; Louie et al., 2013). It was less clear in the past whether the precise way a distractor affects decision making (i.e., positive/negative distractor effect) is related to the use of decision strategy (i.e., multiplicative/additive rules) and this is exactly what we are trying to address in this manuscript. A follow-up study looking at neural data (such as functional magnetic resonance imaging data) could provide a better understanding of the mechanistic nature of the relationship between distractor effects and decision strategy that we identified here.

      We agree with the reviewer that a multiplicative strategy may not be applicable to some decision contexts. Here it is important to look at the structure of the optimal solution (the one maximizing value in the long run). Factors modulating value (such as probability and temporal delay) require a non-linear (e.g., multiplicative) solution, while factors of the cost-benefit form (such as effort and price) require a linear solution (e.g., subtraction). In the latter scenario the additive heuristic would coincide with the optimal solution, and the effect addressed in this study may not be revealed. Nonetheless, the present data support the notion of distinct neural mechanisms at least for probabilistic decision-making, and this notion is likely applicable to decision-making in general.

      Our findings, in conjunction with the literature, also suggest that a positive distractor effect could be a general phenomenon in decision mechanisms that involve the medial prefrontal cortex. For example, it has been shown that the positive distractor effect is related to a decision mechanism linked to medial prefrontal cortex [especially the ventromedial prefrontal cortex (Chau et al., 2014; Noonan et al., 2017)]. It is also known that a similar brain region is involved not only when individuals are combining information using a multiplicative strategy (Bongioanni et al., 2021), but also when they are combining information to evaluate new experience or generalize information (Baram et al., 2021; Barron et al., 2013; Park et al., 2021). We have now revised the Discussion to explain this:

      “In contrast, the positive distractor effect is mediated by the mPFC (Chau et al., 2014; Fouragnan et al., 2019). Interestingly, the same or adjacent, interconnected mPFC regions have also been linked to the mechanisms by which representational elements are integrated into new representations (Barron et al., 2013; Klein-Flügge et al., 2022; Law et al., 2023; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). In a number of situations, such as multi-attribute decision making, understanding social relations, and abstract knowledge, the mPFC achieves this by using a spatial map representation characterised by a grid-like response (Constantinescu et al., 2016; Bongioanni et al., 2021; Park et al., 2021) and disrupting mPFC leads to the evaluation of composite choice options as linear functions of their components (Bongioanni et al., 2021). These observations suggest a potential link between positive distractor effects and mechanisms for evaluating multiple component options and this is consistent with the across-participant correlation that we observed between the strength of the positive distractor effect and the strength of non-additive (i.e., multiplicative) evaluation of the composite stimuli we used in the current task. Hence, one direction for model development may involve incorporating the ideas that people vary in their ways of combining choice attributes and each way is susceptible to different types of distractor effect.” (Lines 260-274)

      Reviewer #1 Comment 4

      The authors did careful analyses on quantifying the "distractor effect". While I fully agree that it is important to use the matched two-option trials and examine the interaction terms (DV-HV)T as a control, the interpretation of the results becomes tricky when looking at the effects in each trial type. Figure 2c shows a positive DV-HV effect in two-option trials whereas the DV-HV effect was not significantly stronger in three-option trials. Further in Figure 5b,c, in the Multiplicative group, the effect of DV-HV was absent in the two-option trials and present in the three-option trials. In the Additive group, however, the effect of DV-HV was significantly positive in the two-option trials but was significantly lowered in the three-option trials. Hence, it seems the different distractor effects were driven by the different effects of DV-HV in the two-option trials, rather than the three-option trials?

      We thank the reviewer for the comment. While it may be a bit more difficult to interpret, the current method of examining the (DV−HV)T term rather than (DV−HV) term was used because it was the approach used in a previous study (Cao & Tsetsos, 2022).

      During the design of the original experiments, trials were generated pseudo-randomly until the DV was sufficiently decorrelated from HV−LV. While this method allows for better group-level examination of behaviour, Cao and Tsetsos were concerned that this approach may have introduced unintended confounding covariations to some trials. In theory, one of the unintended covariations could occur between the DV and specific sets of reward magnitude and probability of the HV and LV. The covariation between parameters can lead to an observable positive distractor effect in the DV−HV as a consequence of the attraction effect or an unintended byproduct of using an additive method of integrating attributes [for further elaboration, please refer to Figure 1 in (Cao & Tsetsos, 2022)]. While it may have some limitations, the approach suggested by Cao and Tsetsos has the advantage of leveraging the DV−HV term to absorb any variance contributed by possible confounding factors such that true distractor effects, if any, can be detected using the (DV−HV)T term.
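      The logic of separating a confound-absorbing term from a ternary-specific term can be sketched as a design-matrix row. This is a simplified illustration under stated assumptions: the regressor set below is hypothetical, omits z-scoring and any higher-order interactions, and is not the exact GLM2 specification used in the manuscript or in Cao & Tsetsos (2022).

```python
def glm2_row(hv, lv, dv, ternary):
    """Build one illustrative design-matrix row with a trial-type interaction.

    ternary is 1 for three-option trials and 0 for matched two-option trials.
    For two-option trials, dv is the value of the distractor that was shown
    on the matched three-option trial (it is not actually displayed).
    """
    difficulty = hv - lv      # HV-LV: how easy the choice is
    distractor = dv - hv      # DV-HV: relative distractor value
    return {
        "intercept": 1.0,
        "HV-LV": difficulty,
        "DV-HV": distractor,              # absorbs covariation shared by both trial types
        "T": float(ternary),
        "(HV-LV)T": difficulty * ternary,
        "(DV-HV)T": distractor * ternary, # effect specific to trials where the distractor is present
    }


# Hypothetical option values for one matched pair of trials.
binary_row = glm2_row(hv=0.5, lv=0.25, dv=0.125, ternary=0)
ternary_row = glm2_row(hv=0.5, lv=0.25, dv=0.125, ternary=1)
```

      Because the DV-HV regressor is identical in the matched pair, any influence it has in two-option trials cannot be caused by the distractor itself; only the (DV-HV)T column differs between the two rows.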

      Reviewer #1 Comment 5

      Note that the pattern described above was different in Supplementary Figure 2, where the effect of DV-HV on the two-option trials was negative for both Multiplicative and Additive groups. I would suggest considering using Supplementary Figure 2 as the main result instead of Figure 5, as it does not rely on multiplicative EV to measure the distraction effect, and it shows the same direction of DV-HV effect on two-option trials, providing a better basis to interpret the (DV-HV)T effect.

      We thank the reviewer for the comments and suggestion. However, as mentioned in the response to Reviewer #1 Comment 4, the current method of analysis adopted in the manuscript and the interpretation of only (DV−HV)T is aimed to address the possibility that the (DV−HV) term may be capturing some confounding effects due to covariation. Given that the debate that is addressed specifically concerns the (DV−HV)T term, we elected to display Figure 5 within the main text and keep the results of the regression after replacing the utility function with the composite model as Supplementary Figure 5 (previously labelled as Supplementary Figure 2).

      Reviewer #2 (Public Review):

      This paper addresses the empirical demonstration of "distractor effects" in multi-attribute decision-making. It continues a debate in the literature on the presence (or not) of these effects, which domains they arise in, and their heterogeneity across subjects. The domain of the study is a particular type of multi-attribute decision-making: choices over risky lotteries. The paper reports a re-analysis of lottery data from multiple experiments run previously by the authors and other laboratories involved in the debate.

      Methodologically, the analysis assumes a number of simple forms for how attributes are aggregated (additively, multiplicatively, or both) and then applies a "reduced form" logistic regression to the choices with a number of interaction terms intended to control for various features of the choice set. One of these interactions, modulated by ternary/binary treatment, is interpreted as a "distractor effect."

      The claimed contribution of the re-analysis is to demonstrate a correlation in the strength/sign of this treatment effect with another estimated parameter: the relative mixture of additive/multiplicative preferences.

      We thank the reviewer for the positive response and are pleased that the reviewer found our report interesting.

      Reviewer #2 Comment 1

      Major Issues

      (1) How to Interpret GLM 1 and 2

      This paper, and others before it, have used a binary logistic regression with a number of interaction terms to attempt to control for various features of the choice set and how they influence choice. It is important to recognize that this modelling approach is not derived from a theoretical claim about the form of the computational model that guides decision-making in this task, nor an explicit test for a distractor effect. This can be seen most clearly in the equations after line 321 and its corresponding log-likelihood after 354, which contain no parameter or test for "distractor effects". Rather the computational model assumes a binary choice probability and then shoehorns the test for distractor effects via a binary/ternary treatment interaction in a separate regression (GLM 1 and 2). This approach has already led to multiple misinterpretations in the literature (see Cao & Tsetsos, 2022; Webb et al., 2020). One of these misinterpretations occurred in the datasets the authors studied, in which the lottery stimuli contained a confound with the interaction that Chau et al., (2014) were interpreting as a distractor effect (GLM 1). Cao & Tsetsos (2022) demonstrated that the interaction was significant in binary choice data from the study, therefore it can not be caused by a third alternative. This paper attempts to address this issue with a further interaction with the binary/ternary treatment (GLM 2). Therefore the difference in the interaction across the two conditions is claimed to now be the distractor effect. The validity of this claim brings us to what exactly is meant by a "distractor effect."

      The paper begins by noting that "Rationally, choices ought to be unaffected by distractors" (line 33). This is not true. There are many normative models that allow for the value of alternatives (even low-valued "distractors") to influence choices, including a simple random utility model. Since Luce (1959), it has been known that the axiom of "Independence of Irrelevant Alternatives" (that the probability ratio between any two alternatives does not depend on a third) is an extremely strong axiom, and only a sufficiency axiom for a random utility representation (Block and Marschak, 1959). It is not a necessary condition of a utility representation, and if this is our definition of rational (which is highly debatable), not necessary for it either. Countless empirical studies have demonstrated that IIA is falsified, and a large number of models can address it, including a simple random utility model with independent normal errors (i.e. a multivariate Probit model). In fact, it is only the multinomial Logit model that imposes IIA. It is also why so much attention is paid to the asymmetric dominance effect, which is a violation of a necessary condition for random utility (the Regularity axiom).

      So what do the authors even mean by a "distractor effect." It is true that the form of IIA violations (i.e. their path through the probability simplex as the low-option varies) tells us something about the computational model underlying choice (after all, different models will predict different patterns). However we do not know how the interaction terms in the binary logit regression relate to the pattern of the violations because there is no formal theory that relates them. Any test for relative value coding is a joint test of the computational model and the form of the stochastic component (Webb et al, 2020). These interaction terms may simply be picking up substitution patterns that can be easily reconciled with some form of random utility. While we can not check all forms of random utility in these datasets (because the class of such models is large), this paper doesn't even rule any of these models out.

      We thank the reviewer for the comment. In this study, one objective is to address an issue raised by Cao and Tsetsos (2022), suggesting that the distractor effect claimed in the Chau et al. (2014) study was potentially confounded by unintended correlation introduced between the distractor and the chooseable options. They suggested that this could be tested by analyzing the control binary trials and the experimental ternary trials in a single model (i.e., GLM2) and introducing an interaction term (DV−HV)T. The interaction term can partial out any unintended confound and test the distractor effect that was present specifically in the experimental ternary trials. We adopted these procedures in our current studies and employed the interaction term to test the distractor effects. The results showed that overall there was no significant distractor effect in the group. We agree with the reviewer’s comment that if we were only analysing the ternary trials, a multinomial probit model would be suitable because it allows noise correlation between the choices. Alternatively, had a multinomial logistic model been applied, a Hausman-McFadden Test could be run to test whether the data violates the assumption of independence of irrelevant alternatives (IIA). However, in our case, a binomial model is preferred over a multinomial model because of: (1) the inclusion of the binary trials, and (2) the small number of trials in which the distractor was chosen (the median was 4% of all ternary trials).
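      The IIA property discussed here can be made concrete with a small numerical sketch. It shows that under a plain multinomial logit (softmax) rule, the ratio of choice probabilities between HV and LV does not depend on the distractor's value. The function names and option values are hypothetical, and this is an illustration of the axiom rather than part of the analysis reported in the manuscript.

```python
import math


def softmax(values, temperature=1.0):
    """Multinomial logit choice probabilities for a list of option values."""
    exps = [math.exp(v / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def hv_lv_ratio(hv, lv, dv):
    """Probability ratio P(HV) / P(LV) when a distractor with value dv is present."""
    p_hv, p_lv, _ = softmax([hv, lv, dv])
    return p_hv / p_lv


# Same HV and LV, with a low-value versus a high-value distractor.
ratio_low_dv = hv_lv_ratio(1.0, 0.5, 0.1)
ratio_high_dv = hv_lv_ratio(1.0, 0.5, 0.9)
```

      The two ratios coincide because the distractor's exponential term cancels in the division, so any model of this form predicts no change in relative accuracy as the distractor varies. Models with correlated or non-logistic noise, such as a multinomial probit, do not share this constraint.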

      However, another main objective of this study is to consider the possibility that the precise distractor effect may vary across individuals. This is exactly why we employed the composite model to estimate individual’s decision making strategy and investigated how that varied with the precise way the distractor influenced decision making.
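      As a rough sketch of what such a composite model computes, the snippet below mixes a multiplicative (expected value) term with an additive term. The convex-mixture form and the parameter names `eta` and `sigma` are assumptions made for illustration and may differ from the exact parameterisation in the manuscript.

```python
def composite_value(x, p, eta, sigma):
    """Illustrative composite value of an option.

    x, p  : reward magnitude and probability, rescaled to [0, 1]
    eta   : magnitude/probability weighting within the additive term (assumed)
    sigma : weight on the multiplicative term; 1 = purely multiplicative,
            0 = purely additive (assumed parameterisation)
    """
    additive = eta * x + (1.0 - eta) * p
    multiplicative = x * p
    return sigma * multiplicative + (1.0 - sigma) * additive


# Hypothetical option evaluated by three hypothetical individuals.
purely_additive = composite_value(0.5, 0.5, eta=0.5, sigma=0.0)
purely_multiplicative = composite_value(0.5, 0.5, eta=0.5, sigma=1.0)
mixed = composite_value(0.5, 0.5, eta=0.5, sigma=0.5)
```

      Under this sketch, an individual's fitted mixing weight is the quantity that would be related to the size and sign of that individual's distractor effect.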

      In addition, we think that the reviewer here is raising a profound point and one with which we are in sympathy; it is true that random noise utility models can predict deviations from the IIA axiom. Central to these approaches is the notion that the representations of the values of choice options are noisy. Thus, when the representation is accessed, it might have a certain value on average but this value might vary from occasion to occasion as if each sample were being drawn from a distribution. As a consequence, the value of a distractor that is “drawn” during a decision between two other options may be larger than the distractor’s average value and may even have a value that is larger than the value drawn from the less valuable choice option’s distribution on the current trial. On such a trial it may become especially clear that the better of the two options has a higher value than the alternative choice option. Our understanding is that Webb, Louie and colleagues (Louie et al., 2013; Webb et al., 2020) suggest an explanation approximately along these lines when they reported a negative distractor effect during some decisions, i.e., they follow the predictions of divisive normalization suggesting that decisions become more random as the distractor’s value is greater.

      An alternative approach, however, assumes that rather than noise in the representation of the option itself, there is noise in the comparison process when the two options are compared. This is exemplified in many influential decision making models including evidence accumulation models such as drift diffusion models (Shadlen & Shohamy, 2016) and recurrent neural network models of decision making (Wang, 2008). It is this latter type of model that we have used in our previous investigations (Chau et al., 2020; Kohl et al., 2023). However, these two approaches are linked both in their theoretical origin and in the predictions that they make in many situations (Shadlen & Shohamy, 2016). We therefore clarify that this is the case in the revised manuscript as follows:

      “In the current study and in previous work we have used or made reference to models of decision making that assume that a noisy process of choice comparison occurs such as recurrent neural networks and drift diffusion models (Shadlen & Shohamy, 2016; Wang, 2008). Under this approach, positive distractor effects are predicted when the comparison process becomes more accurate because of an impact on the noisy process of choice comparison (Chau et al., 2020; Kohl et al., 2023). However, it is worth noting that another class of models might assume that a choice representation itself is inherently noisy. According to this approach, on any given decision a sample is drawn from a distribution of value estimates in a noisy representation of the option. Thus, when the representation is accessed, it might have a certain value on average but this value might vary from occasion to occasion. As a consequence, the value of a distractor that is “drawn” during decision between two other options may be larger than the distractor’s average value and may even have a value that is larger than the value drawn from the less valuable choice option’s distribution on the current trial. On such a trial it may become especially clear that the better of the two options has a higher value than the alternative choice option. Louie and colleagues (Louie et al., 2013) suggest an explanation approximately along these lines when they reported a positive distractor effect during some decisions. Such different approaches share theoretical origins (Shadlen & Shohamy, 2016) and make related predictions about the impact of distractors on decision making.” (Lines 297-313)

      Reviewer #2 Comment 2

      (2) How to Interpret the Composite (Mixture) model?

      On the other side of the correlation are the results from the mixture model for how decision-makers aggregate attributes. The authors report that most subjects are best represented by a mixture of additive and multiplicative aggregation models. The authors justify this with the proposal that these values are computed in different brain regions and then aggregated (which is reasonable, though raises the question of "where" if not the mPFC). However, an equally reasonable interpretation is that the improved fit of the mixture model simply reflects a misspecification of two extreme aggregation processes (additive and EV), so the log-likelihood is maximized at some point in between them.

      One possibility is a model with utility curvature. How much of this result is just due to curvature in valuation? There are many reasonable theories for why we should expect curvature in utility for human subjects (for example, limited perception: Robson, 2001, Khaw, Li Woodford, 2019; Netzer et al., 2022) and of course many empirical demonstrations of risk aversion for small stakes lotteries. The mixture model, on the other hand, has parametric flexibility.

      There is also a large literature on testing expected utility jointly with stochastic choice, and the impact of these assumptions on parameter interpretation (Loomes & Sugden, 1998; Apesteguia & Ballester, 2018; Webb, 2019). This relates back to the point above: the mixture may reflect the joint assumption of how choice departs from deterministic EV.

      We thank the reviewer for the comment. They are indeed right to mention the vast literature on curvature in subjective valuation; however, it is important to stress that the predictions of the additive model with linear basis functions are quite distinct from the predictions of a multiplicative model with non-linear basis functions. We have tested the possibility that participants’ behaviour was better explained by the latter and we showed that this was not the case. Specifically, we have added and performed model fitting on an additional model with utility curvature based on prospect theory (Kahneman & Tversky, 1979) with the weighted probability function suggested by (Prelec, 1998):

      $$U = w(X) \cdot w(P), \qquad w(X) = X^{\alpha}, \qquad w(P) = e^{-(-\ln P)^{\gamma}}$$

      where $X$ and $P$ represent the reward magnitude and probability (both rescaled to the interval between 0 and 1), respectively. $w(X)$ is the weighted magnitude and $w(P)$ is the weighted probability, while $\alpha$ and $\gamma$ are the corresponding distortion parameters. This prospect theory (PT) model is included along with the four previous models (please refer to Figure 3) in a Bayesian model comparison. Results indicate that the composite model remains the best account of participants’ choice behaviour (exceedance probability = 1.000, estimated model frequency = 0.720). We have now included these results in the main text and Supplementary Figure 2:

      “Supplementary Figure 2 reports an additional Bayesian model comparison performed while including a model with nonlinear utility functions based on Prospect Theory (Kahneman & Tversky, 1979) with the Prelec formula for probability (Prelec, 1998). Consistent with the above finding, the composite model provides the best account of participants’ choice behaviour (exceedance probability = 1.000, estimated model frequency = 0.720).” (Lines 193-198)

      Reviewer #2 Comment 3

      3) So then how should we interpret the correlation that the authors report?

      On one side we have the impact of the binary/ternary treatment which demonstrates some impact of the low value alternative on a binary choice probability. This may reflect some deep flaws in existing theories of choice, or it may simply reflect some departure from purely deterministic expected value maximization that existing theories can address. We have no theory to connect it to, so we cannot tell. On the other side of the correlation, we have a mixture between additive and multiplicative preferences over risk. This result may reflect two distinct neural processes at work, or it may simply reflect a misspecification of the manner in which humans perceive and aggregate attributes of a lottery (or even just the stimuli in this experiment) by these two extreme candidates (additive vs. EV). Again, this would entail some departure from purely deterministic expected value maximization that existing theories can address.

      It is entirely possible that the authors are reporting a result that points to the more exciting of these two possibilities. But it is also possible (and perhaps more likely) that the correlation is more mundane. The paper does not guide us to theories that predict such a correlation, nor reject any existing ones. In my opinion, we should be striving for theoretically-driven analyses of datasets, where the interpretation of results is clearer.

      We thank the reviewer for their clear comments. As our responses to the previous comments indicate, our results are consistent with several existing theories of choice, so we are not claiming that those theories are deeply flawed. Rather, we argue that the results reveal distinct neural processes (additive and multiplicative) and do not reflect a misspecification in the modelling. We have revised our manuscript in the light of the reviewer’s comments in the hope of clarifying the theoretical background which informed both our data analysis and our data interpretation.

      First, we note that there are theoretical reasons to expect a third option might impact on choice valuation. There is a large body of work suggesting that a third option may have an impact on the values of two other options (indeed Reviewer #2 refers to some of this work in their Reviewer #2 Comment 1), but the body of theoretical work originates partly in neuroscience and not just in behavioural economics. In many sensory systems, neural activity changes with the intensity of the stimuli that are sensed. Divisive normalization in sensory systems, however, describes the way in which such neural responses are altered also as a function of other adjacent stimuli (Carandini & Heeger, 2012; Glimcher, 2022; Louie et al., 2011, 2013). The phenomenon has been observed at neural and behavioural levels as a function not just of the physical intensity of the other stimuli but as a function of their associated value (Glimcher, 2014, 2022; Louie et al., 2011, 2015; Noonan et al., 2017; Webb et al., 2020).
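
      To make the idea of divisive normalization concrete, the following minimal sketch divides each option's value by a constant plus the summed value of all options on screen. The function name and the semi-saturation constant sigma are assumptions for illustration; the canonical formulation in the cited literature is more general.

```python
def divisive_normalization(values, sigma=1.0):
    """Normalize each value by sigma plus the sum of all values.

    Simplified, assumed form: r_i = v_i / (sigma + sum_j v_j).
    """
    total = sum(values)
    return [v / (sigma + total) for v in values]
```

      Under this sketch, adding a high-value third option shrinks the normalized difference between the two chooseable options, which is the signature of a negative distractor effect.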

      Analogously there is an emerging body of work on the combinatorial processes that describe how multiple representational elements are integrated into new representations (Barron et al., 2013; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). These studies have originated in neuroscience, just as was the case with divisive normalization, but they may have implications for understanding behaviour. For example, they might be linked to behavioural observations that the values assigned to bundles of goods are not necessarily the sum of the values of the individual goods (Hsee, 1998; List, 2002). One neuroscience fact that we know about such processes is that, at an anatomical level, they are linked to the medial frontal cortex (Barron et al., 2013; Fellows, 2006; Hunt et al., 2012; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). A second neuroscientific fact that we know about medial frontal cortex is that it is linked to any positive effects that distractors might have on decision making (Chau et al., 2014; Noonan et al., 2017). Therefore, we might make use of these neuroscientific facts and theories to predict a correlation between positive distractor effects and non-additive mechanisms for determining the integrated value of multi-component choices. This is precisely what we did; we predicted the correlation on the basis of this body of work and when we tested to see if it was present, we found that indeed it was. It may be the case that other behavioural economics theories offer little explanation of the associations and correlations that we find. However, we emphasize that this association is predicted by neuroscientific theory and in the revised manuscript we have attempted to clarify this in the Introduction and Discussion sections:

      “Given the overlap in neuroanatomical bases underlying the different methods of value estimation and the types of distractor effects, we further explored the relationship. Critically, those who employed a more multiplicative style of integrating choice attributes also showed stronger positive distractor effects, whereas those who employed a more additive style showed negative distractor effects. These findings concur with neural data demonstrating that the medial prefrontal cortex (mPFC) computes the overall values of choices in ways that go beyond simply adding their components together, and is the neural site at which positive distractor effects emerge (Barron et al., 2013; Bongioanni et al., 2021; Chau et al., 2014; Fouragnan et al., 2019; Noonan et al., 2017; Papageorgiou et al., 2017), while divisive normalization was previously identified in the posterior parietal cortex (PPC) (Chau et al., 2014; Louie et al., 2011).” (Lines 109-119)

      “At the neuroanatomical level, the negative distractor effect is mediated by the PPC, where signal modulation described by divisive normalization has been previously identified (Chau et al., 2014; Louie et al., 2011). The same region is also crucial for perceptual decision making processes (Shadlen & Shohamy, 2016). The additive heuristics for combining choice attributes are closer to a perceptual evaluation because distances in this subjective value space correspond linearly to differences in physical attributes of the stimuli, whereas normative (multiplicative) value has a non-linear relation with them (cf. Figure 1c). It is well understood that many sensory mechanisms, such as in primates’ visual systems or fruit flies’ olfactory systems, are subject to divisive normalization (Carandini & Heeger, 2012). Hence, the additive heuristics that are more closely based on sensory mechanisms could also be subject to divisive normalization, leading to negative distractor effects in decision making.

      In contrast, the positive distractor effect is mediated by the mPFC (Chau et al., 2014; Fouragnan et al., 2019). Interestingly, the same or adjacent, interconnected mPFC regions have also been linked to the mechanisms by which representational elements are integrated into new representations (Barron et al., 2013; Klein-Flügge et al., 2022; Law et al., 2023; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). In a number of situations, such as multi-attribute decision making, understanding social relations, and abstract knowledge, the mPFC achieves this by using a spatial map representation characterised by a grid-like response (Constantinescu et al., 2016; Bongioanni et al., 2021; Park et al., 2021) and disrupting mPFC leads to the evaluation of composite choice options as linear functions of their components (Bongioanni et al., 2021). These observations suggest a potential link between positive distractor effects and mechanisms for evaluating multiple component options and this is consistent with the across-participant correlation that we observed between the strength of the positive distractor effect and the strength of non-additive (i.e., multiplicative) evaluation of the composite stimuli we used in the current task. Hence, one direction for model development may involve incorporating the ideas that people vary in their ways of combining choice attributes and each way is susceptible to different types of distractor effect.” (Lines 250-274)

      Reviewer #2 Comment 4

      (4) Finally, the results from these experiments might not have external validity for two reasons. First, the normative criterion for multi-attribute decision-making differs depending on whether the attributes are lotteries or not (i.e. multiplicative vs additive). Whether it does so for humans is a matter of debate. Therefore if the result is unique to lotteries, it might not be robust for multi-attribute choice more generally. The paper largely glosses over this difference and mixes literature from both domains. Second, the lottery information was presented visually and there is literature suggesting this form of presentation might differ from numerical attributes. Which is more ecologically valid is also a matter of debate.

      We thank the reviewer for the comment. Indeed, they are right that the correlation we find between value estimation style and distractor effects may not be detected in all contexts of human behaviour. What the reviewer suggests goes along the same lines as our response to Reviewer #1 Comment 3: multi-attribute value estimation may have different structures. In some cases, the optimal solution may require a non-linear (e.g., multiplicative) response, as in probabilistic or delayed decisions, but in other cases (e.g., when estimating the value of a snack based on its taste, size, healthiness, and price) a linear integration would suffice. In the latter kind of scenario, both the optimal and the heuristic solutions may be additive, and people’s value estimation “style” may not be teased apart. However, if different neural mechanisms associated with different estimation processes are observed in certain scenarios, it suggests that these mechanisms are always present, even in scenarios where they do not alter the predictions. Probabilistic decision-making is also pervasive in many aspects of daily life and is not limited to the case of lotteries.

      While behaviour has been found to differ depending on whether lottery information is presented graphically or numerically, there is insufficient evidence to suggest biases towards additive or multiplicative evaluation, or towards positive or negative distractor effects. As such, we may expect that the correlation that we reveal in this paper, grounded in distinct neural mechanisms, would still hold even under different circumstances.

      Taking previous literature as examples, similar patterns of behaviour have been observed in humans when making decisions during trinary choice tasks. In studies conducted by Louie and colleagues (Louie et al., 2013; Webb et al., 2020), human participants performed a snack choice task where their behaviour could be modelled by divisive normalization with biphasic response (i.e., both positive and negative distractor effects). While these two studies only use a single numerical value of price for behavioural modelling, these prices should originate from an internal computation of various attributes related to each snack that are not purely related to lotteries. Expanding towards the social domain, studies of trinary decision making have considered face attractiveness and averageness (Furl, 2016), desirability of hiring (Chang et al., 2019), as well as desirability of candidates during voting (Chang et al., 2019). These choices involve considering various attributes unrelated to lotteries or numbers and yet still display a combination of positive distractor and negative distractor (i.e., divisive normalization) effects, as in the current study. In particular, the experiments carried out by Chang and colleagues (Chang et al., 2019) involved decisions in a social context that resemble real-world situations. These findings suggest that both types of distractor effects can co-exist in other value-based decision making tasks (Li et al., 2018; Louie et al., 2013) as well as decision making tasks in social contexts (Chang et al., 2019; Furl, 2016).

      Reviewer #2 Comment 5

      Minor Issues:

      The definition of EV as a normative choice baseline is problematic. The analysis requires that EV is the normative choice model (this is why the HV-LV gap is analyzed and the distractor effect defined in relation to it). But if the binary/ternary interaction effect can be accounted for by curvature of a value function, this should also change the definition of which lottery is HV or LV for that subject!

      We thank the reviewer for the comment. While the initial part of the paper discussed results that were defined by the EV model, the results shown in Supplementary Figure 2 were generated by replacing the EV-based utility function with values obtained from the composite model. Here, we have also redefined HV and LV for each subject according to the updated values generated by the composite model prior to the regression.

      References

      Apesteguia, J. & Ballester, M. Monotone stochastic choice models: The case of risk and time preferences. Journal of Political Economy (2018).

      Block, H. D. & Marschak, J. Random Orderings and Stochastic Theories of Responses. Cowles Foundation Discussion Papers (1959).

      Khaw, M. W., Li, Z. & Woodford, M. Cognitive Imprecision and Small-Stakes Risk Aversion. Rev. Econ. Stud. 88, 1979-2013 (2020).

      Loomes, G. & Sugden, R. Testing Different Stochastic Specifications of Risky Choice. Economica 65, 581-598 (1998).

      Luce, R. D. Individual Choice Behaviour. (John Wiley and Sons, Inc., 1959).

      Netzer, N., Robson, A. J., Steiner, J. & Kocourek, P. Endogenous Risk Attitudes. SSRN Electron. J. (2022) doi:10.2139/ssrn.4024773.

      Robson, A. J. Why would nature give individuals utility functions? Journal of Political Economy 109, 900-914 (2001).

      Webb, R. The (Neural) Dynamics of Stochastic Choice. Manage Sci 65, 230-255 (2019).

      Reviewer #3 (Public Review):

      Summary:

      The way an unavailable (distractor) alternative impacts decision quality is of great theoretical importance. Previous work, led by some of the authors of this study, had converged on a nuanced conclusion wherein the distractor can both improve (positive distractor effect) and reduce (negative distractor effect) decision quality, contingent upon the difficulty of the decision problem. In very recent work, Cao and Tsetsos (2022) reanalyzed all relevant previous datasets and showed that once distractor trials are referenced to binary trials (in which the distractor alternative is not shown to participants), distractor effects are absent. Cao and Tsetsos further showed that human participants heavily relied on additive (and not multiplicative) integration of rewards and probabilities.

      The present study by Wong et al. puts forward a novel thesis according to which interindividual differences in the way of combining reward attributes underlie the absence of detectable distractor effect at the group level. They re-analysed the 144 human participants and classified participants into a "multiplicative integration" group and an "additive integration" group based on a model parameter, the "integration coefficient", that interpolates between the multiplicative utility and the additive utility in a mixture model. They report that participants in the "multiplicative" group show a positive distractor effect while participants in the "additive" group show a negative distractor effect. These findings are extensively discussed in relation to the potential underlying neural mechanisms.
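
      The interpolation performed by the integration coefficient can be illustrated with a short sketch. The function and parameter names (eta for the integration coefficient, weight for the magnitude weight in the additive rule, temperature for the choice noise) and the logistic choice rule are assumptions for illustration, not the exact parameterisation used in the study.

```python
import math


def composite_utility(magnitude, probability, eta, weight=0.5):
    """Mix multiplicative (EV) and additive utility via eta in [0, 1].

    eta = 1 gives pure expected value; eta = 0 gives the additive rule.
    Inputs are assumed to be rescaled to [0, 1].
    """
    multiplicative = magnitude * probability
    additive = weight * magnitude + (1.0 - weight) * probability
    return eta * multiplicative + (1.0 - eta) * additive


def choice_probability(utility_a, utility_b, temperature=0.1):
    """Logistic probability of choosing option A over option B (assumed rule)."""
    return 1.0 / (1.0 + math.exp(-(utility_a - utility_b) / temperature))
```

      For example, an option with magnitude 0.9 and probability 0.3 is preferred over one with magnitude 0.5 and probability 0.6 under the additive rule but not under the multiplicative rule. Note that each option's utility depends only on its own attributes, so a model of this kind is context independent and cannot by itself produce a distractor effect.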

      Strengths:

      - The study is forward-looking, integrating previous findings well, and offering a novel proposal on how different integration strategies can lead to different choice biases.

      - The authors did an excellent job of connecting their thesis with previous neural findings. This is a very encompassing perspective that is likely to motivate new studies towards a better understanding of how humans and other animals integrate information in decisions under risk and uncertainty.

      - Although some aspects of the paper are very technical, the methodological details are well explained and the paper is very well written.

      We thank the reviewer for the positive response and are pleased that the reviewer found our report interesting.

      Reviewer #3 Comment 1

      Weaknesses:

      The authors quantify the distractor variable as "DV - HV", i.e., the relative distractor variable. Do the conclusions hold when the distractor is quantified in absolute terms (as "DV", see also Cao & Tsetsos, 2023)? Similarly, the authors show in Suppl. Figure 1 that the inclusion of a HV + LV regressor does not alter their conclusions. However, the (HV + LV)*T regressor was not included in this analysis. Does including this interaction term alter the conclusions considering there is a high correlation between (HV + LV)*T and (DV - HV)*T? More generally, it will be valuable if the authors assess and discuss the robustness of their findings across different ways of quantifying the distractor effect.

      We thank the reviewer for the comment. In the original manuscript we had already demonstrated that the distractor effect was related to the integration coefficient using a number of complementary analyses. They include Figure 5 based on GLM2, Supplementary Figure 3 based on GLM3 (i.e., adding the HV+LV term to GLM2), and Supplementary Figure 4 based on GLM2 but applying the utility estimate from the composite model instead of expected value (EV). These three sets of analyses produced comparable results. We elected not to include the (HV+LV)T term in GLM3 (Supplementary Figure 3) because of the collinearity it introduces between the regressors in the GLM. If this term is included in GLM3, the variance inflation factor (VIF) exceeds an acceptable level of 4 for some regressors. In particular, the VIF for the (HV+LV) and (HV+LV)T regressors is 5.420, while the VIF for (DV−HV) and (DV−HV)T is 4.723.
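
      For readers unfamiliar with the variance inflation factor, the sketch below shows one standard way to compute it: the VIF of each regressor equals the corresponding diagonal entry of the inverse of the regressors' correlation matrix. The function names are illustrative, the code uses only the Python standard library, and it is not the analysis pipeline used for the values reported above.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length, non-constant regressor columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)
        z.append([(v - m) / s for v in col])
    k = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """Return one VIF per regressor (diagonal of the inverse correlation matrix)."""
    inv = invert(correlation_matrix(columns))
    return [inv[i][i] for i in range(len(columns))]
```

      Uncorrelated regressors give a VIF of 1, and the value grows as a regressor becomes more predictable from the others, which is why a main term and its interaction with a binary condition variable can push the VIF past a chosen threshold.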

      Here, however, we consider the additional analysis suggested by the reviewer and test whether similar results are obtained. We constructed GLM4 including the (HV+LV)T term but replacing the relative distractor value (DV-HV) with the absolute distractor value (DV) in the main term and its interactions, as follows:

      GLM4:

      A significant negative (DV)T effect was found for the additive group [t(72)=−2.0253, p=0.0465] while the multiplicative group had a positive trend despite not reaching significance. Between the two groups, the (DV)T term was significantly different [t(142)=2.0434, p=0.0429]. While these findings suggest that the current conclusions could be partially replicated, simply replacing the relative distractor value with the absolute value in the previous analyses resulted in non-significant findings. Taking these results together with the main findings, it is possible to conclude that the positive distractor effect is better captured using the relative DV-HV term rather than the absolute DV term. This would be consistent with the way in which option values are envisaged to interact with one another in the mutual inhibition model (Chau et al., 2014, 2020) that generates the positive distractor effect. The model suggests that evidence is accumulated as the difference between the excitatory input from the option (e.g. the HV option) and the pooled inhibition contributed partly by the distractor. We have now included these results in the manuscript:

      “Finally, we performed three additional analyses that revealed comparable results to those shown in Figure 5. In the first analysis, reported in Supplementary Figure 3, we added an (HV+LV) term to the GLM, because this term was included in some analyses of a previous study that used the same dataset (Chau et al., 2020). In the second analysis, we added an (HV+LV)T term to the GLM. We noticed that this change led to inflation of the collinearity between the regressors and so we also replaced the (DV−HV) term by the DV term to mitigate the collinearity (Supplementary Figure 4). In the third analysis, reported in Supplementary Figure 5, we replaced the utility terms of GLM2. Since the above analyses involved using HV, LV, and DV values defined by the normative Expected Value model, here, we re-defined the values using the composite model prior to applying GLM2. Overall, in the Multiplicative Group a significant positive distractor effect was found in Supplementary Figures 3 and 4. In the Additive Group a significant negative distractor effect was found in Supplementary Figures 3 and 5. Crucially, all three analyses consistently showed that the distractor effects were significantly different between the Multiplicative Group and the Additive Group.” (Lines 225-237)

      Reviewer #3 Comment 2

      The central finding of this study is that participants who integrate reward attributes multiplicatively show a positive distractor effect while participants who integrate additively show a negative distractor effect. This is a very interesting and intriguing observation. However, there is no explanation as to why the integration strategy covaries with the direction of the distractor effect. It is unlikely that the mixture model generates any distractor effect as it combines two "context-independent" models (additive utility and expected value) and is fit to the binary-choice trials. The authors can verify this point by quantifying the distractor effect in the mixture model. If that is the case, it will be important to highlight that the composite model is not explanatory; and defer a mechanistic explanation of this covariation pattern to future studies.

      We thank the reviewer for the comment. Indeed, the main purpose of applying the mixture model was to identify the way each participant combined attributes and, as the reviewer pointed out, the mixture model per se is context independent. While we acknowledge that the mixture model is not a mechanistic explanation, there is a theoretical basis for the observation that these two factors are linked.

      Firstly, studies that have examined the processes involved when humans combine and integrate different elements to form new representations (Barron et al., 2013; Papageorgiou et al., 2017; Schwartenbeck et al., 2023) have implicated the medial frontal cortex as a crucial region (Barron et al., 2013; Fellows, 2006; Hunt et al., 2012; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). Meanwhile, previous studies have also identified that positive distractor effects are linked to the medial frontal cortex (Chau et al., 2014; Noonan et al., 2017). Therefore, the current study utilized these two facts to establish the basis for a correlation between positive distractor effects and non-additive mechanisms for determining the integrated value of multi-component choices. Nevertheless, we agree with the reviewer that it will be an important future direction to look at how the covariation pattern emerges in a computational model. We have revised the manuscript in an attempt to address this issue.

      “At the neuroanatomical level, the negative distractor effect is mediated by the PPC, where signal modulation described by divisive normalization has been previously identified (Chau et al., 2014; Louie et al., 2011). The same region is also crucial for perceptual decision making processes (Shadlen & Shohamy, 2016). The additive heuristics for combining choice attributes are closer to a perceptual evaluation because distances in this subjective value space correspond linearly to differences in physical attributes of the stimuli, whereas normative (multiplicative) value has a non-linear relation with them (cf. Figure 1c). It is well understood that many sensory mechanisms, such as in primates’ visual systems or fruit flies’ olfactory systems, are subject to divisive normalization (Carandini & Heeger, 2012). Hence, the additive heuristics that are more closely based on sensory mechanisms could also be subject to divisive normalization, leading to negative distractor effects in decision making.

      In contrast, the positive distractor effect is mediated by the mPFC (Chau et al., 2014; Fouragnan et al., 2019). Interestingly, the same or adjacent, interconnected mPFC regions have also been linked to the mechanisms by which representational elements are integrated into new representations (Barron et al., 2013; Klein-Flügge et al., 2022; Law et al., 2023; Papageorgiou et al., 2017; Schwartenbeck et al., 2023). In a number of situations, such as multi-attribute decision making, understanding social relations, and abstract knowledge, the mPFC achieves this by using a spatial map representation characterised by a grid-like response (Constantinescu et al., 2016; Bongioanni et al., 2021; Park et al., 2021) and disrupting mPFC leads to the evaluation of composite choice options as linear functions of their components (Bongioanni et al., 2021). These observations suggest a potential link between positive distractor effects and mechanisms for evaluating multiple component options and this is consistent with the across-participant correlation that we observed between the strength of the positive distractor effect and the strength of non-additive (i.e., multiplicative) evaluation of the composite stimuli we used in the current task. Hence, one direction for model development may involve incorporating the ideas that people vary in their ways of combining choice attributes and each way is susceptible to different types of distractor effect.” (Lines 250-274)

      Reviewer #3 Comment 3

      -  Correction for multiple comparisons (e.g., Bonferroni-Holm) was not applied to the regression results. Is the "negative distractor effect in the Additive Group" (Fig. 5c) still significant after such correction? Although this does not affect the stark difference between the distractor effects in the two groups (Fig. 5a), the classification of the distractor effect in each group is important (i.e., should future modelling work try to capture both a negative and a positive effect in the two integration groups? Or just a null and a positive effect?).

      We thank the reviewer for the comment. We have performed Bonferroni-Holm correction and, as the reviewer surmised, the negative distractor effect in the additive group becomes non-significant. However, we have to emphasize that our major claim is that there was a covariation between decision strategy (of combining attributes) and distractor effect (as seen in Figure 4). That analysis does not involve multiple comparisons. The analysis in Figure 5 that splits participants into two groups was mainly designed to illustrate the effects for an easier understanding by a more general audience. In many cases, the precise ways in which participants are divided into subgroups can have a major impact on whether each individual group’s effects are significant or not. It may be possible to identify an optimal way of grouping, but we refrained from taking such a trial-and-error approach, especially for the analysis in Figure 5 that simply supplements the point made in Figure 4. The key notion we would like the readers to take away is that there is a spectrum of distractor effects (ranging from negative to positive) that will vary depending on how the choice attributes were integrated.
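
      The Bonferroni-Holm step-down procedure mentioned above can be sketched as follows. The function name is illustrative and the p-values in the usage note are hypothetical, not values from the study.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected.

    P-values are tested from smallest to largest against alpha / (m - rank);
    testing stops at the first p-value that fails its threshold.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject
```

      With two hypothetical tests at p = 0.04 and p = 0.03, the smaller p-value is compared against 0.05 / 2 = 0.025 and fails, so neither test survives correction even though both fall below 0.05 individually.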

      Reviewer #1 (Recommendations For The Authors):

      Reviewer #1 Recommendations 1

      Enhancements are necessary for the quality of the scientific writing. Several sentences have been written in a negligent manner and warrant revision to ensure a higher level of rigor. Moreover, a number of sentences lack appropriate citations, including but not restricted to:

      - Line 39-41.

      - Line 349-350 (also please clarify what it means by "parameter estimate" is very accurate: correlation?).

      We thank the reviewer for the comment. We have made revisions to various parts of the manuscript to address the reviewer’s concerns.

      “Intriguingly, most investigations have considered the interaction between distractors and chooseable options either at the level of their overall utility or at the level of their component attributes, but not both (Chau et al., 2014, 2020; Gluth et al., 2018).” (Lines 40-42)

      “Additional simulations have shown that the fitted parameters can be recovered with high accuracy (i.e., with a high correlation between generative and recovered parameters).” (Lines 414-416)

      Reviewer #1 Recommendations 2

      Some other minor suggestions:

      - Correlative vs. Causality: the manuscript exhibits a lack of attentiveness in drawing causal conclusions from correlative evidence (manuscript title, Line 91, Line 153-155).

      - When displaying effect size on accuracy, there is no need to show the significance of intercept (Figure 2,5, & supplementary figures).

      - Adding some figure titles on Figure 2 so it is clear what each panel stands for.

      - In Figure 3, the dots falling on zero values are not easily seen. Maybe increasing the dot size a little?

      - Line 298: binomial linking function (instead of binomial distribution).

      - Line 100: composite, not compositive.

      - Line 138-139: please improve the sentence, if it's consistent with previous findings, what's the point of "surprisingly"?

      We thank the reviewer for the suggestions. We have made revisions to the title and various parts of the manuscript to address the reviewer’s concerns.

      - Correlative vs. Causality: the manuscript exhibits a lack of attentiveness in drawing causal conclusions from correlative evidence (manuscript title, Line 91, Line 153-155).

      We have now revised the manuscript:

      “Distractor effects in decision making are related to the individual’s style of integrating choice attributes” (title of the manuscript)

      “More particularly, we consider whether individual differences in combination styles could be related to different forms of distractor effect.” (Lines 99-100)

      “While these results may seem to suggest that a distractor effect was not present at an overall group level, we argue that the precise way in which a distractor affects decision making is related to how individuals integrate the attributes.” (Lines 164-167)

      - When displaying effect size on accuracy, there is no need to show the significance of intercept (Figure 2,5, & supplementary figures).

      We have also modified all Figures to remove the intercept.

      - Adding some figure titles on Figure 2 so it is clear what each panel stands for.

      We have added titles accordingly.

      - In Figure 3, the dots falling on zero values are not easily seen. Maybe increasing the dot size a little?

      In conjunction with addressing Reviewer #3 Recommendation 6, we have adapted the violin plots into histograms for a better representation of the values.

      - Line 298: binomial linking function (instead of binomial distribution).

      - Line 100: composite, not compositive.

      - Line 138-139: please improve the sentence, if it's consistent with previous findings, what's the point of "surprisingly"?

      We have made revisions accordingly.

      Reviewer #2 (Recommendations For The Authors):

      Reviewer #2 Recommendations 1

      Line 294. The definition of DV, HV, LV is not sufficient. Presumably, these are the U from the following sections? Or just EV? But this is not explicitly stated, rather they are vaguely referred to as "values." The computational modelling section refers to them as utilities. Are these the same thing?

      We thank the reviewer for the suggestion. We have clarified the exact method for calculating each of the values and updated the section accordingly.

      “where HV, LV, and DV refer to the values of the chooseable higher value option, chooseable lower value option, and distractor, respectively. Here, values (except those in Supplementary Figure 5) are defined as Expected Value (EV), calculated by multiplying magnitude and probability of reward.” (Lines 348-350)

      Reviewer #2 Recommendations 2

      The analysis drops trials in which the distractor was chosen. These trials are informative about the presence (or not) of relative valuation or other factors because they make such choices more (or less) likely. Ignoring them is another example of the analysis being misspecified.

      We thank the reviewer for the suggestion and this is related to Major Issue 1 raised by the same reviewer. In brief, we adopted the same methods implemented by Cao and Tsetsos (Cao and Tsetsos, 2022) and that constrained us to applying a binomial model. Please refer to our reply to Major Issue 1 for more details.

      Reviewer #2 Recommendations 3

      Some questions and suggestions on statistics and computational modeling:

      Have the authors looked at potential collinearity between the regressors in each of the GLMs?

      We thank the reviewer for the comment. For each of the following GLMs, the average variance inflation factor (VIF) has been calculated as follows:

      GLM2 using the Expected Value model:

      Author response table 1.

      GLM2 after replacing the utility function based on the normative Expected Value model with values obtained by using the composite model:

      Author response table 2.

      GLM3:

      Author response table 3.

      As indicated by the average VIF values, none exceeds 4, suggesting that the estimated coefficients were not inflated by collinearity between the regressors in each of the GLMs.
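
      The response does not say how the VIFs were computed. One standard route, sketched here under that caveat, takes the VIF of regressor j as the j-th diagonal element of the inverse correlation matrix of the regressors. The function names and toy data are hypothetical and are not taken from the authors' analysis.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix for equal-length regressor columns.
    Assumes no column is constant (a constant column has zero variance)."""
    n = len(columns[0])
    scaled = []
    for col in columns:
        mean = sum(col) / n
        dev = [v - mean for v in col]
        norm = math.sqrt(sum(d * d for d in dev))
        scaled.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in scaled] for ci in scaled]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for a few regressors)."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("regressors are perfectly collinear")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]

def vifs(columns):
    """Variance inflation factor of each regressor."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

# Hypothetical toy regressors: mutually orthogonal columns give a VIF of 1.
orthogonal = [[1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
average_vif = sum(vifs(orthogonal)) / len(orthogonal)
```

      With real data, each GLM's regressors would be passed as columns and the average compared against the threshold of 4 mentioned above.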

      Reviewer #2 Recommendations 4

      - Correlation results in Figure 4. What is the regression line displayed on this plot? I suspect the regression line came from Pearson's correlation, which would be inconsistent with the Spearman's correlation reported in the text. A reasonable way would be to transform both x and y axes to the ranked data. However, I wonder why it makes sense to use ranked data for testing the correlation in this case. Those are both scalar values. Also, did the authors assess the influence of the zero integration coefficient on the correlation result? Importantly, did the authors redo the correlation plot after defining the utility function by the composite models?

      We thank the reviewer for the suggestion. The plotted line in Figure 4 was based on Pearson’s correlation, and we have modified the text to report the Pearson’s correlation result as well.

      If we exclude the 32 participants with integration coefficients smaller than 1×10⁻⁶ from the analysis, we still observe a significant positive Pearson’s correlation [r(110)=0.202, p=0.0330].

      Author response image 1.

      Figure 4 after excluding 32 participants with integration coefficients smaller than 1×10⁻⁶.

      “As such, we proceeded to explore how the distractor effect (i.e., the effect of (DV−HV)T obtained from GLM2; Figure 2c) was related to the integration coefficient (η) of the optimal model via a Pearson’s correlation (Figure 4). As expected, a significant positive correlation was observed [r(142)=0.282, p=0.000631]. We noticed that there were 32 participants with integration coefficients that were close to zero (below 1×10⁻⁶). The correlation remained significant even after removing these participants [r(110)=0.202, p=0.0330].” (Lines 207-212)
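
      The distinction the reviewer raises between Pearson’s and Spearman’s correlation can be made concrete with a short sketch. It assumes nothing about the authors’ software: Spearman’s coefficient is computed as Pearson’s coefficient on average ranks, and the t statistic uses n−2 degrees of freedom, matching the r(142) notation for 144 participants. All function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = average
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranked data."""
    return pearson(ranks(x), ranks(y))

def t_statistic(r, df):
    """t statistic for testing r against zero with df = n - 2."""
    return r * math.sqrt(df / (1 - r * r))
```

      A straight line fitted to the raw values corresponds to the Pearson statistic, whereas a line fitted to `ranks(x)` and `ranks(y)` would correspond to the Spearman statistic, which is why the plotted line and the reported coefficient should come from the same method.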

      The last question relates to results already included in Supplementary Figure 5, in which the analyses were conducted using the utility function of the composite model. We notice that although there was a difference in integration coefficient between the multiplicative and additive groups, a correlational analysis did not generate significant results [r(142)=0.124, p=0.138]. It is possible that the relationship became less linear after applying the composite model utility function. However, it is notable that comparable results were obtained in a series of complementary analyses (Figure 5: r(142)=0.282, p=0.000631; Supplementary Figure 3: r(142)=0.278, p=0.000746).

      Reviewer #2 Recommendations 5

      - From lines 163-165, were the models tested on only the three-option trials or both two- and three-option trials? It is ambiguous from the description here. It might be worth checking the model comparison based on different trial types, and the current model fitting results do not tell an absolute sense of the goodness of fit. I would suggest including the correctly predicted trial proportions in each trial type from different models.

      We thank the reviewer for the suggestion. We modeled only the two-option trials. The key reason is that the two-option trials can arguably provide a better estimate of participants’ style of integrating attributes, as they are independent of any distractor effects. This is also why Cao and Tsetsos applied the same approach when they re-analyzed our data (Cao and Tsetsos, 2022). We have clarified the statement accordingly.

      “We fitted these models exclusively to the Two-Option Trial data and not the Distractor Trial data, such that the fitting (especially that of the integration coefficient) was independent of any distractor effects, and tested which model best describes participants’ choice behaviours.” (Lines 175-178)

      Reviewer #2 Recommendations 6

      - Along with displaying the marginal distributions of each parameter estimate, a correlation plot of these model parameters might be useful, given that some model parameters are multiplied in the value functions.

      We thank the reviewer for the suggestion. We have also generated the correlation plot of the model parameters. The Pearson’s correlation between the magnitude/probability weighting and integration coefficient was significant [r(142)=−0.259, p=0.00170]. The Pearson’s correlation between the inverse temperature and integration coefficient was not significant [r(142)=−0.0301, p=0.721]. The Pearson’s correlation between the inverse temperature and magnitude/probability weighting was not significant [r(142)=−0.0715, p=0.394].

      “Our finding that the average integration coefficient η was 0.325 coincides with previous evidence that people were biased towards using an additive, rather than a multiplicative rule. However, it also shows that, rather than being fully additive (η=0) or multiplicative (η=1), people’s choice behaviour is best described as a mixture of both. Supplementary Figure 1 shows the relationships between all the fitted parameters.” (Lines 189-193)
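
      The composite model’s equations are not reproduced in this response, so the following is only a sketch of how such a mixture could work. It assumes that utility is a convex combination of a multiplicative value (magnitude × probability) and an additive value (a weighted sum of magnitude and probability), with the integration coefficient running from 0 (fully additive) to 1 (fully multiplicative) as stated above, and that choices follow a softmax rule. The names `gamma` (magnitude/probability weighting) and `theta` (inverse temperature) are hypothetical labels.

```python
import math

def composite_utility(magnitude, probability, eta, gamma):
    """Assumed composite utility; inputs are rescaled to the 0-1 interval.
    eta = 1 gives the purely multiplicative value, eta = 0 the purely additive one."""
    multiplicative = magnitude * probability
    additive = gamma * magnitude + (1 - gamma) * probability
    return eta * multiplicative + (1 - eta) * additive

def p_choose_first(option_a, option_b, eta, gamma, theta):
    """Softmax probability of choosing option_a in a two-option trial.
    Each option is a (magnitude, probability) pair."""
    ua = composite_utility(*option_a, eta, gamma)
    ub = composite_utility(*option_b, eta, gamma)
    return 1 / (1 + math.exp(-theta * (ua - ub)))

# Hypothetical trial evaluated at the reported average integration coefficient.
p = p_choose_first((0.8, 0.5), (0.4, 0.9), eta=0.325, gamma=0.5, theta=10.0)
```

      Under these assumptions, an integration coefficient of 0.325 places more weight on the additive term than on the multiplicative term, in line with the bias towards an additive rule described in the quoted passage.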

      Reviewer #2 Recommendations 7

      Have the authors tried any functional transformations on amounts or probabilities before applying the weighted sum? The two attributes are on entirely different scales and thus may not be directly summed together.

      We thank the reviewer for the comment. Amounts and probabilities were indeed both rescaled to the 0-1 interval before being summed, as explained in the methods (Line XXX). Additionally, we have now fitted an additional model with utility curvature based on prospect theory (Kahneman & Tversky, 1979) and a weighted probability function (Prelec, 1998). In this model, the reward magnitude and probability (both rescaled to the interval between 0 and 1) are transformed into a weighted magnitude and a weighted probability, each with its own distortion parameter. This prospect theory (PT) model was included along with the four previous models (please refer to Figure 3) in a Bayesian model comparison. Results indicate that the composite model remains the best account of participants’ choice behaviour (exceedance probability = 1.000, estimated model frequency = 0.720).

      “Supplementary Figure 2 reports an additional Bayesian model comparison performed while including a model with nonlinear utility functions based on Prospect Theory (Kahneman & Tversky, 1979) with the Prelec formula for probability (Prelec, 1998). Consistent with the above finding, the composite model provides the best account of participants’ choice behaviour (exceedance probability = 1.000, estimated model frequency = 0.720).” (Lines 193-198)
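
      The exact equations of the PT model are not shown in this response. A common parameterization, given here purely as an assumption, uses a power function for magnitude and the one-parameter Prelec function for probability, and multiplies the two weighted quantities. The parameter names `alpha` and `beta` are hypothetical.

```python
import math

def weighted_magnitude(x, alpha):
    """Power-law utility curvature for a magnitude rescaled to the 0-1 interval."""
    return x ** alpha

def prelec_weight(p, beta):
    """One-parameter Prelec weighting: w(p) = exp(-(-ln p)^beta)."""
    if p <= 0.0:
        return 0.0  # limit of the function as p approaches 0
    return math.exp(-((-math.log(p)) ** beta))

def pt_utility(x, p, alpha, beta):
    """Assumed PT utility: weighted magnitude times weighted probability."""
    return weighted_magnitude(x, alpha) * prelec_weight(p, beta)
```

      With both distortion parameters set to 1, this sketch reduces to expected value (magnitude × probability), so the normative model is a special case of it.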

      Reviewer #3 (Recommendations For The Authors):

      Reviewer #3 Recommendations 1

      - In the Introduction (around line 48), the authors make the case that distractor effects can co-exist in different parts of the decision space, citing Chau et al. (2020). However, if the distractor effect is calculated relative to the binary baseline this is no longer the case.

      - Relating to the above point, it might be useful for the authors to make a distinction between effects being non-monotonic across the decision space (within individuals) and effects varying across individuals due to different strategies adopted. These two scenarios are conceptually distinct.

      We thank the reviewer for the comment. Indeed, the ideas that distractor effects may vary across decision space and across different individuals are slightly different concepts. We have now revised the manuscript to clarify this:

      “However, as has been argued in other contexts, just because one type of distractor effect is present does not preclude another type from existing (Chau et al., 2020; Kohl et al., 2023). Each type of distractor effect can dominate depending on the dynamics between the distractor and the chooseable options. Moreover, the fact that people have diverse ways of making decisions is often overlooked. Therefore, not only may the type of distractor effect that predominates vary as a function of the relative position of the options in the decision space, but also as a function of each individual’s style of decision making.” (Lines 48-54)

      Reviewer #3 Recommendations 2

      - The idea of mixture models/strategies has strong backing from other Cognitive Science domains and will appeal to most readers. It would be very valuable if the authors could further discuss the potential level at which their composite model might operate. Are the additive and EV quantities computed and weighted (as per the integration coefficient) within a trial giving rise to a composite decision variable? Or does the integration coefficient reflect a probabilistic (perhaps competitive) selection of one strategy on a given trial? Perhaps extant neural data can shed light on this question.

      We thank the reviewer for the comment. The idea is related to whether the observed mixture in integration models derives from value being actually computed in a mixed way within each trial, or each trial involves a probabilistic selection between the additive and multiplicative strategies. We agree that this is an interesting question and to address it would require the use of some independent continuous measures to estimate the subjective values in quantitative terms (instead of using the categorical choice data). This could be done by collecting pupil size data or functional magnetic resonance imaging data, as the reviewer has pointed out. Although the empirical work is beyond the scope of the current behavioural study, it is worth bringing up this point in the Discussion:

      “The current finding involves the use of a composite model that arbitrates between the additive and multiplicative strategies. A general question for such composite models is whether people mix two strategies in a consistent manner on every trial or whether there is some form of probabilistic selection occurring between the two strategies on each trial such that only one strategy is used on any given trial while, on average, one strategy is more probable than the other. To test which is the case requires an independent estimation of subjective values in quantitative terms, such as by pupillometry or functional neuroimaging. Further understanding of this problem will also provide important insight into the precise way in which distractor effects operate at the single-trial level.” (Lines 275-282)

      Reviewer #3 Recommendations 3

      Line 80 "compare pairs of attributes separately, without integration". This additive rule (or the within-attribute comparison) implies integration, it is just not multiplicative integration.

      We thank the reviewer for the comment. We have made adjustments to the manuscript to ensure that the message delivered within this manuscript is consistent.

      “For clarity, we stress that the same mathematical formula for additive value can be interpreted as meaning that 1) subjects first estimate the value of each option in an additive way (value integration) and then compare the options, or 2) subjects compare the two magnitudes and separately compare the two probabilities without integrating dimensions into overall values. On the other hand, the mathematical formula for multiplicative value is only compatible with the first interpretation. In this paper we focus on attribute combination styles (multiplicative vs additive) and do not make claims on the order of the operations. More particularly, we consider whether individual differences in combination styles could be related to different forms of distractor effect.” (Lines 92-100)

      Reviewer #3 Recommendations 4

      - Not clear why the header in line 122 is phrased as a question.

      We thank the reviewer for the suggestion. We have modified the header to the following:

      “The distractor effect was absent on average” (Line 129)

      Reviewer #3 Recommendations 5

      - The discussion and integration of key neural findings with the current thesis are outstanding. It might help the readers if certain statements such as "the distractor effect is mediated by the PPC" (line 229) were further unpacked.

      We thank the reviewer for the suggestion. We have made modifications to the original passage to further elaborate the statement.

      “At the neuroanatomical level, the negative distractor effect is mediated by the PPC, where signal modulation described by divisive normalization has been previously identified (Chau et al., 2014; Louie et al., 2011). The same region is also crucial for perceptual decision making processes (Shadlen & Shohamy, 2016).” (Lines 250-253)

      Reviewer #3 Recommendations 6

      - In Fig. 3c, there seem to be many participants having the integration coefficient close to 0 but the present violin plot doesn't seem to best reflect this highly skewed distribution. A histogram would be perhaps better here.

      We thank the reviewer for the suggestion. We have modified the descriptive plots to use histograms instead of violin plots.

      “Figures 3c, d and e show the fitted parameters of the composite model: η, the integration coefficient determining the relative weighting of the additive and multiplicative value; the magnitude/probability weighting ratio; and the inverse temperature. Our finding that the average integration coefficient η was 0.325 coincides with previous evidence that people were biased towards using an additive, rather than a multiplicative rule.” (Lines 186-191)

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a useful comparison of the dynamic properties of two RNA-binding domains. The data collection and analysis are solid, making excellent use of a suite of NMR methods. However, evidence to support the proposed model linking dynamic behavior to RNA recognition and binding by the tandem domains remains incomplete. The work will be of interest to biophysicists working on RNA-binding proteins.

      We thank eLife for taking the time and effort to review our manuscript. Evidence from the literature and from our study shows close parity between the dynamic behavior of dsRBDs and their dsRNA recognition and binding, which led us to propose a fair model. As already mentioned in the manuscript, we have been working on the suggested experiments to support our proposed model further.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In the manuscript entitled "Differential conformational dynamics in two type-A RNA-binding domains drive the double-stranded RNA recognition and binding," Chugh and co-workers utilize a suite of NMR relaxation methods to probe the dynamic landscape of the TAR RNA binding protein (TRBP) double-stranded RNA-binding domain 2 (dsRBD2) and compare these to their previously published results on TRBP dsRBD1. The authors show that, unlike dsRBD1, dsRBD2 is a rigid protein with minimal ps-ns or us-ms time scale dynamics in the absence of RNA. They then show that dsRBD2 binds to canonical A-form dsRNA with a higher affinity compared to dsRBD1 and does so without much alteration in protein dynamics. Using their previously published data, the authors propose a model whereby dsRBD2 recognizes dsRNA first and brings dsRBD1 into proximity to search for RNA bulge and internal loop structures.

      We thank the Reviewer for sending us an encouraging review. We have combined the findings reported in the literature with new ones that led us to propose the dsRNA-binding model by tandem A-form dsRBDs.

      We propose that dsRBD1 can first recognize a variety of sequentially and structurally different dsRNAs. dsRBD2 assists the interaction with a higher affinity, thus fortifying the interaction between TRBP and a possible substrate. This may enable other associated proteins, such as Dicer and Ago2, to perform critical biological functions.

      However, we feel that a few statements in the comment above are factually incorrect.

      Statement 1. “They then show that dsRBD2 binds to canonical A-form dsRNA with a higher affinity compared to dsRBD1 and does so without much alteration in protein dynamics.”

      We have explicitly shown the perturbation in dsRBD2 dynamics upon RNA binding.

      Statement 2. “Using their previously published data, the authors propose a model whereby dsRBD2 recognizes dsRNA first and brings dsRBD1 into proximity to search for RNA bulge and internal loop structures.”

      Our previously published data suggest that dsRBD1, owing to its high conformational dynamics in solution, is able to recognize a variety of structurally and sequentially different dsRNAs (Paithankar et al., 2022). dsRBDs preferentially bind to the double-stranded region (minor-major-minor-groove) of an A-form RNA (Acevedo et al., 2016; Vuković et al., 2014) and do not search for bulge and internal loop structures as a part of the binding event. Even though dsRBDs preferentially bind to the double-stranded region, they can still accommodate perturbations in the A-form helix due to mismatches and bulges, with decreased binding affinity (Acevedo et al., 2015). However, it is a matter of future research to identify how much deviation from the A-form structure can be accommodated by the dsRBDs. The diffusion event observed in the literature (Koh et al., 2013) also does not show any direct implication for searching for bulge and internal loop structures.

      Strengths:

      The authors expertly use a variety of NMR techniques to probe protein motions over six orders of magnitude in time. Other NMR titration experiments and ITC data support the RNA-binding model.

      Weaknesses:

      The data collection and analysis are sound. The only weakness in the manuscript is the lack of context with the much broader field of RNA-binding proteins. For example, many studies have shown that RNA recognition motif (RRM) domains have similar dynamic characteristics when binding diverse RNA substrates. Furthermore, there was no discussion about the entropy of binding derived from ITC. It might be interesting to compare with dynamics from NMR.

      We understand the reviewer’s point that this study is focused on a dsRNA-binding mechanism rather than addressing the much broader field of RNA-binding. There are multiple challenges in finding a single mechanism that works for all RNA-binding proteins. For instance, RRM is a single-stranded RNA binding domain that is able to read out the substrate base sequence. RRM behaves entirely differently than the dsRBD in terms of target specificity. Besides, several other RNA-binding domains, like the KH-domain, Puf domains, Zinc finger domains, etc., showcase a unique RNA-binding behavior. Thus, it would be really difficult to draw a single rule of thumb for RNA-recognition behavior for all these diverse domains.

      Thank you for pointing out the entropy of binding from ITC. We have now included the entropy of binding discussion in the main text, page 7.

      Reviewer #2 (Public Review):

      Summary:

      Proteins that bind to double-stranded RNA regulate various cellular processes, including gene expression and viral recognition. Such proteins often contain multiple double-stranded RNA-binding domains (dsRBDs) that play an important role in target search and recognition. In this work, Chugh and colleagues have characterized the backbone dynamics of one of the dsRBDs of a protein called TRBP2, which carries two tandem dsRBDs. Using solution NMR spectroscopy, the authors characterize the backbone motions of dsRBD2 in the absence and presence of dsRNA and compare these with their previously published results on dsRBD1. The authors show that dsRBD2 is comparatively more rigid than dsRBD1 and claim that these differences in backbone motions are important for target recognition.

      Strengths:

      The strengths of this study are multiple solution NMR measurements to characterize the backbone motions of dsRBD2. These include 15N-R1, R2, and HetNOE experiments in the absence and presence of RNA and the analysis of these data using an extended-model-free approach; HARD-15N-experiments and their analysis to characterize the kex. The authors also report differences in binding affinities of dsRBD1 and dsRBD2 using ITC and have performed MD simulations to probe the differential flexibility of these two domains.

      Weaknesses:

      While it may be true that dsRBD2 is more rigid than dsRBD1, the manuscript lacks conclusive and decisive proof that such changes in backbone dynamics are responsible for target search and recognition and the diffusion of TRBP2 along the RNA molecule. To conclusively prove the central claim of this manuscript, the authors could have considered a larger construct that carries both RBDs. With such a construct, authors can probe the characteristics of these two tandem domains (e.g., semi-independent tumbling) and their interactions with the RNA. Additionally, mutational experiments may be carried out where specific residues are altered to change the conformational dynamics of these two domains. The corresponding changes in interactions with RNA will provide additional evidence for the model presented in Figure 8 of the manuscript. Finally, there are inconsistencies in the reported data between different figures and tables.

      We thank the reviewer for the comprehensive and insightful review. A larger construct carrying both RBDs was not used because of the multiple challenges pertaining to dynamics study by NMR spectroscopy (intrinsic R2 rates of the dsRBD1-dsRBD2 construct would be high, resulting in broadened peaks) as per our previous experience (Paithankar et al., 2022). There would be additional dynamics in that construct coming from domain-domain relative motions, and it is difficult to deconvolute the dynamics information. Further, the dsRNA needed to bind to this construct will be longer, causing further line broadening in NMR.

      Regarding mutational studies, careful design of domain mutants remains a challenge because the conformational dynamics in both domains are distributed throughout the backbone rather than only in the RNA-binding residues. Such studies would need an exhaustive number of mutations in the protein as well as the RNA to draw a parallel between binding and dynamics. Having said that, we are working on making such mutations in the protein (at several locations, to freeze the dynamics site-specifically) and in the RNA (to change the shape of the dsRNA) to study this mechanism systematically, which is beyond the scope of this manuscript.

      The reviewer has rightly pointed out some subtle superficial differences in the reported data between different figures and tables. These differences arise from the context in which we describe the data. For example, in Figure S4, we report the average relaxation rates and nOe values for only the common residues we were able to analyze at the two magnetic field strengths, 600 and 800 MHz, whereas in Figure 6, we compare the averages of the core (159-227) dsRBD residues at 600 MHz, in the presence and absence of D12RNA. The differences, however, are minute and fall well within the error range.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments -

      In regards to ITC data, dsRBD1 does not bind canonical A-form RNA with high affinity. What is dsRBD1 and dsRBD2 affinity to the miR-16 RNA?

      We have not performed ITC-based studies with miR-16 RNA for the domains. The study by Acevedo et al. has shown the effect of lengths of Watson-Crick duplex RNAs upon TRBP2 dsRBD binding. In this study, they compared the ds22 RNA to the miRNA/miRNA* duplex. By using EMSA, they show that the Kd,app (μM) for dsRBD1 is 3.5±0.2 and for dsRBD2 is 1.7±0.1, indicating a higher affinity by the latter (Acevedo et al., 2015).

      What was the amount of time used for the 1H saturation in the heteronuclear NOE experiment? Based on the average T1 (1/1.44 s⁻¹ = 0.69 s), a recovery delay of >7 s should have been used for this experiment.

      According to Cavanagh et al., the recovery/recycle delay should be greater than 5 × 1/R1 to ensure that 99% of the 1HN and 15N magnetization is restored (Cavanagh et al., 1996). In our study, we used a relaxation delay of 5 s, which is greater than 7 × 1/R1avg, thus ensuring that at least 99% of the 1HN and 15N magnetization recovers to its bulk value.
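
      The arithmetic behind this exchange can be checked with a few lines. The sketch uses the average R1 of 1.44 s⁻¹ quoted by the reviewer and the 5 s delay reported by the authors, and it assumes simple mono-exponential recovery, which is a simplification of the coupled 1H/15N relaxation in a heteronuclear NOE experiment. The function names are hypothetical.

```python
import math

R1_AVG = 1.44  # s^-1, average longitudinal rate quoted by the reviewer
DELAY = 5.0    # s, recovery delay reported by the authors

def delay_in_t1_units(delay, r1):
    """Express a delay as a multiple of T1 = 1/R1."""
    return delay * r1

def recovered_fraction(delay, r1):
    """Fraction of equilibrium magnetization recovered after `delay`,
    assuming mono-exponential recovery with rate r1 (an approximation)."""
    return 1 - math.exp(-delay * r1)

t1 = 1 / R1_AVG                                  # about 0.69 s
multiples = delay_in_t1_units(DELAY, R1_AVG)     # about 7.2 x T1
fraction = recovered_fraction(DELAY, R1_AVG)     # above 0.99 under this assumption
```

      On these numbers, the 5 s delay exceeds both the 5 × T1 and 7 × T1 thresholds mentioned in the reply, while the reviewer’s suggested delay of more than 7 s corresponds to roughly ten times T1.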

      Recommendations for improving writing and presentation -

      Figure 3 - The legend in panel C is incomplete.

      Figure 3 (Figure 4 in the revised manuscript) has been updated, and the legend is now complete.

      Figures 3 E and F - The three views can be combined into one as is done in Figures 4 C and D.

      Thanks for the kind suggestion. We have depicted the kex in the three ranges to highlight the difference between the two domains at each range. Since there are three different exchange regimes with different populations, we believe this gives us an uncomplicated picture while classifying and comparing the dynamics between the two. Combining the three views into one becomes too overwhelming to visualize kex and population distribution in the protein.

      Figure 3 - The residues indicated in the text (e.g., R200, L212, and R224) should be indicated in panels E and F.

      We have marked the residues described in the text in Figure 4C (revised Figure 5C), and thus, they are not mentioned in Figures 3E and 3F (revised Figures 4E and 4F).

      The results and discussion put these findings into minimal context. Most comparisons are made between dsRBD1 and dsRBD2. What about other RNA-binding proteins? There is a wealth of structure/dynamics/functional data about RNA recognition motifs, which do exactly the same thing as described here but are missing.

      We understand the reviewer’s point that this study is focused on a dsRNA-binding mechanism rather than addressing the much broader field of RNA-binding. There are multiple challenges in finding a single mechanism that works for all RNA-binding proteins. For instance, RRM is a single-stranded RNA-recognition motif that can read out the substrate base sequence. RRM behaves entirely differently than the dsRBD in terms of sequence specificity. Besides, several other RNA-binding domains, like the KH-domain, Puf domains, Zinc-finger domains, etc., showcase a unique RNA-binding behaviour. Thus, with the current knowledge, it would not be possible to draw a single rule of thumb for RNA-recognition behaviour for all these diverse domains. Hence, the findings of this study are not directly comparable to those of other RNA-binding domains, and such a comparison is beyond the scope of this study.

      Results, page 8 - I'm not sure that allosteric quenching is appropriately invoked here. The amount of residues showing dynamics in the apo state is small and the number only moderately increases upon RNA binding. The observation that some residues show an increase and a neighboring residue shows a decrease (or vice versa) upon RNA binding could just be random with the small number of observations. This observation would be more convincing if it were happening to larger regions within the protein.

      We agree with the reviewer that the number of residues showing dynamics in the apo-state of dsRBD2 is small when compared with that of dsRBD1, and the number only moderately increases upon RNA-binding. However, we believe it is important to invoke allosteric quenching because all the new residues in which dynamics is induced lie in spatial proximity, as also observed in dsRBD1 (Paithankar et al., 2022). It is a parameter not only to compare the differences and similarities between the two domains but also to highlight that this phenomenon is common to both type-A dsRBDs of TRBP.

      Minor corrections -

      Introduction, page 2 - The order parameter should be defined for non-NMR experts.

      Thank you for the suggestion. The definition of order parameter has now been included on page 2 of the revised manuscript.

      Introduction, page 2 - TRBP should be defined in the main text the first time used.

      We have now defined TRBP on page 2 of the revised manuscript, where it is used in the main text for the first time.

      Results, page 5 - The reference for the HARD experiment should be given earlier in that paragraph.

      Thank you for the suggestion. We have now referenced the HARD experiment earlier in the last paragraph on page 5 of the revised manuscript.

      Results, page 7 - What is the limiting amount of RNA used for the D12-bound dsRBD2 spin relaxation measurements?

      The limiting amount of RNA used for the D12-bound dsRBD2 spin relaxation measurements is 0.05 equivalent (RNA:Protein= 50 mM:1000 mM). It has now been included on page 7 of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Throughout the manuscript, NMR datasets are not consistent with one another (a few examples are listed below).

      Figures S4, 6, and Table S4: (a) It is unclear why relaxation data for certain residues are missing in Table S4 (e.g., S156, V168, E177, F192, etc.).

      We thank the reviewer for pointing this out. We have now reanalyzed the data for all the above-mentioned residues and the other missing residues. In the revised manuscript, we have added the data for residues such as E177 and R189, along with many more N- and C-terminal residues. Unfortunately, for some residues (V168, S184, F192, S209, and L222), we observed severe peak broadening while measuring the R2 rates and/or nOe. Hence, data for V168, S184, F192, S209, and L222 are missing in Table S4. We now state explicitly in the table legend that data are missing for these residues.

      (b) The reported values are not consistent. For example, Figure S4 says that the average 15N-R2 rate is 10.85 +/- 0.36 s-1 whereas Figure 6 says the 15N-R2 rate is 11.02 +/- 0.39 s-1 for the same dataset.

      The superficial differences are present because of the context in which we are describing the data (now mentioned in the methods section on page 13). In Figure S4, we are talking about the average relaxation rates and nOe values for only the common residues we could analyze between two magnetic field strengths, 600 and 800 MHz. Whereas in Figure 6 (revised figure 3), we compare the averages of all the analyzed core dsRBD residues at 600 MHz in the presence and absence of D12RNA. The differences, however, are insignificant, falling well within the error range.
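      As a rough illustration of the "within the error range" argument, the two averages quoted by the reviewer can be compared against their combined uncertainty. This is a minimal sketch, not the authors' analysis: the function name is hypothetical, and it assumes the two uncertainties are independent and add in quadrature, which is a simplification here because both averages draw on overlapping residues from the same dataset.

```python
import math


def difference_in_combined_errors(a: float, err_a: float, b: float, err_b: float) -> float:
    """Return |a - b| expressed in units of the combined uncertainty.

    Assumption: err_a and err_b are independent, so they add in quadrature.
    """
    combined = math.hypot(err_a, err_b)  # sqrt(err_a**2 + err_b**2)
    if combined == 0:
        raise ValueError("combined uncertainty must be non-zero")
    return abs(a - b) / combined


# Average 15N-R2 values (s^-1) quoted for Figure S4 and Figure 6
ratio = difference_in_combined_errors(10.85, 0.36, 11.02, 0.39)
print(f"difference = {ratio:.2f} x combined uncertainty")
```

      Under these assumptions the gap between the two averages is roughly a third of the combined uncertainty, which is consistent with the statement that the difference is not significant.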

      (c) There is also a discrepancy in reported R2 values (at 600 MHz) in Table S4. It is unclear to me what the reported values are, as most of these are below 1 s-1.

      Thank you very much for pointing out our mistake here. Table S4 contained the wrong values for R2 at 600 MHz. However, the raw data submitted to the BMRB as entry 52077 hold the correct information. We have now updated Table S4.

      (d) It is also unclear as to why perfectly resolved residues (e.g., L230, A232, D234, etc.) have been omitted from these data (and other datasets such as 15N-CPMGs shown in Figure S6).

      The residues L230, A232, D234, etc., are the C-terminal residues of TRBP-dsRBD2 beyond the core (159-227 aa) fold of dsRBD. They have now been included in the revised figures S6 and S11 for completeness.

      (e) Figure 6 reports a 15N-R2 of 21 s-1 for one of the residues in the absence of RNA. This data point has been omitted from Figure S4.

      In Figure S4, we are talking about relaxation rates and nOe values only for the common residues we could analyze between the two magnetic field strengths, 600 and 800 MHz. Thus, that 15N-R2 value has been omitted.

      The S2 order parameters reported in Figures S5 and S10 are inconsistent with one another, as additional residues are shown in S10 (e.g., N159).

      Thank you for pointing it out. We have now reanalyzed the data for S2 order parameter and Rex by including more residues (e.g., N159, R189, etc) in the core and have updated both Figures S5 and S10. Please see the revised supplementary information.

      Tables S6 and S7 report values for residue R189. This residue has been omitted in every other dataset. Based on the 1H-15N HSQC spectrum shown in Figure S3, this residue gives a well-resolved crosspeak (which lies adjacent to V228). Can the authors explain why they omit data for this residue in Figures S4, 6, and Table S4?

      The reviewer is correct in pointing out that data for R189 is missing in the fast dynamics data, such as Figure S4, Figure 6 (revised figure 3), and Table S4. We have now reanalyzed our raw data and included data for R189 and other missing residues in our updated manuscript. Please see the revised figures S4 and 6 (revised figure 3) and the revised table S4.  

      Moreover, this residue lies in the loop2 region of this domain. Based on the MD simulations (Figure 2), this region is more flexible compared to the rest of the domain. Does the corresponding 15N-relaxation data support this claim?

      Yes, the apo 15N-relaxation data do strongly support this claim. R189 showed a higher than core average R2 rate (R189 = 15.44 +/- 0.69 s-1; core = 10.92 +/- 0.37 s-1) and a lower than core average nOe (R189 = 0.49 +/- 0.05; core = 0.73 +/- 0.03) which indicate a higher flexibility than the rest of the core (updated Figure 3 and Table S4). Additionally, the S2 order parameter for R189 was found to be 0.52 +/- 0.03, slightly lower than the core average of 0.59 +/- 0.03, indicating a more flexible region than the core (updated Table S14). Moreover, the dynamics parameters extracted from HARD experimental data using the geoHARD method for apo TRBP2-dsRBD2 shown in Table S18 depict a high kex value of 31748.72 +/- 955.20 Hz for R189. This supports the claim that this residue is highly flexible with a high exchange rate.

      Figure S9. I was not able to follow this dataset as the data points are not consistent between different residues.

      In Figure S9, the residue-wise peak intensities plotted against the RNA concentration indicate that line broadening was witnessed for all the core residues (irrespective of the initial peak intensity). Another interesting observation is that the terminal residues do not undergo the same line broadening as seen in the core residues.

      It is also unclear why residue G185 is highlighted.

      It is taken as an example and magnified to show the extent of line broadening. This is now explicitly mentioned in the figure caption in the revised supplementary information.

      It is also not clear exactly what the authors are trying to fit, as I see no chemical shift changes upon the addition of RNA (Fig. S8), and the equation used for data fitting (pg. 11) uses chemical shift changes (and not the changes in intensities).

      The same equation can be used to fit the chemical shift perturbation and peak intensity perturbation as a function of ligand concentration. Here, we have tried to fit the intensity perturbation. We have now modified the statement on page 11 in the revised manuscript.

      Table S2: The ITC analysis reports an n value of ~3. Can authors elaborate as to what this means?

      The stoichiometry of ~3 indicates the number of TRBP2-dsRBD2 molecules that can interact with D12 RNA in a single binding event. The minimum binding register for dsRBDs is known to be >8 bp (12 bp for optimal binding) (Ramos et al., 2000), and a single domain covers only one-third of the face of the cylindrical RNA (Masliah et al., 2018). Hence, 3 dsRBD2 molecules could interact with a 12-mer RNA in solution.

      The reported Kd values between the main text (page 7) and Figure 5 are not consistent with one another (one lists 1.18 uM while the other says 1.11 uM). Table S2 does not list the parameters for interactions between dsRBD1 and D12.

      Figure 5 (revised figure 6) shows the data from a single experiment out of a total of three, whereas the main text reports 1.18 μM, the average Kd value across the three experiments (table S2).

      Figure S4: The red axis should read "211" instead of "111".

      Thank you for your helpful insight. We have now changed it in the revised figure.

      Table S3 lists the structural motifs of the two dsRBDs, which are nearly identical to one another, and yet the manuscript claims that these are different (page 4, paragraph 1).

      We agree with the reviewer that the differences are minute, but they are important, and we have tried to highlight them in this paper. In particular, loop 2, which is critical for dsRNA binding (Masliah et al., 2012), is 1 residue longer in dsRBD2, which may contribute to enhanced substrate binding.

      Figure S8 shows severe signal attenuation for many residues upon the addition of 100 uM RNA. The most notable among these are residues M194, T195, and C196. Can the authors explain how they measure 15N-relaxation rates for these residues in the presence of 50 uM D12?

      First, we measured the 15N-relaxation rates for these residues in the presence of 50 μM D12 (RNA:Protein = 50 μM:1000 μM), corresponding to 0.05 equivalent of RNA. This is less than the 0.1 equivalent of RNA (RNA:Protein = 5 μM:50 μM) used for the HSQC-based titration shown in Figure S8, where we observed line broadening for residues such as M194, T195, and C196. Second, we increased the overall protein concentration from 50 μM (used in the HSQC-based titration) to 1000 μM (used in the relaxation measurements) to ensure a better signal-to-noise ratio in all the spectra.
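      The comparison of RNA equivalents in this response is simple ratio arithmetic, sketched below for readers who want to check it. The function name is hypothetical and the only assumption is that both concentrations in each ratio are expressed in the same unit, so the unit cancels.

```python
def molar_equivalents(ligand_conc: float, protein_conc: float) -> float:
    """Return the ligand amount as a fraction of the protein amount.

    Both concentrations must be given in the same unit.
    """
    if protein_conc <= 0:
        raise ValueError("protein concentration must be positive")
    return ligand_conc / protein_conc


# RNA:Protein ratios quoted in the response
relaxation_eq = molar_equivalents(50.0, 1000.0)  # relaxation measurements
titration_eq = molar_equivalents(5.0, 50.0)      # HSQC-based titration point

print(f"relaxation sample: {relaxation_eq:.2f} equivalent")
print(f"titration point:   {titration_eq:.2f} equivalent")
```

      The relaxation sample therefore carries half the relative RNA load of the titration point, even though its absolute RNA concentration is higher.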

      Use the same coloring scheme for Figures S7 and S8.

      Thank you for the suggestion. We have now edited Figure S8 accordingly.

      Figures are often listed out-of-order, making it difficult to follow the manuscript.

      Thank you for the suggestion. We have now amended the main text to refer to the figures sequentially. While doing so, we have renumbered Figure 6 as Figure 3, Figure 3 as Figure 4, Figure 4 as Figure 5, and Figure 5 as Figure 6.

      Figure captions for the relaxation data should specify the temperature at which these datasets were collected.

      Thanks for the valuable suggestion. We have now added the temperature wherever applicable.

      References

      Acevedo R, Evans D, Penrod KA, Showalter SA. 2016. Binding by TRBP-dsRBD2 Does Not Induce Bending of Double-Stranded RNA. Biophys J 110:2610–2617. doi:10.1016/j.bpj.2016.05.012

      Acevedo R, Orench-Rivera N, Quarles KA, Showalter SA. 2015. Helical Defects in MicroRNA Influence Protein Binding by TAR RNA Binding Protein. PLoS ONE 10:e0116749. doi:10.1371/journal.pone.0116749

      Koh HR, Kidwell MA, Ragunathan K, Doudna JA, Myong S. 2013. ATP-independent diffusion of double-stranded RNA binding proteins.

      Masliah G, Barraud P, Allain FH-T. 2012. RNA recognition by double-stranded RNA binding domains: a matter of shape and sequence. Cell Mol Life Sci 70:1875–1895. doi:10.1007/s00018-012-1119-x

      Masliah G, Maris C, König SL, Yulikov M, Aeschimann F, Malinowska AL, Mabille J, Weiler J, Holla A, Hunziker J, Meisner‐Kober N, Schuler B, Jeschke G, Allain FH. 2018. Structural basis of siRNA recognition by TRBP double‐stranded RNA binding domains. EMBO J 37:e97089. doi:10.15252/embj.201797089

      Paithankar H, Tarang GS, Parvez F, Marathe A, Joshi M, Chugh J. 2022. Inherent conformational plasticity in dsRBDs enables interaction with topologically distinct RNAs. Biophys J 121:1038–1055. doi:10.1016/j.bpj.2022.02.005

      Protein NMR Spectroscopy, Principles and Practice, John Cavanagh, Wayne J. Fairbrother, Arthur G. Palmer III, and Nicholas J. Skelton. Academic Press, San Diego, 1995, 587 pages, $59.95. ISBN: 0-12-164490-1. 1996. J Magn Reson, Ser B 113:277. doi:10.1006/jmrb.1996.0189

      Ramos A, Grünert S, Adams J, Micklem DR, Proctor MR, Freund S, Bycroft M, Johnston DS, Varani G. 2000. RNA recognition by a Staufen double‐stranded RNA‐binding domain. EMBO J 19:997–1009. doi:10.1093/emboj/19.5.997

      Vuković L, Koh HR, Myong S, Schulten K. 2014. Substrate Recognition and Specificity of Double-Stranded RNA Binding Proteins. Biochemistry 53:3457–3466. doi:10.1021/bi500352s

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Campbell et al investigated the effects of light on the human brain, in particular the subcortical part of the hypothalamus during auditory cognitive tasks. The mechanisms and neuronal circuits underlying light effects in non-image forming responses are so far mostly studied in rodents but are not easily translated in humans. Therefore, this is a fundamental study aiming to establish the impact light illuminance has on the subcortical structures using the high-resolution 7T fMRI. The authors found that parts of the hypothalamus are differently responding to illuminance. In particular, they found that the activity of the posterior hypothalamus increases while the activity of the anterior and ventral parts of the hypothalamus decreases under high illuminance. The authors also report that the performance of the 2-back executive task was significantly better in higher illuminance conditions. However, it seems that the activity of the posterior hypothalamus subpart is negatively related to the performance of the executive task, implying that it is unlikely that this part of the hypothalamus is directly involved in the positive impact of light on performance observed. Interestingly, the activity of the posterior hypothalamus was, however, associated with an increased behavioural response to emotional stimuli. This suggests that the role of this posterior part of the hypothalamus is not as simple regarding light effects on cognitive and emotional responses. This study is a fundamental step towards our better understanding of the mechanisms underlying light effects on cognition and consequently optimising lighting standards. 

      Strengths: 

      While it is still impossible to distinguish individual hypothalamic nuclei, even with the high-resolution fMRI, the authors split the hypothalamus into five areas encompassing five groups of hypothalamic nuclei. This allowed them to reveal that different parts of the hypothalamus respond differently to an increase in illuminance. They found that higher illuminance increased the activity of the posterior part of the hypothalamus encompassing the MB and parts of the LH and TMN, while decreasing the activity of the anterior parts encompassing the SCN and another part of TMN. These findings are somewhat in line with studies in animals. It was shown that parts of the hypothalamus such as SCN, LH, and PVN receive direct retinal input in particular from ipRGCs. Also, acute chemogenetic activation of ipRGCs was shown to induce activation of LH and also increased arousal in mice. 

      Weaknesses: 

      While the light characteristics are well documented and EDI calculated for all of the photoreceptors, it is not very clear why these irradiances and spectra were chosen. It would be helpful if the authors explained the logic behind the four chosen light conditions tested. Also, the lights chosen have cone-opic EDI values in a high correlation with the melanopic EDI, therefore we can't distinguish if the effects seen here are driven by melanopsin and/or other photoreceptors. In order to provide a more mechanistic insight into the light-driven effects on cognition ideally one would use a silent substitution approach to distinguish between different photoreceptors. This may be something to consider when designing the follow-up studies. 

      Reviewer #1 (Recommendations For The Authors): 

      (1) As suggested in the public review more information regarding the reasons behind the chosen light condition is needed. 

      While the light characteristics are well documented and EDI calculated for all of the photoreceptors, it is not very clear why these irradiances and spectra were chosen. It would be helpful if the authors explained the logic behind the four chosen light conditions tested. Also, the lights chosen have cone-opic EDI values in a high correlation with the melanopic EDI, therefore we can't distinguish if the effects seen here are driven by melanopsin or cone opsins. In order to provide a more mechanistic insight into the light-driven effects on cognition ideally one would use a silent substitution approach to distinguish between different photoreceptors. 

      (2) In support of this work, it was shown in mice that acute activation of ipRGCs using chemogenetics induces c-fos in some of the hypothalamic brain areas discussed here including LH (Milosavljevic et al, 2016 Curr Biol). Another study to consider including in the discussion is by Sonoda et al 2020 Science, in which the authors showed that a subset of ipRGCs release GABA. 

      (3) Figure 1 looks squashed, especially the axes. Also, Figure 2 looks somewhat blurry. I would suggest that the authors edit the figures to correct this.

      We thank the reviewer for their positive comments and agree with the weaknesses they pointed out. 

      (1) The explanation regarding the choice of the illuminance is now included in the revised manuscript (PAGE 17): “Blue-enriched light illuminances were set according to the technical characteristics of the light source and to keep the overall photon flux similar to prior 3T MRI studies of our team (between ~10¹² and 10¹⁴ ph/cm²/s) (Vandewalle et al., 2010, 2011). The orange light was introduced as a control visual stimulation for potential secondary whole-brain analyses. For the present region of interest analyses, we discarded colour differences between the light conditions and only considered illuminance as indexed by mel EDI lux. This constitutes a limitation of our study as it does not allow attributing the findings to a particular photoreceptor class.”

      The revised discussion makes clear that these choices limit the interpretation about the photoreceptors involved (PAGES 12-13): “We based our rationale and part of our interpretations on ipRGC projections, which have been demonstrated in rodents to channel the NIF biological impact of light and incorporate the inputs from rods and cones with their intrinsic photosensitivity into a light signal that can impact the brain (Güler et al., 2008; Tri & Do, 2019). Given the polychromatic nature of the light we used, classical photoreceptors and their projections to visual brain areas are, however, very likely to have directly or indirectly contributed to the modulation by light of the regional activity of the hypothalamus.”

      The discussion also points out the promises of silent substitution (PAGE 13): “Future human studies could isolate the contribution of each photoreceptor class to the impact of light on cognitive brain functions by manipulating prior light history (Chellappa et al., 2014) or through the use of silent substitutions between metameric light exposures (Viénot et al., 2012)”.

      (2) We now refer to the studies by Milosavljevic et al. and Sonoda et al. 

      PAGE 9: “Our data may therefore be compatible with an increase in orexin release by the LH with increasing illuminance. In line with this assumption, chemoactivation of ipRGCs led to increased c-fos production, a marker of cellular activation, over several nuclei of the hypothalamus, including the lateral hypothalamus (Milosavljevic et al., 2016). If this initial effect of light we observe over the posterior part of the hypothalamus was maintained over a longer period of exposure, this would stimulate cognition and maintain or increase alertness (Campbell et al., 2023) and may also be part of the mechanisms through which daytime light increases the amplitude in circadian variations of several physiological features (Bano-Otalora et al., 2021; Dijk et al., 2012).”

      PAGE 10: “Chemoactivation of ipRGCs in rodents led to an increased activity of the SCN, over the inferior anterior hypothalamus, but had no impact on the activity of the VLPO, over the superior anterior hypothalamus (Milosavljevic et al., 2016). How our findings fit with these fine-grained observations and whether there are species-specific differences in the responses to light over the different parts of the hypothalamus remains to be established.”

      PAGE 10: “In terms of chemical communication, these changes in activity could be the result of an inhibitory signal from a subclass of ipRGCs, potentially through the release of γ-aminobutyric acid (GABA), as a rodent study found that a subset of ipRGCs release GABA at brain targets including the SCN (and intergeniculate leaflet and ventral lateral geniculate nucleus), leading to a reduction in the ability of light to affect pupil size and circadian photoentrainment (Sonoda et al., 2020). Whatever the signalling of ipRGC, our finding over the anterior hypothalamus could correspond to a modification of GABA signalling of the SCN which has been reported to have excitatory properties, such that the BOLD signal changes we report may correspond to a reduction in excitation arising in part from the SCN (Albers et al., 2017).”

      (3) Figures 1 and 2 were modified. We hope their quality is now satisfactory. We are willing to provide separate figures prior to publication of the Version of Record.

      Reviewer #2 (Public Review): 

      Summary 

      The interplay between environmental factors and cognitive performance has been a focal point of neuroscientific research, with illuminance emerging as a significant variable of interest. The hypothalamus, a brain region integral to regulating circadian rhythms, sleep, and alertness, has been posited to mediate the effects of light exposure on cognitive functions. Previous studies have illuminated the role of the hypothalamus in orchestrating bodily responses to light, implicating specific neural pathways such as the orexin and histamine systems, which are crucial for maintaining wakefulness and processing environmental cues. Despite advancements in our understanding, the specific mechanisms through which varying levels of light exposure influence hypothalamic activity and, in turn, cognitive performance, remain inadequately explored. This gap in knowledge underscores the need for high-resolution investigations that can dissect the nuanced impacts of illuminance on different hypothalamic regions. Utilizing state-of-the-art 7 Tesla functional magnetic resonance imaging (fMRI), the present study aims to elucidate the differential effects of light on the hypothalamic dynamics and establish a link between regional hypothalamic activity and cognitive outcomes in healthy young adults. By shedding light on these complex interactions, this research endeavours to contribute to the foundational knowledge necessary for developing innovative therapeutic strategies aimed at enhancing cognitive function through environmental modulation. 

      Strengths: 

      (1) Considerable Sample Size and Detailed Analysis: The study leverages a robust sample size and conducts a thorough analysis of hypothalamic dynamics, which enhances the reliability and depth of the findings. 

      (2) Use of High-Resolution Imaging: Utilizing 7 Tesla fMRI to analyze brain activity during cognitive tasks offers high-resolution insights into the differential effects of illuminance on hypothalamic activity, showcasing the methodological rigor of the study. 

      (3) Novel Insights into Illuminance Effects: The manuscript reveals new understandings of how different regions of the hypothalamus respond to varying illuminance levels, contributing valuable knowledge to the field. 

      (4) Exploration of Potential Therapeutic Applications: Discussing the potential therapeutic applications of light modulation based on the findings suggests practical implications and future research directions. 

      Weaknesses: 

      (1) Foundation for Claims about Orexin and Histamine Systems: The manuscript needs to provide a clearer theoretical or empirical foundation for claims regarding the impact of light on the orexin and histamine systems in the abstract. 

      (2) Inclusion of Cortical Correlates: While focused on the hypothalamus, the manuscript may benefit from discussing the role of cortical activation in cognitive performance, suggesting an opportunity to expand the scope of the manuscript. 

      (3) Details of Light Exposure Control: More detailed information about how light exposure was controlled and standardized is needed to ensure the replicability and validity of the experimental conditions. 

      (4) Rationale Behind Different Exposure Protocols: To clarify methodological choices, the manuscript should include more in-depth reasoning behind using different protocols of light exposure for executive and emotional tasks. 

      Reviewer #2 (Recommendations For The Authors): 

      Attention to English language precision and correction of typographical errors, such as "hypothalamic nuclei" instead of "hypothalamus nuclei," is necessary for enhancing the manuscript.

      We thank the reviewer for recognising the interest and strength of our study.

      (1) As detailed in the discussion, we do believe orexin and histamine are excellent candidates for mediating the results we report. As also pointed out there, however, we are in no position to know which neurons, nuclei, neurotransmitters and neuromodulators underlie the results. The last sentence of the abstract (PAGE 2) was therefore removed, as we agree the statement was too strong. We carefully reconsidered the discussion and believe that no such overstatement is present there.

      (2) Hypothalamic nuclei are connected to multiple cortical (and subcortical) structures. The relevance of these projections will vary with the cognitive task considered. In addition, we have not yet considered the cortex in our analyses, such that truly integrating cortical structures appears premature. 

      We nevertheless added the following short statement (PAGE 11): “Subcortical structures, and particularly those receiving direct retinal projections, including those of the hypothalamus, are likely to receive light illuminance signal first before passing on the light modulation to the cortical regions involved in the ongoing cognitive process (Campbell et al., 2023).”

      (3) We now include the following as part of the method section (PAGES 16-17): “Illuminance and spectra could not be directly measured within the MRI scanner due to the ferromagnetic nature of measurement systems. The coil of the MRI and the light stand, together with the lighting system, were therefore placed outside of the MR room to reproduce the experimental conditions in a completely dark room. A sensor was placed 2 cm away from the mirror of the coil that is mounted at eye level, i.e. where the eye of the first author of the paper would be positioned, to measure illuminance and spectra. The procedure was repeated 4 times for illuminance and twice for spectra and measurements were averaged. This procedure does not take into account interindividual variation in head size and orbit shape such that the reported illuminance levels may have varied slightly across subjects. The relative differences between illuminance are, however, very unlikely to vary substantially across participants such that statistics consisting of tests for the impact of relative differences in illuminance were not affected. The detailed values reported in Supplementary Table 2 were computed combining spectra and illuminance using the Excel calculator associated with a published work (Lucas et al., 2014).”

      (4) The explanation regarding the choice of the illuminance is now included in the revised manuscript (PAGE 17): “Blue-enriched light illuminances were set according to the technical characteristics of the light source and to keep the overall photon flux similar to prior 3T MRI studies of our team (between ~10¹² and 10¹⁴ ph/cm²/s) (Vandewalle et al., 2010, 2011). The orange light was introduced as a control visual stimulation for potential secondary whole-brain analyses. For the present region of interest analyses, we discarded colour differences between the light conditions and only considered illuminance as indexed by mel EDI lux. This constitutes a limitation of our study as it does not allow attributing the findings to a particular photoreceptor class.”

      (5) The manuscript was thoroughly rechecked, and we hope to have spotted all typos and language errors.

      Reviewer #3 (Public Review): 

      Summary: 

      Campbell and colleagues use a combination of high-resolution fMRI, cognitive tasks, and different intensities of light illumination to test the hypothesis that the intensity of illumination differentially impacts hypothalamic substructures that, in turn, promote alterations in arousal that affect cognitive and affective performance. The authors find evidence in support of a posterior-to-anterior gradient of increased blood flow in the hypothalamus during task performance that they later relate to performance on two different tasks. The results provide an enticing link between light levels, hypothalamic activity, and cognitive/affective function, however, clarification of some methodological choices will help to improve confidence in the findings. 

      Strengths: 

      * The authors' focus on the hypothalamus and its relationship to light intensity is an important and understudied question in neuroscience. 

      Weaknesses: 

      (1) I found it challenging to relate the authors' hypotheses, which I found to be quite compelling, to the apparatus used to test the hypotheses - namely, the use of orange light vs. different light intensities; and the specific choice of the executive and emotional tasks, which differed in key features (e.g., block-related vs. event-related designs) that were orthogonal to the psychological constructs being challenged in each task. 

      (4) Given the small size of the hypothalamus and the irregular size of the hypothalamic parcels, I wondered whether a more data-driven examination of the hypothalamic time series would have provided a more parsimonious test of their hypothesis. 

      Reviewer #3 (Recommendations For The Authors): 

      (1) The authors may wish to explain the importance of the orange light condition in the early section of the results -- i.e., when they first present the task structure. As it stands, I don't have a good appreciation of why the orange light was included -- was it a control condition? And if the differences between the light conditions (e.g., the narrow- vs. wide-band of light) were indeed ignored by focussing on the illuminance levels, are there any potential issues that the authors could then mitigate against with further experiments/analyses? 

      (2) Are there other explanations for why illuminance levels might improve cognitive performance? For instance, the capacity to more easily perceive the stimuli in an experiment could plausibly make it easier to complete a given task. If this is the case, can the authors conceptualise a way to rule out this hypothesis? 

      (3) Did the authors control for the differences in the number of voxels in each hypothalamic subregion? Or perhaps consider estimating the variance across voxels within the larger parcels, to determine whether the mean time series was comparable to the time series of the smaller parcels? 

      (4) An alternative strategy that would mitigate against the differences in the size of hypothalamic parcels would be to conduct analyses on the hypothalamus without parcellation, but instead using dimensionality reduction techniques to observe the natural spread of responses across the hypothalamus. From the authors' results, my intuition is that these analyses will lead to similar conclusions, albeit without any of the potential issues with respect to differently-sized parcels. 

      We thank the reviewer for acknowledging the originality and interest of our study. We agree that some methodological choices needed more explanation. We will address the weaknesses they pointed out as follows:

      (1) The explanation regarding the choice of the illuminance is now included in the revised manuscript (PAGE 17): “Blue-enriched light illuminances were set according to the technical characteristics of the light source and to keep the overall photon flux similar to prior 3T MRI studies of our team (between ~10¹² and 10¹⁴ ph/cm²/s) (Vandewalle et al., 2010, 2011). The orange light was introduced as a control visual stimulation for potential secondary whole-brain analyses. For the present region of interest analyses, we discarded colour differences between the light conditions and only considered illuminance as indexed by mel EDI lux. This constitutes a limitation of our study as it does not allow attributing the findings to a particular photoreceptor class.”

      The revised discussion makes clear that these choices limit the interpretation about the photoreceptors involved (PAGE 12-13): “We based our rationale and part of our interpretations on ipRGC projections, which have been demonstrated in rodents to channel the NIF biological impact of light and incorporate the inputs from rods and cones with their intrinsic photosensitivity into a light signal that can impact the brain (Güler et al., 2008; Tri & Do, 2019). Given the polychromatic nature of the light we used, classical photoreceptors and their projections to visual brain areas are, however, very likely to have directly or indirectly contributed to the modulation by light of the regional activity of the hypothalamus.”

      We further mention that (PAGE 13): “Furthermore, we cannot exclude that colour and/or spectral differences between the orange and 3 blue-enriched light conditions may have contributed to our findings. Research in rodent model demonstrated that variation in the spectral composition of light was perceived by the suprachiasmatic nucleus to set circadian timing (Walmsley et al., 2015). No such demonstration has, however, been reported yet for the acute impact of light on alertness, attention, cognition or affective state.”

      Regarding the choice of tasks, we added the following to the method section (PAGE 18): “Prior work of our team showed that the n-back task and emotional task included in the present protocol were successful probes to demonstrate that light illuminance modulates cognitive activity, including within subcortical structures (though resolution did not allow precise isolation of nuclei or subparts) (e.g. (Vandewalle et al., 2007, 2010)). When taking the step of ultra-high-field imaging, we therefore opted for these tasks as our goal was to show that illuminance affects brain activity across cognitive domains while not testing for task-specific aspects of these domains.”

      We further added to the discussion (PAGE 8): “The pattern of light-induced changes was consistent across an executive and an emotional task which consisted of block and an event-related fMRI design, respectively. This suggests that a robust anterior-posterior gradient of activity modulation by illuminance is present in hypothalamus across cognitive domains.”

      (2) We are unsure what the reviewer refers to when they state that the experiment could make it easier to perceive a stimulus. Aside from the fact that illuminance can increase alertness and attention such that a stimulus may be better or more easily perceived/processed, we do not see how blocks of ambient light, i.e. a long-lasting visual stimulus, may render auditory stimulation (letters or pseudo-words in the present study) easier to perceive. To our knowledge, multimodal or cross-modal integration has been robustly demonstrated for short visual/auditory cues that would precede or accompany auditory/visual stimulation.

      We are willing to clarify this issue in the text if we receive additional explanation from the reviewer.

      (3) We added subpart size as covariate in the analyses (instead of subpart number) and it did not affect the output of the statistical analyses (Author response table 1). 

      For completeness, we further computed the standard deviation of the activity estimates of the voxels within each parcel for the main analysis of the n-back task and found a main effect of subpart (Author response table 2), indicating that the variability of the estimates varied across subparts. Post hoc contrasts and the display included in Author response image 1 show, however, that the differences were not related to subpart size per se. It is in fact the largest subpart (subpart 4) that shows the largest variability, while one of the smallest subparts (subpart 2) shows the lowest variability. Though subpart size may have contributed, it is therefore unlikely to explain our findings. We consider the analyses reported in Author response tables 1 and 2 and Author response image 1 to be very technical and did not include them in the supplementary material for conciseness. If the reviewer judges it essential, we can reconsider our decision.

      While computing these analyses, we realized that there were errors in Table 1, which reports the statistical outcomes of the main analyses of the emotional task. The main statistical outputs remain the same except for a nominal main effect of the task (emotional vs. neutral) and the fact that post hoc analyses show a consistent difference between the posterior subpart (subpart 3) and all the other subparts, rather than all the other subparts except for the difference with the superior tubular hypothalamus subpart: p-corrected = 0.09. We apologise for this slight error and were unable to isolate its origin. It does not modify the rest of the analyses (which were also rechecked) or the interpretations.

      Author response table 1.

      Recomputations of the main GLMMs using subpart sizes rather than subpart numbers as covariate of interest.

      Author response image 1.

      Activity estimate variability per hypothalamus subpart and subpart size.  

      Author response table 2.

      Difference in activity estimate standard deviation between hypothalamus subparts during the n-back task.

      Outputs of the generalized linear mixed model (GLMM) with subject as the random factor (intercept and slope), and task and subpart as repeated measures (ar(1) autocorrelation).

      * The corrected p-value for multiple comparisons over 2 tests is p < 0.025.

      # Refer to Fig.2A for correspondence of subpart numbers

      The text referring to Table 1 was modified accordingly (PAGE 5): “A nominal main effect of the task was detected for the emotional task [p = 0.049; Table 1] but not for the n-back task. For both tasks, there was no significant main effect for any of the other covariates and post hoc analyses showed that the index of the illuminance impact was consistently different in the posterior hypothalamus subpart compared to the other subparts [pcorrected ≤ 0.05]”.

      (4) We agree that a data-driven approach could have constituted an alternative means to test our hypothesis. We opted for the approach that we mastered best, while still allowing us to conclusively test for regional differences in activity across the hypothalamus. Examination of time series of the very same data we used will mainly confirm the results of our analyses (an anterior-posterior gradient in the impact of illuminance), while it may yield slight differences in the borders of the subparts of the hypothalamus undergoing decreased or increased activity with increasing illuminance. While the suggested approach may have been envisaged if we had been facing negative results (i.e. no differences between subparts, potentially because subparts would not reflect functional differences in response to illuminance change), it would constitute a circular confirmation of our main findings (i.e. using the same data). While we truly appreciate the suggestion, we do not consider that it would constitute a more parsimonious test of our hypothesis, now that we have successfully applied GLM/parcellation and GLMM approaches.

      We added the following statement to the discussion to take this comment into account (PAGE 12): “Future research may consider data-driven analyses of hypothalamus voxels time series as an alternative to the parcellation approach we adopted here. This may refine the delineation of the subparts of the hypothalamus undergoing decreased or increased activity with increasing illuminance.”

      Response references

      Albers, H. E., Walton, J. C., Gamble, K. L., McNeill, J. K., & Hummer, D. L. (2017). The dynamics of GABA signaling: Revelations from the circadian pacemaker in the suprachiasmatic nucleus. Frontiers in Neuroendocrinology, 44, 35–82. https://doi.org/10.1016/J.YFRNE.2016.11.003

      Bano-Otalora, B., Martial, F., Harding, C., Bechtold, D. A., Allen, A. E., Brown, T. M., Belle, M. D. C., & Lucas, R. J. (2021). Bright daytime light enhances circadian amplitude in a diurnal mammal. Proceedings of the National Academy of Sciences of the United States of America, 118(22), e2100094118. https://doi.org/10.1073/PNAS.2100094118/SUPPL_FILE/PNAS.2100094118.SAPP.PDF

      Campbell, I., Sharifpour, R., & Vandewalle, G. (2023). Light as a Modulator of Non-Image-Forming Brain Functions Positive and Negative Impacts of Increasing Light Availability. Clocks & Sleep, 5(1), 116. https://doi.org/10.3390/CLOCKSSLEEP5010012

      Chellappa, S. L., Ly, J. Q. M., Meyer, C., Balteau, E., Degueldre, C., Luxen, A., Phillips, C., Cooper, H. M., & Vandewalle, G. (2014). Photic memory for executive brain responses. Proceedings of the National Academy of Sciences of the United States of America, 111(16), 6087–6091. https://doi.org/10.1073/pnas.1320005111

      Dijk, D. J., Duffy, J. F., Silva, E. J., Shanahan, T. L., Boivin, D. B., & Czeisler, C. A. (2012). Amplitude reduction and phase shifts of melatonin, cortisol and other circadian rhythms after a gradual advance of sleep and light exposure in humans. PloS One, 7(2). https://doi.org/10.1371/JOURNAL.PONE.0030037

      Güler, A. D., Ecker, J. L., Lall, G. S., Haq, S., Altimus, C. M., Liao, H. W., Barnard, A. R., Cahill, H., Badea, T. C., Zhao, H., Hankins, M. W., Berson, D. M., Lucas, R. J., Yau, K. W., & Hattar, S. (2008). Melanopsin cells are the principal conduits for rod-cone input to non-image-forming vision. Nature, 453(7191), 102–105. https://doi.org/10.1038/nature06829

      Lucas, R. J., Peirson, S. N., Berson, D. M., Brown, T. M., Cooper, H. M., Czeisler, C. A., Figueiro, M. G., Gamlin, P. D., Lockley, S. W., O’Hagan, J. B., Price, L. L. A., Provencio, I., Skene, D. J., & Brainard, G. C. (2014). Measuring and using light in the melanopsin age. Trends in Neurosciences, 37(1), 1–9. https://doi.org/10.1016/j.tins.2013.10.004

      Milosavljevic, N., Cehajic-Kapetanovic, J., Procyk, C. A., & Lucas, R. J. (2016). Chemogenetic Activation of Melanopsin Retinal Ganglion Cells Induces Signatures of Arousal and/or Anxiety in Mice. Current Biology, 26(17), 2358–2363. https://doi.org/10.1016/j.cub.2016.06.057

      Sonoda, T., Li, J. Y., Hayes, N. W., Chan, J. C., Okabe, Y., Belin, S., Nawabi, H., & Schmidt, T. M. (2020). A noncanonical inhibitory circuit dampens behavioral sensitivity to light. Science (New York, N.Y.), 368(6490), 527–531. https://doi.org/10.1126/SCIENCE.AAY3152

      Tri, M., & Do, H. (2019). Melanopsin and the Intrinsically Photosensitive Retinal Ganglion Cells: Biophysics to Behavior. Neuron, 104, 205–226. https://doi.org/10.1016/j.neuron.2019.07.016

      Vandewalle, G., Hébert, M., Beaulieu, C., Richard, L., Daneault, V., Garon, M. Lou, Leblanc, J., Grandjean, D., Maquet, P., Schwartz, S., Dumont, M., Doyon, J., & Carrier, J. (2011). Abnormal hypothalamic response to light in seasonal affective disorder. Biological Psychiatry, 70(10), 954–961. https://doi.org/10.1016/j.biopsych.2011.06.022

      Vandewalle, G., Schmidt, C., Albouy, G., Sterpenich, V., Darsaud, A., Rauchs, G., Berken, P. Y., Balteau, E., Dagueldre, C., Luxen, A., Maquet, P., & Dijk, D. J. (2007). Brain responses to violet, blue, and green monochromatic light exposures in humans: Prominent role of blue light and the brainstem. PLoS ONE, 2(11), e1247. https://doi.org/10.1371/journal.pone.0001247

      Vandewalle, G., Schwartz, S., Grandjean, D., Wuillaume, C., Balteau, E., Degueldre, C., Schabus, M., Phillips, C., Luxen, A., Dijk, D. J., & Maquet, P. (2010). Spectral quality of light modulates emotional brain responses in humans. Proceedings of the National Academy of Sciences of the United States of America, 107(45), 19549–19554. https://doi.org/10.1073/pnas.1010180107

      Viénot, F., Brettel, H., Dang, T.-V., & Le Rohellec, J. (2012). Domain of metamers exciting intrinsically photosensitive retinal ganglion cells (ipRGCs) and rods. Journal of the Optical Society of America A, 29(2), A366. https://doi.org/10.1364/josaa.29.00a366

      Walmsley, L., Hanna, L., Mouland, J., Martial, F., West, A., Smedley, A. R., Bechtold, D. A., Webb, A. R., Lucas, R. J., & Brown, T. M. (2015). Colour As a Signal for Entraining the Mammalian Circadian Clock. PLOS Biology, 13(4), e1002127. https://doi.org/10.1371/journal.pbio.1002127

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The current study aims to quantify associations between the regular use of proton-pump inhibitors (PPI) - defined as using PPI most days of the week during the last 4 weeks at one cross-section in time - with several respiratory outcomes up to several years later in time. There are 6 respiratory outcomes included: risk of influenza, pneumonia, COVID-19, other respiratory tract infections, as well as COVID-19 severity and mortality.

      Strengths:

      Several sensitivity analyses were performed, including i) estimation of the e-value to assess how strong unmeasured confounders should be to explain observed effects, ii) comparison with another drug with a similar indication to potentially reduce (but not eliminate) confounding by indication.

      We are grateful for your pointing out the strengths in our article, particularly the assessment of e-values and the comparison with another medication to mitigate confounding by indication. We extend our sincere gratitude to the reviewer for identifying multiple concerns and offering constructive feedback to help improve our manuscript. We will incorporate these suggestions into our revisions.

      Weaknesses:

      (1) The main exposure of interest seems to be only measured at one time-point in time (at study enrollment) while patients are considered many years at risk afterwards without knowing their exposure status at the time of experiencing the outcome. As indicated by the authors, PPI are sometimes used for only short amounts of time. It seems biologically implausible that an infection was caused by using PPI for a few weeks many years ago.

      We agree with the reviewer that PPIs are sometimes used for only short amounts of time, as indicated in our manuscript. We acknowledge that it is a limitation of the UK Biobank cohort, and we have discussed this in the discussion section as follows:

      “Given that the PPI exposure was mainly assessed at the baseline recruitment, it was possible that a small proportion of PPI users was misclassified during the follow-up due to the medication discontinuation, which may result in an underestimation of potential risk.” (Page 14, Line 8-10)

      In addition, to alleviate these concerns, we have conducted effect moderation analyses for the subgroup of potential long-term users, defined as participants with indications for PPI use. This information has been included in the discussion section:

      “In addition, no effect moderation was observed in subgroup analyses for the main outcome among PPI users with indications (more likely to regularly use PPIs for a long period) compared to those without indications, indicating the risks remained increased among long-term PPI users.” (Page 14, Line 12-15)

      We hope that in the future, the concerns highlighted by the reviewer can be resolved by utilizing datasets with close follow-up, especially regarding medication use:

      “Since the follow-up prescription data was lacking in our study to precisely identifying the long-term users, further evaluation using cohorts with close follow-up is needed.” (Page 14, Line 15-17)

      (2) Previous studies have shown that by focusing on prevalent users of drugs, one often induces several biases such as collider stratification bias, selection bias through depletion of susceptible, etc.

      Because of the limitations of data from the UK Biobank, such as the absence of details on initiation of medications and regular monitoring, we were restricted to using a prevalent user design to assess the associations between PPI use and respiratory outcomes. We have discussed it in the limitation section:

      “Given that the PPI exposure was mainly assessed at the baseline recruitment, it was possible that a small proportion of PPI users was misclassified during the follow-up due to the medication discontinuation, which may result in an underestimation of potential risk. However, the prevalent user design could underestimate the actual risks of PPI use for respiratory infections, which indicates the real effect might be stronger [38]……Since the follow-up prescription data was lacking in our study to precisely identifying the long-term users, further evaluation using cohorts with close follow-up is needed.” (Page 14, Line 8-17)

      (3) It seems Kaplan Meier curves are not adjusted for confounding through e.g. inverse probability weighting. As such the KM curves are currently not informative (or the authors need to make clearer that curves are actually adjusted for measured confounding).

      Your kind suggestions are greatly appreciated. We have plotted Kaplan Meier curves adjusted for confounding by inverse probability weighting with the measured confounders according to the reviewer’s advice. The methods and results are demonstrated as follows:

      “The event-free probabilities were compared by Kaplan-Meier survival curves with inverse probability weights adjusting for the measured covariates.” (Page 8, Line 13-15)

      “Regular PPI users had lower event-free probabilities for influenza and pneumonia compared to those of non-users (Supplementary Figure 2 A-B).” (Page 9, Line 21-23)

      “PPI users had lower event-free probabilities for COVID-19 severity and mortality, but not COVID-19 positivity compared to those of non-users (Supplementary Figure 2 C-E).” (Page 10, Line 9-10)
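
      The inverse-probability-weighted Kaplan-Meier estimate described in this response can be sketched as follows. This is a minimal illustration, not the code used for the study: the function names `ip_weights` and `weighted_kaplan_meier` are hypothetical, and the propensity scores (the probability of being a regular PPI user given the measured covariates) are assumed to have been estimated beforehand, for example by logistic regression.

```python
import numpy as np


def ip_weights(treated, propensity):
    """Unstabilized inverse probability weights (hypothetical helper).

    treated: 1 for PPI users, 0 for non-users.
    propensity: estimated P(treated = 1 | covariates), assumed given.
    """
    treated = np.asarray(treated, dtype=bool)
    propensity = np.asarray(propensity, dtype=float)
    return np.where(treated, 1.0 / propensity, 1.0 / (1.0 - propensity))


def weighted_kaplan_meier(times, events, weights):
    """Weighted Kaplan-Meier estimate of the event-free probability.

    times: follow-up time per participant.
    events: 1 if the outcome occurred, 0 if censored.
    weights: positive weight per participant (e.g. from ip_weights).
    Returns the distinct event times and the estimate at each of them.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    weights = np.asarray(weights, dtype=float)

    event_times = np.unique(times[events])
    survival = []
    s = 1.0
    for t in event_times:
        # Weighted number still at risk just before t.
        at_risk = weights[times >= t].sum()
        # Weighted number of events occurring exactly at t.
        failed = weights[(times == t) & events].sum()
        s *= 1.0 - failed / at_risk
        survival.append(s)
    return event_times, np.array(survival)
```

      In such a sketch, the curve for each exposure group would be obtained by calling `weighted_kaplan_meier` on that group's participants with their own weights; with all weights equal to 1, the function reduces to the ordinary Kaplan-Meier estimator.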

      (4) Throughout the manuscript the authors seem to misuse the term multivariate (using one model with e.g. correlated error terms to assess multiple outcomes at once) when they seem to mean multivariable.

      We apologize for misusing the term “multivariate” and “multivariable” in our previous manuscript. We have corrected the misused terms throughout the manuscript:

      “Univariate and multivariable Cox proportional hazards regression models were utilized to assess the association between regular use of PPIs and the selected outcomes.” (Page 7, Line 19-20)

      “The remaining imbalanced covariates (standardized mean difference ≥ 0.1) after propensity score matching were further adjusted by multivariable Cox regression models to calculate HRs and 95% CIs.” (Page 8, Line 23-25)

      (5) Given multiple outcomes are assessed there is a clear argument for accounting for multiple testing, which following the logic of the authors used in terms of claiming there is no association when results are not significant may change their conclusions. More high-level, the authors should avoid the pitfall of stating there is evidence of absence if there is only an absence of evidence in a better way (no statistically significant association doesn't mean no relationship exists).

      We have revised our interpretation of the results, particularly for those without a statistically significant association, based on the reviewer’s advice, and clearly recognize that the conclusions should be interpreted with caution:

      “In contrast, the risk of COVID-19 infection was not significant with regular PPI use…” (Page 2, Line 11-12)

      “PPI users were associated with a higher risk of influenza (HR 1.74, 95%CI 1.19-2.54), but the risks with pneumonia or COVID-19-related outcomes were not evident.” (Page 2, Line 14-16)

      “…while the effects on pneumonia or COVID-19-related outcomes under PPI use were attenuated when compared to the use of H2RAs.” (Page 2, Line 18-19, in the Abstract)

      “…while their association with pneumonia and COVID-19-related outcomes is diminished after comparison with H2RA use and remains to be further explored.” (Page 15, Line 21-22, in the Conclusion)
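
      To illustrate the reviewer's point about multiple testing across several outcomes, one common option is the Holm step-down adjustment. The sketch below is not part of the authors' analysis; the function name `holm_adjust` is hypothetical and any p-values passed to it would be placeholders rather than results from the study.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

      An outcome would then be called statistically significant at the 5% level only if its adjusted p-value is below 0.05, which controls the family-wise error rate across all outcomes tested.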

      (6) While the authors claim that the quantitative bias analysis does show results are robust to unmeasured confounding, I would disagree with this. The e-values are around 2 and it is clearly not implausible that there are one or more unmeasured risk factors that together or alone would have such an effect size. Furthermore, if one would use the same (significance) criteria as used by the authors for determining whether an association exists, the required effect size for an unmeasured confounder to render effects 'statistically non-significant' would be even smaller.

      We agree with the reviewer that there might still exist one or more unmeasured risk factors that have effect sizes larger than 2. Hence, we cannot affirm that the findings are robust to unmeasured confounding in the current analysis, which is a limitation of our study. We have deleted the previous statement, and added more discussion in the limitation section:

      “Moreover, patients with exacerbations of respiratory disorders (e.g., asthma, COPD) might suffer from a wide range of gastrointestinal symptoms that lead to the use of PPIs [38]. Due to the lack of data for respiratory severity and close follow-up for medication use, residual confounding might still exist due to the observational nature.” (Page 14, Line 23-27)
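
      For readers unfamiliar with the E-value discussed in this point, the following minimal sketch applies the standard formula of VanderWeele and Ding, E = RR + sqrt(RR × (RR − 1)), to a point estimate and to the confidence limit closest to the null. It assumes the hazard ratio approximates a risk ratio, which is reasonable only when the outcome is rare. The function name `e_value` is hypothetical, and the numbers in the usage note are taken from the H2RA comparison quoted earlier (HR 1.74, 95% CI 1.19-2.54) purely for illustration.

```python
import math


def e_value(rr):
    """E-value for a risk ratio (or a hazard ratio for a rare outcome).

    Ratios below 1 are inverted first, so the result is always >= 1.
    """
    if rr <= 0:
        raise ValueError("ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


# Illustrative use with the influenza estimate from the H2RA comparison.
point_estimate = e_value(1.74)    # E-value for the point estimate
lower_limit = e_value(1.19)       # E-value for the CI limit closest to 1
```

      Here the E-value is about 2.87 for the point estimate but only about 1.67 for the lower confidence limit, which is consistent with the reviewer's remark that a weaker unmeasured confounder would suffice to render an association statistically non-significant.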

      (7) Some patients are excluded due to the absence of follow-up, but it is unclear how that is determined. Is there potentially some selection bias underlying this where those who are less healthy stop participating in the UK biobank?

      Thank you for your question. The reasons for the absence of follow-up are mainly classified into five categories, including: (1) Death reported to UK Biobank by a relative; (2) NHS records indicate they are lost to follow-up; (3) NHS records indicate they have left the UK; (4) UK Biobank sources report they have left the UK; (5) Participant has withdrawn consent for future linkage. According to the data from UK Biobank (https://biobank.ndph.ox.ac.uk/showcase/field.cgi?id=190), the major reason for the loss of follow-up among participants is their departure from the UK (84.7% of participants who were lost to follow-up). In addition, not including those who were less healthy in the study might also underestimate the risk, leading to lower estimated effects of PPIs for respiratory infections. We have supplemented this in our revised manuscript:

      “Among them, 1,297 participants without follow-up, which were mainly determined by reported death, departure from the UK, or withdrawn consent, had been removed after initial exclusion.” (Page 4, Line 25-27)

      (8) Given that the exposure is based on self-report how certain can we be that patients e.g. do know that their branded over-the-counter drugs are PPI (e.g. guardium tablets)? Some discussion around this potential issue is lacking.

      Thank you for your concerns. In the data collection by the UK Biobank, the participants can enter the generic or trade name of the treatment on the touchscreen to match the medications they used. We have added this important information to the method section:

      “The exposure of interest was regular use of PPIs. The participants could enter the generic or trade name of the treatment on the touchscreen to match the medications they used (Supplementary Table S1).” (Page 5, Line 6-8)

      We acknowledge that specific information on prescribed or over-the-counter use of medications is lacking in the UK Biobank. We have discussed it in the limitation section:

      “Limitations exist in our study. Information on dose and duration of PPI use, discrimination between prescription and over-the-counter use of PPIs, health-seeking behavior, different types of pneumonia, and pneumococcus vaccination is currently not available from the UK Biobank.” (Page 14, Line 5-8)

      (9) Details about the deprivation index are needed in the main text as this is a UK-specific variable that will be unfamiliar to most readers.

      Thank you for your question on the definition of the deprivation index. We have provided the details about the deprivation index in the manuscript:

      “…socioeconomic status (deprivation index, which was defined using national census information on car ownership, household overcrowding, owner occupation, and unemployment combined for postcode areas of residence)…” (Page 6, Line 14-17)

      (10) It is unclear how variables were coded/incorporated from the main text. More details are required, e.g. was age included as a continuous variable and if so was non-linearity considered and how?

      We apologize for not elucidating how variables were incorporated into the main text. Previously, the linearity between continuous variables and outcomes was assessed by Martingale residuals plots, while the variables detected with non-linearity were regarded as categorical variables for further analyses. For example, after evaluation with the Martingale residuals plot, age demonstrated non-linearity, and we incorporated it as a categorical variable for the analysis of COVID-related mortality.

      We have supplemented the information in the method section:

      “The linearity between continuous variables and outcomes was assessed by Martingale residuals plots, while the variables detected with non-linearity were regarded as categorical variables for further analyses.” (Page 6, Line 28 to Page 7, Line 1)

      (11) The authors state that Schoenfeld residuals were tested, but don't report the test statistics. Could they please provide these, e.g. it would already be informative if they report that all p-values are above a certain value.

      We are sorry for not providing the statistics about the Schoenfeld residual in our previous manuscript. We have supplemented the information in our revisions:

      “Schoenfeld residuals tests were used to evaluate the proportional hazards assumptions, while no violation of the assumption was detected (Supplementary Table S3).” (Page 7, Line 27 to Page 8, Line 1)

      (12) The authors would ideally extend their discussion around unmeasured confounding, e.g. using the DAGs provided in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7832226/, in particular (but not limited to) around severity and not just presence/absence of comorbidities.

      Thank you for your insightful suggestion that the discussion about unmeasured confounding should be extended. We agree with the reviewer that, in addition to the comorbidities themselves, their severity could also have an important impact on the use of PPIs. We have added this discussion to the limitation section, citing the article (PMC7832226):

      “Moreover, patients with exacerbations of comorbid disorders (e.g., diabetes, asthma, COPD) might suffer from a wide range of gastrointestinal symptoms that lead to the use of PPIs [38] (Supplementary Figure S4). Due to the lack of data for respiratory severity and close follow-up for medication use, residual confounding might still exist due to the observational nature.” (Page 14, Line 23-27)

      (13) The UK biobank is known to be highly selected for a range of genetic, behavioural, cardiovascular, demographic, and anthropometric traits. The potential problems this might create in terms of collider stratification bias - as highlighted here for example: https://www.nature.com/articles/s41467-020-19478-2 - should be discussed in greater detail and also appreciated more when providing conclusions.

      We acknowledge the reviewer's point about the UK Biobank's highly selective nature potentially leading to collider stratification bias in the evaluation of COVID-19-related outcomes. We have discussed this in detail and are cautious when generating conclusions.

      “Furthermore, the highly selective nature of the UK Biobank might create collider stratification bias for the evaluation of COVID-19-related outcomes, and thus the conclusions should be interpreted with cautions [39].” (Page 15, Line 2-4)

      Reviewer #2 (Public Review):

      Summary:

      Zeng et al investigate in an observational population-based cohort study whether the use of proton pump inhibitors (PPIs) is associated with an increased risk of several respiratory infections among which are influenza, pneumonia, and COVID-19. They conclude that compared to non-users, people regularly taking PPIs have increased susceptibility to influenza, pneumonia, as well as COVID-19 severity and mortality. By performing several different statistical analyses, they try to reduce bias as much as possible, to end up with robust estimates of the association.

      Strengths:

      The study comprehensively adjusts for a variety of critical covariates and by using different statistical analyses, including propensity-score-matched analyses and quantitative bias analysis, the estimates of the associations can be considered robust.

      We are grateful to the reviewer for pointing out the merits of our article, which include adjusting for a wide range of covariates, employing diverse statistical analyses, and using robust data. We will revise our manuscript further based on the reviewer's suggestions.

      Weaknesses:

      As it is an observational cohort study there still might be bias. Information on the dose or duration of acid suppressant use was not available, but might be of influence on the results. The outcome of interest was obtained from primary care data, suggesting that only infections as diagnosed by a physician are taken into account. Due to the self-limiting nature of the outcome, differences in health-seeking behavior might affect the results.

      Thank you for your questions regarding the dose/duration of acid suppressants, the source of diagnosis, and the health-seeking behavior of participants. For the data from the UK Biobank, the dose or duration of acid suppressant use was not available since the information was not collected at baseline or follow-up. In addition, the outcome of interest was also retrieved from the hospital ICD diagnosis. We apologize for not clarifying it in our previous manuscript. Moreover, we agree with the reviewer that health-seeking behavior could have an impact on the analyses, but the relevant data are not available from the UK Biobank. We have discussed them in the method and limitation sections:

      “Briefly, the first reported occurrences of respiratory system-related conditions within primary care data, and hospital inpatient data defined by the International Classification of Diseases (ICD)-10 codes were categorized by the UK Biobank.” (Page 5, Line 21-25)

      “Limitations exist in our study. Information on dose and duration of PPI use, discrimination between prescription and over-the-counter use of PPIs, health-seeking behavior, different types of pneumonia, and pneumococcus vaccination is currently not available from the UK Biobank.” (Page 14, Line 5-8)

      Reviewer #1 (Recommendations For The Authors):

      Analysis code should be made available.

      Thank you for your question. We have provided the sources of the analysis code we used for this study in our revised manuscript:

      “The codes used in this study can be found at: https://epirhandbook.com/en/ and https://cran.r-project.org/doc/contrib/Epicalc_Book.pdf.” (Page 16, Line 21-22)

      Reviewer #2 (Recommendations For The Authors):

      It might be interesting to study whether including self-reported infections changes the results, as people using PPI may more easily consult their GP even for a self-limiting disease such as influenza and therefore are more likely diagnosed/confirmed with such a respiratory infection.

      Thank you for your insightful suggestions on conducting analyses including self-reported infections. Therefore, we have included the self-reported cases as sensitivity analyses, and the results were not significantly altered, which confirms the robustness of our results:

      “Self-reported infections, except for COVID-19-related outcomes due to the lack of data, were also included for the outcomes as sensitivity analyses. The self-reported cases were reported at the baseline or subsequent UK Biobank assessment center visit.” (Page 8, Line 17-19)

      “Inclusion of the self-reported cases did not significantly alter the results (Supplementary Table S4).” (Page 9, Line 17-18)

      Moreover, to address the above-mentioned, sub-analyses differentiating between over-the-counter and prescribed medication might be interesting.

      Thank you for your question on differentiating between over-the-counter and prescribed medication. We have thoroughly searched the data provided by the UK Biobank, but unfortunately this information is not available. We have discussed this in the limitation section:

      “Information on dose and duration of PPI use, discrimination between prescription and over-the-counter use of PPIs, health-seeking behavior, different types of pneumonia, and pneumococcus vaccination is currently not available from the UK Biobank.” (Page 14, Line 5-8)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1.1) I thought the manuscript was very clear. While I realize the authors included the reference to medulloblastoma in the introduction based on previous reviewer comments, I think this speculation is better left in the discussion.

      Whilst we appreciate the reviewer’s feedback here, we felt it was important to include a reference to medulloblastoma and developmental disorders associated with the cerebellum to put this work into a broader context.

      We removed the sentence “Medulloblastoma can be a consequence of uncontrolled proliferation of granule cell progenitors, with BMP overexpression being a potential therapeutic avenue to inhibit this proliferation” to limit the speculation in this statement.

      (1.2) line 81: It would be better to cite the 2 original papers (Hendrikes et al 2022, Smith et al 2022) rather than the Phoenix commentary article. I'm not sure the Phoenix article needs to be cited at all within this paper.

      We have cited the two suggested papers and removed the citation to Phoenix et al.

      (1.3) line 102: confusing sentence with the unexpected separation of do and not: "the same conditional deletions of BMP pathway elements that fail to block early granule cell specification at the rhombic lip do result not in a larger cerebellum as might be expected, but either have no affect".

      We thank the reviewer for pointing out this error and have corrected the text to “do not result in a larger cerebellum”.

      (1.4) line 133: inconsistent acronyms (for example, W9 vs pcw9).

      This has been corrected to PCW in all occurrences.

      (1.5) line 139: coronal vs transverse? it seems like you show transverse sectioning but refer to it as coronal in the text.

      We thank the reviewer for highlighting this and have corrected the text to “transverse”.

      (1.6) fig 2C: would it be possible to provide a similar inset as 2D?

      We thank the reviewer for this suggestion and have added the insets in 2C. We agree that this is now clearer and more consistent with the rest of the figure.

      (1.7) line 368/369/435/436 missing arrows.

      The arrows have been re-added; it appears that they did not show up in the uploaded PDF.

      (1.8) line 517 missing word: rhombic-lip-derived.

      This typo has been corrected.

      Reviewer #2 (Public Review):

      (2.1) Fig. 3 M Why are there asterisks both above and below the brackets?

      This was a formatting error that has now been corrected.

      (2.2) Fig. 8. The arrows (BMP up and BMP down) are touching the right ")" in the figure, which makes it hard to read.

      This was also a formatting issue which has been corrected.

      (2.3) Fig. 4 and 8 legends. There are spaces in the text which I believe are for arrows to be inserted "(BMP )", but the arrows have been omitted in the PDF that I read.

      This is the same as Reviewer 1’s comment. The arrows have been re-added to the text; their omission appears to have been an issue with the PDF upload.

      (2.4) Fig. 3 legend gets very hard to read at the end, where it seems some punctuation is missing.

      We have re-worded the legend for Fig. 3 to make it easier to read.

      (2.5) Significant figures in some of the text are probably too much given the accuracy at which they can be measured with.

      We appreciate the reviewer’s concerns here; however, these were added in response to the original reviewer’s request to “provide some additional support to otherwise qualitative observations”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      In my opinion, the three most important controls (hopefully easy):

      (1) Include no ATR controls for optogenetic activation experiments (not all, just one or two, e.g., Figure 4B, C, or D, for the highest activation condition). The concern is that it can be quite hard to use light to both monitor neural responses while also using light to activate the function of other neurons.

      We thank the reviewer for the suggestions. We use a 2-photon 910-nm laser (which does not activate Chrimson) for imaging of GCaMP and a 624-nm LED (which does not activate GFP) for Chrimson activation. Calcium (GCaMP) signals are detected by PMT during Chrimson activation. With this setup, we are able to image GCaMP signals without crosstalk during activation of Chrimson.

      We performed calcium imaging in animals that were not fed ATR and found that SS04185 showed no response to LED stimulation at the strongest intensity (µW/mm) (New Figure 4 – figure supplement 1B).

      (2) Demonstrate that their RNAi constructs do indeed knock down the intended target gene. They showed nicely in Figure 5A that SeIN128 expresses GABA. Presumably, these neurons also express VGAT. Is it possible to check the expression of VGAT after RNAi knockdown? The concern is that using only a single RNAi introduces the possibility of off-target effects. Using multiple RNAi lines for VGAT or other parts of the pathway would also alleviate this (minor concern).

      We thank the reviewer for raising this point. We agree that using only one RNAi line (HMS02355) for VGAT in Figure 5A is a weakness. 

      Accordingly, we have performed additional experiments to quantify the effect of RNAi knockdown of VGAT using HMS02355 in all neurons, followed by immunostaining against GABA or VGAT. We found that both VGAT and GABA were significantly reduced in the neuropil (Figure 5 – figure supplement 1C and D). These data strongly suggest that HMS02355 knocks down VGAT and reduces GABA at axon terminals. We note that HMS02355 has been used previously for knocking down GABA signaling in the following studies.

      (1) Kallman BR, Kim H, Scott K (2015). Excitation and inhibition onto central courtship neurons biases Drosophila mate choice. eLife 4:e11188. https://doi.org/10.7554/eLife.11188

      (2) Zhao W, Zhou P, Gong C et al. (2019). A disinhibitory mechanism biases Drosophila innate light preference. Nat Commun 10, 124. https://doi.org/10.1038/s41467-018-07929-w

      (3) Yamagata N, Ezaki T, Takahashi T, Wu H, Tanimoto H (2021). Presynaptic inhibition of dopamine neurons controls optimistic bias. eLife 10:e64907. https://doi.org/10.7554/eLife.6490

      (3) Include genetic controls for their driver line.

      In Figure 1, it would be nice to see one half or the other half of their split GAL4 line in their manipulations. The concern is that perhaps the phenotype is coming from something unexpected in the genetic background.

      We thank the reviewer for the suggestion. We have added each half of the split GAL4 line (AD only or DBD only) as controls (New Figure 1 – figure supplement 2). We found that SS04185 showed a reduction of rolling, whereas the split controls (AD only or DBD only) did not.

      In the discussion:

      It seems that activation of SS04185 has additional effects beyond what the authors have quantified. Specifically, larvae do not appear to re-initiate rolling in the same manner as Basin activation alone. Also, there appears to be an off-response, turning.

      We appreciate the reviewer’s comments. We have included a section in the discussion to consider the different patterns of rolling observed during joint stimulation of Basins and SS04185 and during stimulation of Basins alone, as well as the increase in turning following the offset of joint stimulation of Basins and SS04185 compared with stimulation of Basins alone (lines 464 to 481). Although the reasons for these differences are beyond the scope of the paper, we have added Figure 2 – figure supplement 1K, which shows that co-activation of SS04185-MB and Basins is sufficient to evoke turning following the offset of stimulation, suggesting that the increased turning may be due to the activation of SS04185-MB neurons and independent of SS04185-DN neurons.

      The labeling of the Figure panels could be improved. In many places, it is not clear that Basins are being stimulated in the background, whereas in nearby panels, it is clearly labeled. This is confusing for the reader.

      We thank the reviewer for the constructive suggestions. We have modified all relevant figures to read “Basins>Chrimson” above the pink line indicating the period of optogenetic activation.

      Reviewer #2 (Recommendations For The Authors):

      Claims, rigorousness, repeatability, and accuracy of terms.

      (1) In line 254, the authors suggest that the slow response of SeIN128 neurons is due to the input they receive from SEZ, but in line 453, they suggest it is due to axo-axonal connections. However, their evidence does not support one factor over the other. Overall, only the axo-axonal connection was strongly suggested in the discussion. The authors could clarify that the delay of SeIN128 activity may also be caused by multisynaptic connections involving SEZ or other neurons in the last section of the Discussion.

      Although SeIN128 primarily receives inputs from the SEZ, it also receives inputs within the VNC from Basin-2 (Figure 4 – figure supplement 2). Specifically, in the VNC, the axons of SeIN128 make inhibitory synaptic contacts onto the axon of Basin-2, which in turn makes reciprocal excitatory contacts onto the axon of SeIN128, thereby forming a feedback loop. When we wrote the original discussion, however, we inadvertently focused on the potential of the negative feedback loop formed by these axo-axonal synapses in the VNC to mediate the slow response of SeIN128, overlooking the possibility that other as yet unidentified pathways could convey Basin or A00c activity indirectly to SeIN128 dendrites in the SEZ. Therefore, we have revised the original text, which read “These data suggest that the main synaptic inputs onto SeIN128 neurons in the SEZ mediate the slow responses upon activation of Basins or A00c neurons” to “These data suggest that the delay of SeIN128 activity may be caused by multi-synaptic connections involving the SEZ or a feedback loop involving axo-axonal connections between SeIN128 and Basin-2 or A00c” (revised, Lines 259 and 261). Accordingly, we have also adjusted the relevant discussion section to be consistent with this change (Lines 460 and 466).

      (2) Please clarify the following: How does the algorithm define rolling and crawling? Healthy larvae complete 360° rolls; in each roll they rotate from dorsal up to dorsal up. It is possible that a larva rolls for an incomplete cycle and straightens up. Does the algorithm simply label individual frames as “roll”, “non-roll”, or “unknown”, and define rolling by the existence of “roll” frames? If so, then larvae that rolled for 90° and straightened would be counted as “rolling” though they failed to complete a full rolling bout. Also, how were “hunch”, “turn”, and “back” identified? Lastly, is there any manual quality control involved? Address this and related issues in the methods:

      a)  Expand the description of the classifier algorithm.

      b)  How are rolling and non-rolling animals defined in the "rolling%" assay? Were all "rolling" animals able to do at least one 360° roll?

      c)  How are "rolling duration" and "end of 1st rolling" defined? Is the algorithm able to distinguish different rolling bouts? In these two assays, were the animals that rolled for <1 second (in total or in their "first roll") able to complete a 360° roll?

      The Multi-worm Tracker (MWT) records only the contours of animals (no real video image data). Thus, the data fed into the classifier algorithm include only features based on contour time-series data. The algorithm uses movement perpendicular to the body axis, the characteristic feature of larval rolling, to classify rollers and non-rollers. Although the algorithm cannot determine whether a rolling event involves a rotation of 360 degrees or more, we ensure that rolling events are at least 360 degrees by removing any events that are shorter than 0.2 s (the minimum time to complete a 360-degree roll).

      We have accordingly revised the section of “Behavior detection” relating to the behavior classification algorithm in the methods section as follows (Lines 600 to 620).

      “After extracting behavioral parameters from Choreography, we used an unsupervised machine learning behavior classification algorithm to detect and quantify the following behaviors: hunching (Hunch), headbending (Turn), stopping (Stop), and peristaltic crawling (Crawl) as previously reported (Masson et al., 2020). Escape rolling (Roll) was detected with a classifier developed using the Janelia Automatic Animal Behavior Annotator (JAABA) platform (Kabra et al., 2013; Ohyama et al., 2015). JAABA transforms the MWT tracking data into a collection of ‘per-frame’ behavioral parameters and regenerates 2D dorsal-view videos of the tracked larvae. Based on such videos, we defined rolling as a rotation around the body while the larva maintains a C-shape, which results in a movement perpendicular to larval body axis (Supplementary videos 1 and 2). Using this definition, we trained the algorithm in the JAABA platform by labeling ~10,000 randomly chosen frames as rolling or non-rolling to develop the rolling classifier. If a larva did not curl into a C-shape or move sideways, it was labeled as a “non-roller.” Every animal with at least one rolling event longer than 0.2 s in a given period was labeled as a “roller” (i.e., it was assumed to have rolled at least 360 degrees), based on the observation that when the start and end of rolling events were precisely measured, the algorithm could identify rolling events completed in 0.2 s.

      The rejection of false positives, especially at the beginning and the end of each rolling bout, enhanced accuracy. The algorithm integrated these training labels and parameters generated with Choreography in a time series, such as speed, crabspeed, and body curvature, to generate a score for rolling detection. Above a certain threshold, the classifier labeled the frame as rolling. This classifier, which has false negative and false positive rates of 7.4% and 7.8%, respectively (n = 102), was utilized to detect rolling in this paper.”
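
      The event-filtering step described above can be illustrated with a minimal sketch. It assumes that the classifier yields one score per frame and that consecutive above-threshold frames form a rolling event; the function names, the threshold value, and the frame rate are hypothetical and are not taken from the JAABA or MWT software.

```python
from typing import List, Sequence, Tuple


def rolling_events(
    scores: Sequence[float],
    threshold: float,
    fps: float,
    min_duration_s: float = 0.2,
) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) frame ranges of rolling events.

    A frame counts as rolling when its score exceeds `threshold`
    (hypothetical value). Runs of consecutive rolling frames shorter
    than `min_duration_s` are discarded, mirroring the 0.2 s cutoff
    described in the text.
    """
    events: List[Tuple[int, int]] = []
    start = None  # first frame of the run currently being tracked
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i
        elif start is not None:
            if (i - start) / fps >= min_duration_s:
                events.append((start, i))
            start = None
    # Close a run that is still open at the end of the recording.
    if start is not None and (len(scores) - start) / fps >= min_duration_s:
        events.append((start, len(scores)))
    return events


def is_roller(scores: Sequence[float], threshold: float, fps: float) -> bool:
    """An animal is a roller if it has at least one retained rolling event."""
    return len(rolling_events(scores, threshold, fps)) > 0


# Illustrative scores at an assumed 10 frames per second.
example = [0.9, 0.1, 0.9, 0.9, 0.9, 0.1, 0.9]
events = rolling_events(example, threshold=0.5, fps=10.0)
```

      In this toy trace, the single-frame runs at the start and end last 0.1 s and are dropped, whereas the three-frame run (0.3 s) is retained, so the animal would be labeled as a roller.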

      Readability of text

      (1) I suggest giving the SS04185 line and SeIN128 neuron common names that are easier to remember and follow (after mentioning their full name once).

      We acknowledge the reviewer’s concerns. However, because SS04185 was initially named using the Janelia split-line pipeline, and SeIN128 was named independently in a more recent study (Ohyama et al., 2015), we have retained these designations in the present manuscript.

      Figures and figure legends

      (1) It would help if the authors could put visual representations of rolling and crawling, such as a cartoon larva performing the rolling-crawling switch, and still frames of rolling and crawling of real larvae, especially in Figure 1. Also, please consider including a video of rolling and crawling in real larvae (preferably comparing control and experimental groups).

      We appreciate the reviewer’s suggestion. We have added a cartoon of the behavioral sequence in Figure 1A, as well as a Figure 1 supplement video based on MWT data, which shows rolling followed by crawling. 

      (2) To give the reader a take-home message, it would help if the authors could make a simplified version of Figure 4A and put it at the end of the paper.

      We thank the reviewer for the suggestion. To assist the reader, we have added schematics depicting how the circuit may function in panel I of Figure 8.

      (3) In Figure 1A, add the text "activation " after the neuron names.

      We have added “Chrimson” following “Basins>” to the new Figure 1B (old Figure 1A) and other figures (Figure 1C and D, Figure 5A, Figure 6A, and figure supplements).

      (4) Figure 1G: a data point is misaligned (at the top of the graph). 

      We have aligned the data point accordingly.

      (5) Figure 1B can benefit from a better design. If possible, please separate the crawling speed into an independent graph (or at least use a different line shape to code for crawling speed and indicate it on the in-graph legend). Is the speed of Basin/SS04185 co-activation studied?

      We appreciate the reviewer’s suggestion. We have separated the plots for rolling and crawling speed into different panels (Figure 1C and D). As shown in Figure 1D, the crawling speed observed during coactivation of Basins and SS04185 was similar to that during activation of Basins alone.

      (6) Figure S1 uses a different color-coding scheme from Figure 1. I suggest making the color coding consistent between figures.

      We are grateful for the reviewer’s suggestion. We have adjusted the color-coding scheme accordingly.

      (7) Line 692 (Figure 2 legend), "Killer Zipper" is misspelled as "Kipper Zipper". Out of curiosity, is there a way to remove or reduce SS04185-DN expression in the same manner as SS04185-MB reduction?

      We have corrected the text in the legend for Figure 2. As for the reviewer’s question, we did attempt to reduce or abolish SS04185-DN expression with tsh-LexA and LexAop-Kip+ but found no effect. Other identified LexA constructs with SeIN128 expression, however, all showed SS04185-MB expression. Consequently, we could not use these constructs because they inhibit both SeIN128 and SS04185-DN.

      (8) The color coding of Figure 2 (especially in D) makes it hard to distinguish between the brown and red groups.

      We thank the reviewer for the suggestion. Accordingly, we have changed the color for the brown group to orange.

      (9) In line 926 (Figure S2 legends), the description of F and G seems inverted.

      We thank the reviewer for pointing out the error. We have revised the text from “(F) has only SS04185-MB expression, and (G) has both SS04185-DN and SS04185-MB expression” to “(F) has both SS04185-DN and SS04185-MB expression, and (G) has only SS04185-MB expression.”

      (10) Figure 7B: which line does the top group of asterisks belong to?

      The top group of asterisks indicates that each experimental group differs significantly (p < 0.001) from the control group. We have revised the figure to clarify the comparisons indicated by the asterisks in Figure 7B, as well as the figure legend below (Line 890-894).

      “(B) Cumulative plot of rolling duration. Statistics: Kruskal-Wallis test: H = 69.52, p < 0.001; Bonferroni-corrected Mann-Whitney test, p < 0.001 between control and the GABA-B-R11, GABA-B-R12 and GABA-B-R2 RNAi groups, p < 0.001 between GABA-A-R and all other experimental RNAi groups. Sample size for the colored bars from top (control, black) to bottom (GABA-A-R, red); n = 520, 488, 387, 582, 306.”
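
      For readers unfamiliar with the statistics named in this legend, the following is a minimal sketch of how a Kruskal-Wallis H statistic and a Bonferroni correction can be computed. It is an illustration only: the software used by the authors is not stated here, the function names are hypothetical, the input values are toy data rather than the study's measurements, and the p-value for H (from a chi-square distribution) is not computed.

```python
from collections import Counter
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups.

    Raises ZeroDivisionError if every pooled value is identical,
    because the tie correction is then zero.
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    pos = 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        total += rank_sum ** 2 / len(g)
        pos += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy data: three fully separated groups.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
adjusted = bonferroni([0.01, 0.02, 0.5])
```

      The pairwise Mann-Whitney tests would be run only after a significant omnibus result, and their raw p-values would be passed to the correction step.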

      (11) Figure S8 D and F: indicate Basin-2 or Basin-4 activation on graph.

      We have revised Figure 8 – figure supplement D and F accordingly.

      Reviewer #3 (Recommendations For The Authors):

      (1) Lines 86-87: Text needs to be rewritten for clarity. Also, include the genotype in the corresponding figure legend (Figure 1B).

      We thank the reviewer for pointing this out. We have clarified the text accordingly and included the genotype in the figure legend (lines 86 and 87). Specifically, we have revised Figure 1B (New Figure 1C and D) and adjusted the legend accordingly as follows. 

      Lines 86 and 87: Crawling speed during the activation of all Basins following rolling was ~1.5 times that of the crawling speed at baseline (Figure 1D).

      (2) Include the protocol for heat shock-FLP out experiments

      We have added the following paragraph to the Methods section describing the heat shock-FlpOut experiments (lines 537 to 546).

      “Heat shock FlpOut mosaic expression

      First instar Drosophila larvae were exposed to heat shock in a water bath at 37°C for 12 min as previously described (Nern et al., 2015). With precise temporal and thermal control of heat shock, larvae with genotype w+, hs(KDRT.stop)FLP/13xLexAop2-IVS-CsChrimson::tdTomato; R54B01-Gal4.AD/72F11LexA;20xUAS-(FRT.stop)-CsChrimson::mVenus/R46E07-Gal4.DBD showed sporadic CsChrimson::mVenus expression driven by SS04185 split GAL4. As a result, the ratio of the larvae with SS04185-DN and SS04185-MB expression to those with only SS04185-MB expression was 1:1. Each larva was individually examined with optogenetic stimulation and behavior analysis. After behavioral experiments, mVenus expression in CNS was confirmed under the fluorescence microscope.”

      (3) In the immunohistochemistry, the authors exclude the steps for washings. Recommend the authors to cite the previous literature. Similar to the other protocols detailed in the methods.

      We have added a brief description of the steps involved in washing (lines 641 and 648). We have also provided a citation with similar immunohistology protocols (Patel, 1994).

      (4) Keeping the same Y-axis scale for similar graphical representation would be helpful to compare across different experimental conditions and genotypes-for example, 2E and 2H for the start of the first crawl.

      As suggested by the reviewer, we have adjusted the y-axis scales for Figure 2E and H to be identical.

      (5) The color schematics used for the graph make it hard to visualize the data. The author might reconsider the better presentation of the data by avoiding darker colors.

      We thank the reviewer for the constructive suggestion. We have lightened the shading of all violin plots. We have also modified the shading for the middle group in Figure 2C and E from dark brown to orange.

      (6) Co-activation of the SS04185 and Basins in the figures represented as Basins+SS04185 (Figure 1A) and SS04185 (rest of the figures). Authors might reconsider this terminology to define and distinguish the coactivation of SS04185 and Basins neurons from the activation of SS04185 or Basins alone. It needs to be clarified in the figures.

      We have adjusted the terminology by including “Basins>Chrimson” in all panels in which Basin neurons are optogenetically activated to trigger rolling in the background for all groups. Additionally, we have labeled the control group as “Control” and the experimental group as “SS04185”.

      (7) Figure 4A, summarizes the synaptic connection and strength between different neurons - SeIN128, Basins, A00c and mdIV. However, the nature of these synaptic connections - excitatory and inhibitory- is not represented. Based on the previous and current studies, the authors consider providing the schematic for circuit mechanisms of escape behavior sequences in larvae. Also, discussing these findings in light of the downstream output circuit and motor regulation might be informative (See Cooney et al. 2023, PNAS).

      As the reviewer correctly points out, the diagram of the connectome shown in Figure 4A does not indicate whether the connections are excitatory or inhibitory. Accordingly, we have added a new summary panel (Figure 8I) based on the results of examining GABAergic synapses (Figure 5A). The schematics in Figure 8I depict how the joint activity of inhibitory and excitatory synapses (indicated by arrowheads and blunt ends, respectively) may lead to rolling or fast crawling.

      We have also added a section on the premotor circuits for crawling and rolling to the Discussion (Lines 512–519).

      (8) Percentage rolling present in figure 5B and 6A correspond to the control larvae 13xLexAop2-IVS-CsChrimson::mVenus; R72F11-lexA/+; HMS02355/+ and 13xLexAop2-IVS- Cs-Chrimson::mVenus; R72F11-lexA/+; UAS-TeTxLC.tnt/+. How does the author interpret the observed variability across the experiments? The author might consider discussing the genetic background effect on the observed behaviors, if any.

      As pointed out by the reviewer, we noticed that rolling probability varied depending on genetic background. We have revised the text accordingly (Lines 277 to 280).

      (9) Recheck the arrowheads in Figure 5A.

      We have confirmed the positions of the arrowheads in Figure 5A and modified the figures by outlining the cells with dotted lines.

      (10) Lines 295-298: Data presented in the supplementary figure and p-values in the text (p=0.11) suggest that the first crawl's onset is comparable to controls. Rewrite this text for clarity and include the statistical values in the supplemental figure 6.

      We have revised the text as follows (Lines 302 to 305).

      “Although the duration of each rolling bout, time to onset of the first rolling bout, and time to onset of the first crawling bout did not differ from those of controls (Figure 6–figure supplement 1D, E and G), the time to offset of the first rolling bout was delayed relative to controls (p = 0.013 for Figure 6–figure supplement 1F).”

      (11) Lines 263-264: Data provide evidence for SS04185 receiving inputs Basin-2 and A00c neurons. SS04185, which provides inputs to other neurons, specifically A00c neurons, but still needs clarification.

      We have revised the text as follows (Lines 264 to 266).

      “The results thus far indicate that activation of SeIN128 neurons inhibits rolling (Figure 1A–C); SeIN128 neurons receive functional inputs from Basin-2 and A00c (Figure 4A–C); and SeIN128 neurons make anatomical connections onto Basin-2 and A00c (Figure 4A).”

      (12) In the table that lists the genotypes, instead of '-' or the blank space in the label column, the author might consider using 'control,' consistent with the figures.

      In accord with the reviewer’s suggestion, we have revised the notation of ‘-’ or the blank space, to ‘control’ for all figures.

      (13) Check the typographical errors throughout the manuscript. Some below:

      We have revised the text accordingly as suggested below.

      a.  Lines 100, 142: SS4185 should be SS04185

      b.  Line 230: A00C should be A00c

      c.  Line 180: Expand VNC

      d.  10xUAS-IVS-mry::GFP should be 10xUAS-IVS-myr::GFP

      e.  Lines 444, 449: drosophila should be Drosophila

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      Horn and colleagues present data suggesting that the targeting of GREM1 has little impact on a mouse model of metabolic dysfunction-associated steatohepatitis. Importantly, they also challenge existing data on the detection of GREM1 by ELISA in serum or plasma by demonstrating that high-affinity binding of GREM1 to heparin would lead to localisation of GREM1 in the ECM or at the plasma membrane of cells.

      Strengths:

      This is an impressive tour-de-force study around the potential of targeting GREM1 in MASH.

      This paper will challenge many existing papers in the field around our ability to detect GREM1 in circulation, at least using antibody-mediated detection.

      Well-controlled, detailed studies like this are critically important in order to challenge less vigorous studies in the literature.

      The impressive volume of high-level, well-controlled data using an impressive range of in vitro biochemical techniques, rodent models, and human liver slices.

      We thank the reviewer for their time in assessing our manuscript and are very grateful for the positive response. Below, we give a point-by-point response to the reviewer’s comments and indicate where we plan to adjust the manuscript.

      Weaknesses: only minor.

      (1) The authors clearly show that heparin can limit the diffusion of GREM1 into the circulation-however, in a setting where GREM1 is produced in excess (e.g. cancer), could this "saturate" the available heparin and allow GREM1 to "escape" into the circulation?

      We thank the reviewer for their question. Indeed, theoretically, if the production of Gremlin-1 exceeds the capacity of heparin to immobilise Gremlin-1, the protein may be released into solution and thus may enter the circulation. Whilst we have not addressed this possibility in our studies, we agree that it may be a mechanism worth exploring in future studies.

      (2) Secondly, has the author considered that GREM1 may be circulating bound to a chaperone protein like albumin, which would reduce its reactivity with GREM1 detection antibodies?

      We considered the possibility that Gremlin-1 might bind other proteins, such as BMPs, and thereby mask the epitopes recognised by the assay antibodies. To minimise this possibility, we used antibody pairs which bind different epitopes. We also used LC-MS for Gremlin-1 detection (data not shown in the manuscript), a method that is not affected by epitope masking. With the LC-MS analysis we did not pick up any Gremlin-1 signal in plasma. We will mention the LC-MS data in the updated manuscript.

      Also, we were able to detect circulating Gremlin-1 after treatment with anti-Gremlin-1 antibodies. As these were the same antibodies that were used in our assays, we should not have been able to detect Gremlin-1 if there had been a masking interaction with highly abundant circulating plasma proteins such as albumin.

      Finally, we believe that the assay antibodies would outcompete binding of any other proteins because of their high affinity and very high concentrations used in the assays.

      In summary, we are very confident that Gremlin-1 is not present in circulation. We will, however, make some minor adjustments to the manuscript in order to stress this important point.

      (3) Statistics-there is no mention of blinding of samples-I assume this was done prior to analysis?

      All reported results were derived from hard quantitative readouts obtained through assays that are not liable to subjective interpretation. This also applies to immunohistochemistry and RNAscope histologic quantification, using Visiopharm Integrator System software ver. 8.4 or HALO v3.5.3577 (Area Quantification v2.4.2 module), respectively. Therefore, no blinding was necessary prior to analysis.

      (4) Line 211-I suggest adding the Figure reference at the end of this sentence to direct the reader to the relevant data.

      We thank the reviewer for the suggestion and will add a reference to Figure 1F here.

      (5) Figure 1E Y-axis units are a little hard to interpret-can integers be used?

      As the y axis in Figure 1E is on the logarithmic scale, integer numbers would be very hard to read because of the large range of numbers. As we acknowledge that the notation used may be difficult to read, we will change it to superscript scientific notation.

      (6) Did the authors attempt to detect GREM1 protein by IHC? There are published methods for this using the R&D Systems mouse antibody (PMID 31384391).

      Parallel to the work described in PMID 31384391 (Dutton et al., Oncotarget, 10: 4630-4639, 2019), we have tested a whole range of commercial and in-house gremlin-1 antibodies. We independently arrived at the same conclusion as Dutton et al namely that goat anti-gremlin antibody R&D Systems AF956 can stain the mouse or rat intestine in the muscularis layer and in the crypts/lower part of the villi, using FFPE sections. As per Dutton et al. we also corroborated this IHC staining by RNAscope - the mRNA was restricted to the muscularis and the connective tissue just below the crypts, suggesting that Gremlin-1 partially diffuses away from the cells that produce it. In contrast, none of the other commercial or in-house gremlin antibodies that we tested provided any useful staining on FFPE sections.

      We also used the R&D Systems AF956 antibody on several rat MASH liver samples. We saw little or no staining in livers from chow-fed rats, with only occasional weak staining around portal areas. Depending on the rat model, staining ranged from little or none to at most weak staining in portal and fibrotic areas. Among the various models tested, we observed the strongest staining in the rat CDAA-HFD+cholesterol model, in line with the ISH data.

      However, we were unable to establish IHC on human MASH liver samples using the R&D Systems AF956 antibody (or any other antibody) despite 98% sequence identity at the amino acid level between human and rat gremlin-1. Considering the results in Dutton et al. on rodent intestines, we tested the antibody on some human intestine samples, but the results on the available samples (inflamed appendices) were inconclusive.

      We will include representative IHC staining images for Gremlin-1 protein on rat livers as a Supplementary Figure and mention in the manuscript that IHC for human Gremlin-1 did not work with the available antibodies.

      (7) Did the authors ever observe GREM1 internalisation using their Atto-532 labelled GREM1?

      The Atto-532 Gremlin-1 cell association assay was mainly intended to visualise the association of Gremlin-1 with cell surface proteoglycans and how this interaction is affected by heparin-displacing and non-displacing antibodies. We observed a possible, but inconclusive intracellular association of Atto-532 Gremlin-1. However, this assay was not specifically designed for this purpose, and we did not follow up on this. Therefore, we cannot draw any conclusions on whether cell surface bound Gremlin-1 can be internalised. However, we appreciate that internalisation of Gremlin-1 would be an interesting biological mechanism worth following up in future studies.

      (8) Did the authors complete GREM1 ISH in the rat CDAA-HFD model? Was GREM1 upregulated, and if so, where?

      We have performed Grem1 ISH in the rat CDAA-HFD model and representative images of this are shown in Figure 1F. In chow-fed animals, Grem1 was expressed in a few cells in the portal tract, whereas after CDAA-HFD, Grem1 positive cells became more abundant in the portal tract and were also detectable in the fibrotic septa, as described in the respective results section. However, we performed no co-staining with other markers as we did for human liver samples.

      (9) Supplementary Figure 4C - why does the GFP level decrease in the GREM1 transgenic compared to the control GFP mouse? No such change is observed in Supplementary Figure 4E.

      In Supplementary Figure 4C we show expression of GFP mRNA and GREM1 mRNA in lysates of GFP-control and GREM1-GFP overexpressing LX-2 cells. The x-axis labels indicate the different lentiviruses. Therefore, the right panel in Supplementary Figure 4C shows that GREM1 overexpressing LX-2 cells expressed more GREM1 compared to GFP-control transduced LX-2, while GFP mRNA expression was comparable between the two.

      The results in Supplementary Figure 4E look different because – as can also be seen from the % of GFP+ cells in Supplementary Figure 4D – the GREM1 lentivirus here was more effective in transducing the cells, which is why both GFP and GREM1 mRNA were increased with GREM1 lentivirus compared to the GFP-only control. Unlike LX-2, the lentivirally transduced HHSC were not sorted on GFP positive cells prior to qPCR, which may explain the differences in GFP mRNA expression pattern between the two cell types.

      We acknowledge that the figure may be difficult to interpret and will adjust the figure annotation to improve on this.

      Reviewer #2 (Public Review):

      It is controversial whether liver gremlin-1 expression correlates with liver fibrosis in metabolic dysfunction-associated steatohepatitis (MASH). Horn et al. developed an anti-Gremlin-1 antibody in-house and tested its ability to neutralize gremlin-1 and treat liver fibrosis. This article has the advantage of testing its hypothesis with different animal and human liver fibrosis models and using a variety of research methodologies.

      The experimental design and results support the conclusion that the anti-gremlin-1 antibody had no therapeutic effect on treating liver fibrosis, so there are no other suggestions for new experiments:

      (1) The authors used RNAscope in situ hybridization to establish the correlation between Gremlin-1 expression and MASH livers or cell lines.

      (2) A luminescent oxygen channelling immunoassay was used to measure circulating Gremlin-1 concentration. They found that Gremlin-1 binds to heparin very efficiently, preventing Gremlin-1 from entering circulation, and restricting Gremlin-1's ability to mediate organ cross-communication.

      (3) The authors developed a suitable MASH rat model, the choline-deficient, L-amino acid defined high fat 1% cholesterol diet (CDAA-HFD) fed rat, and created a selective anti-Gremlin-1 antibody, the heparin-displacing 0030:HD antibody. They also used human cirrhotic precision-cut liver slices to test their hypotheses. They demonstrated that neutralization of Gremlin-1 activity with monoclonal therapeutic antibodies does not reduce liver inflammation or liver fibrosis.

      One concern is that several reagents and assays are made in-house without external validation. Also, will those in-house reagents and assays be available to the science community?

      Overall this manuscript provides useful information that gremlin-1 has a limited role in liver fibrosis pathogenesis and treatment.

      We thank the reviewer for their time in assessing our manuscript and are very grateful for the positive response. We acknowledge that most of our results were derived from assays using in-house generated reagents, which will therefore be hard to reproduce externally. Whilst for legal reasons we cannot share the sequences of the monoclonal antibodies, we will be able to share aliquots with fellow scientists upon request. We will add a sentence to this effect to the data availability statement.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      The goal of Knudsen-Palmer et al. was to define a biological set of rules that dictate the differential RNAi-mediated silencing of distinct target genes, motivated by facilitating the long-term development of effective RNAi-based drugs/therapeutics. To achieve this, the authors use a combination of computational modeling and RNAi function assays to reveal several criteria for effective RNAi-mediated silencing. This work provides insights into how (1) cis-regulatory elements influence the RNAi-mediated regulation of genes; (2) it is determined that genes can "recover" from RNAi-silencing signals in an animal; and (3) pUGylation occurs exclusively downstream of the dsRNA trigger sequence, suggesting 3º siRNAs are not produced. In addition, the authors show that the speed at which RNAi-silencing is triggered does not correlate with the longevity of the silencing. These insights are significant because they suggest that if we understand the rules by which RNAi pathways effectively silence genes with different transcription/processing levels then we can design more effective synthetic RNAi-based therapeutics targeting endogenous genes. The conclusions of this study are mostly supported by the data, but there are some aspects that need to be clarified.

      We thank the reviewer for their kind words and for appreciating the practical utility of our approach and discoveries. 

      (1) The methods do not describe the "aged RNAi plates feeding assay" in Figure 2E. The figure legend states that "aged RNAi plates" were used to trigger weaker RNAi, but the detail explaining the experiment is insufficient. How aged is aged? If the goal was to effectively reduce the dsRNA load available to the animals, why not quantitatively titrate the dsRNA provided? Were worms previously fed on the plates, or was simply a lawn of bacteria grown until presumably the IPTG on the plate was exhausted?

      We have expanded our methods section to describe that the plates were left at 4ºC for about 4 months before adding bacteria and performing the assay, with one possible reason for the weaker knockdown being that the IPTG in the RNAi plates is less effective. However, it is worth noting that the robustness of a feeding RNAi assay can vary from culture to culture and/or batch of plates. We therefore always perform RNAi assays with wild-type animals alongside test strains to gauge the strength of the RNAi assay for a given culture and batch of plates. We called the data in Figure 2E “weak” because the response of wild-type animals was weak, as evidenced by weak twitching in levamisole. Despite this reduced effect, we observed 100% penetrance in wild-type animals, enabling us to sensitively detect the reduced responses of the mutants.

      (2) Is the data presented in Figure 2F completed using the "aged RNAi plates" to achieve the partial silencing of dpy-7 observed? Clarification of this point would be helpful.

      No. The only occasion when older plates were used was the one described in our response to comment 1 above.

      (3) Throughout the manuscript the authors refer to "non-dividing cells" when discussing animals' ability to recover from RNA silencing. It is not clear what the authors specifically mean with the phrase "non-dividing cells", but as this is referred to in one of their major findings, it should be clarified. Do they mean the cells are somatic cells in aged animals, thus if they are "non-dividing" the siRNA pools within the cells cannot be diluted by cell division? Based on the methods, the animals of RNAi assays were L4/Young adults that were scored over 8 days after the initial pulse of dsRNA feeding. If this is the case, wouldn't these animals be growing into gravid adults after the feeding, and thus have dividing cells as they grew?

      We thank the reviewer for highlighting the need to explain this point further. Our experiments test the silencing of the unc-22 gene, which is expressed and functions in body-wall muscle cells. Most of the body-wall muscles in C. elegans are developed by the L1 stage (reviewed in Krause and Liu, 2012), and they do not divide between the L4 and adult stages. Therefore, during the experiment, in which we delivered a pulse of dsRNA and examined responses over days, none of these cells divide. We have added a statement in the main text to explicitly say that the recovery from silencing by dsRNA that we observed cannot be explained by dilution during cell divisions.

      (4) What are the typical expression levels/turnover of unc-22 and bli-1? Based on the results from the altered cis-regulatory regions of bli-1 and unc-22 in Figure 5, it seems like the transcription/turnover rates of each of these genes could also be used as a proof of principle for testing the model proposed in Figure 4. The strength of the model would be further increased if the RNAi sensitivity of unc-22 reflects differences in its transcription/turnover rates compared to bli-1.

      We can get a sense of the relative abundances of unc-22 and bli-1 across development from the RNA-seq experiments that have been performed by others in the field (see below). However, these data cannot be used to infer either the production or the turnover rates. Future experiments that measure production (the combined rate of transcriptional run-on, splicing, export from the nucleus, etc.) will be required to define the production rates. Similarly, assays that detect the rate of degradation of transcripts without confounding presence from continued production will be needed to establish turnover rates. Future efforts to obtain values for these in vivo rates for multiple genes will help further test the model.
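
      The point that abundance alone cannot determine production or turnover can be illustrated with a minimal sketch. It assumes a simple first-order scheme, dm/dt = production − turnover × m, chosen here only for illustration and not taken from the model in Figure 4; the function name and the rate values are hypothetical.

```python
def steady_state_abundance(production_rate: float, turnover_rate: float) -> float:
    """Steady-state abundance for dm/dt = production_rate - turnover_rate * m.

    Setting dm/dt = 0 gives m = production_rate / turnover_rate.
    """
    if turnover_rate <= 0:
        raise ValueError("turnover_rate must be positive")
    return production_rate / turnover_rate


# Two hypothetical transcripts with fourfold different kinetics (arbitrary units).
slow_transcript = steady_state_abundance(production_rate=10.0, turnover_rate=0.5)
fast_transcript = steady_state_abundance(production_rate=40.0, turnover_rate=2.0)

if __name__ == "__main__":
    # Both transcripts reach the same abundance, so an abundance measurement
    # on its own cannot separate the production rate from the turnover rate.
    print(slow_transcript, fast_transcript)
```

      Under this assumed scheme, only the ratio of the two rates is constrained by a steady-state abundance measurement, which is why separate measurements of production and degradation would be needed.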

      Author response image 1.

      Expression data for unc-22:

      Author response image 2.

      Expression data for bli-1:

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Knudsen-Palmer et al. describes and models the contribution of MUT-16 and RDE-10 in the silencing through RNAi by the Argonaute protein NRDE-3 or others. The authors show that MUT-16 and RDE-10 constitute an intersecting network that can be redundant or not depending on the gene being targeted by RNAi. In addition, the authors provide evidence that increasing dsRNA processing can compensate for NRDE-3 mutants. Overall, the authors provide convincing evidence to understand the factors involved in RNAi in C. elegans by using a genetic approach.

      Major Strengths:

      The authors' work presents a compelling case for understanding the intricacies of RNA interference (RNAi) within the model organism Caenorhabditis elegans through a meticulous genetic approach. By harnessing genetic manipulation, they delve into the role of MUT-16 and RDE-10 in RNAi, offering a nuanced understanding of the molecular mechanisms at play in two independent case study targets (unc-22 and bli-1).

      We thank the reviewer for their kind words and for appreciating our genetic analysis.

      Major Weaknesses:

      (1) It is unclear how the molecular mechanisms of amplification are different under the MUT-16 and RDE-10 branches of the regulatory pathway, since they are clearly distinct proteins structurally. It would be interesting to do some small-RNA-seq of products generated from unc-22 and bli-1, under wild-type conditions and in some of the mutants studied (e.g. mut-16, rde-10 and mut-16 + rde-10). That would provide some insights into whether the products of the 2 amplifications are the same in all conditions, just changing in abundance, or whether they are distinct in sequence patterns.

      As we highlight in the paper, MUT-16 and RDE-10 are indeed very different proteins. One possible hypothesis suggested by this difference is that different kinds of small RNAs are made when the underlying mechanism relies on MUT-16 versus on RDE-10. However, postulating such a difference is not necessary for explaining the data. Furthermore, since the amounts of 2º siRNAs do not have to be correlated with the strength of silencing (Figure 4E), this work cautions against over-reliance on small RNA sequencing for inferring gene silencing. Nevertheless, it is indeed an attractive possibility that the amounts of small RNA, their distributions along mRNA sequence, and/or the sequence biases of the accumulating small RNAs could be different when relying on MUT-16- or RDE-10-dependent mechanisms. Future work that directly examines the small RNAs that accumulate in different mutant strains after initiating RNAi can shed light on these possibilities.

      (2) In the same line, Figure 5 aims to provide insights into the sequence determinants that influence the RNAi of bli-1. It is unclear whether the changes in transcript stability dictated by the 3'UTR are the sole factor governing the preference for the MUT-16 and RDE-10 branches of the regulatory pathway. In line with the mutant jam297, it might be interesting to test whether factors like codon optimality, splicing, ... of the ORF region upstream from bli-1-dsRNA can affect its sensitivity to the MUT-16 and RDE-10 branches of the regulatory pathway.

      In Figure 5, we eliminated the possibility that any gene that is transcribed using the bli-1 promoter would require NRDE-3, and showed using jam297 that modifications to the 3’ cis regulatory regions of a target can alter the dependence on NRDE-3 for knockdown. We agree that future experiments that control individual aspects of bli-1, potentially one feature at a time, can reveal the separate contributions of each characteristic of the gene to the observed dependence on NRDE-3 of the wild-type bli-1 gene. However, given the many ways that the same level of transcript knockdown can be achieved in our modeling (Figure 4 and its supplemental figures) we expect that multiple characteristics could contribute to NRDE-3 dependence. 

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) On page 5, the authors state that "MUT-16 and RDE-10 are redundantly or additively required for silencing unc-22"; however, based on their data in Figure 1D, it seems nearly 100% silencing of unc-22 is achieved in single mut-16 or rde-10 mutants. If this is the case, wouldn't it suggest redundancy of MUT-16 and RDE-10, and not an "additive effect" of MUT-16 and RDE-10 function? Although, as the mutator complex nucleates around MUT-16, the data in Figure 1D suggests it is possible that the presence of MUT-16 or RDE-10 is sufficient for the recruitment of one or more factors that triggers the silencing of unc-22, and thus only one of these factors is necessary.

      Because we are seeing 100% silencing in wild-type, mut-16(-), or rde-10(-) animals in Figure 1D, this assay (where the silencing response is strong) does not allow us to discriminate between differing levels of silencing. The “weak” RNAi assay in Figure 2E provides the opportunity to observe differences in the contributions made by MUT-16 or RDE-10, supporting the idea that the 2º siRNAs and relative contributions to silencing can indeed be additive, explaining the complete loss of silencing only in the double mutant. While MUT-16 has been shown to be required for the recruitment of other Mutators in the germline, Mutator foci are not detectable in the soma. Given that unc-22 and bli-1 are somatic targets, we are hesitant to assume a mechanism for the production of small RNAs that requires a similar MUT-16-dependent nucleation in somatic cells. MUT-16 is clearly required for full silencing, but whether it functions similarly in the soma and the germline remains an open question. Indeed, the mechanism(s) for producing small RNAs in somatic cells could be different from that used for production of small RNAs in the germline because of known differences in the use of RNA-dependent RNA polymerases (e.g. Ravikumar et al., Nucleic Acids Res. 2019). Future studies that determine the subcellular localization(s) and potential biochemical function(s) of RDE-10 and MUT-16 in somatic cells are needed to further delineate mechanisms.

      (2) On page 10, "rather than one that looks a frequency" - the "a" should be "at".

      We thank the reviewer and have fixed this typo. 

      (3) Figure 4 is very crowded, further dividing 4A (right) and 4B into subpanels would help the readability of the figure.

      We thank the reviewer for identifying these figures as being particularly crowded. These panels are presented as single units because the left and right portions of each panel are intimately connected. In Fig. 4A, the outline of mechanism deduced on the left is based on experiments at various scales shown on the right. We have now clarified this in the figure legend. In Fig. 4B, the equations on the right define and use the constants depicted on the left and the definitions below apply to both parts. We have now adjusted both figure parts to make these connections clearer. 

      (4) References to the subpanels of Figure 4 in the text on page 12 are off from the figure and figure legend.

      For example:

      "Overall, τkd and tkd were uncorrelated..." refers to 4C when it should refer to 4D. "However, the maximal amount of 2ºsiRNAs..." refers to 4D when it should refer to 4E. "Additionally, an increase in transcription..." refers to 4E when it should refer to 4F.

      "When a fixed amount of dsRNA was exposed..." refers to 4F when it should refer to 4G.

      We thank the reviewer for catching these errors and we have corrected these figure references.

      Reviewer #2 (Recommendations For The Authors):

      I would encourage the authors to follow up on some of the more mechanistic comments made above, that would strengthen and complement the genetic part of the work presented.

      We agree that additional work is needed to elucidate differences in molecular mechanisms for amplifying small RNAs in a MUT-16-dependent vs. RDE-10-dependent manner. We hope to address these extensions of our work in future manuscripts that focus on the biochemistry of these proteins and the populations of small RNAs generated using them.

      I appreciate the efforts to computationally model the dynamics of the system, but I am not sure that it helps that the mathematical modelling treats both branches of the pathway as functionally equals, since they could have some mechanistic specialisation that is not yet elucidated by the current work.

      Our assumption that both branches are equivalent is the most parsimonious. If we allowed for differences, even more values for the parameters of the model would agree with experimental data. The strength of the model is that despite such conservative assumptions, it agrees with experimental data. Biochemical elaborations that make the MUT-16 and RDE-10 branches qualitatively different could exist in vivo as suggested by the reviewer. Even with such qualitative differences in detail, the overall impact on gene silencing is a quantitative and additive one as demonstrated by our experiments. Future experimental work focused on biochemistry could elucidate how a Maelstrom domain-containing protein (RDE-10) and an intrinsically disordered protein (MUT-16) act differently to ultimately promote small RNA production.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      TMC7 knockout mice were generated by the authors and the phenotype was analyzed. They found that Tmc7 is localized to Golgi and is needed for acrosome biogenesis.

      Strengths:

      The phenotype of infertility is clear, and the results of TMC7 localization and the failed acrosome formation are highly reliable. In this respect, they made a significant discovery regarding spermatogenesis.

      In the original version, I pointed out the gap between their pH/calcium imaging data and the hypothesis of ion channel function of TMC7 in the Golgi. Now the author agrees and has changed the description to be reasonable. Additional experiments were also performed, and I can say that they have answered my concern adequately.

      I would say it is good to add any presumed mechanism for the observed changes in pH and calcium concentration in the cytoplasm this time.

      We appreciate your positive comments on our revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study presents a significant finding that enhances our understanding of spermatogenesis. TMC7 belongs to a family of transmembrane channel-like proteins (TMC1-8), primarily known for their role in the ear. Mutations to TMC1/2 are linked to deafness in humans and mice and were originally characterized as auditory mechanosensitive ion channels. However, the function of the other TMC family members remains poorly characterized. In this study, the authors begin to elucidate the function of TMC7 in acrosome biogenesis during spermatogenesis. Through analysis of transcriptomics datasets, they identify TMC7 as a transmembrane channel-like protein with elevated transcript levels in round spermatids in both mouse and human testis. They then generate Tmc7-/- mice and find that male mice exhibit smaller testes and complete infertility. Examination of different developmental stages reveals spermatogenesis defects, including reduced sperm count, elongated spermatids, and large vacuoles. Additionally, abnormal acrosome morphology is observed beginning at the early-stage Golgi phase, indicating TMC7's involvement in proacrosomal vesicle trafficking and fusion. They observed localization of TMC7 in the cis-Golgi and suggest that its presence is required for maintaining Golgi integrity, with Tmc7-/- leading to reduced intracellular Ca2+, elevated pH, and increased ROS levels, likely resulting in spermatid apoptosis. Overall, the work delineates a new function of TMC7 in spermatogenesis and the authors suggest that its ion channel activity is likely important for Golgi homeostasis. This work is of significant interest to the community and is of high quality.

      Strengths:

      The biggest strength of the paper is the phenotypic characterization of the TMC7-/- mouse model, which has clear acrosome biogenesis/spermatogenesis defects. This is the main claim of the paper and it is supported by the data that are presented.

      Weaknesses:

      The claim is that TMC7 functions as an ion channel. It is reasonable to assume this given what has been previously published on the more well-characterized TMCs (TMC1/2), but the data supporting this is preliminary here, and more needs to be done to solidify this hypothesis. The authors are careful in their interpretation and present this merely as a hypothesis supporting this idea.

      We appreciate this constructive suggestion.

      Reviewer #3 (Public Review):

      Summary:

      In this study, Wang et al. have demonstrated that TMC7, a testis-enriched multipass transmembrane protein, is essential for male reproduction in mice. Tmc7 KO male mice are sterile due to reduced sperm count and abnormal sperm morphology. TMC7 co-localizes with GM130, a cis-Golgi marker, in round spermatids. The absence of TMC7 results in reduced levels of Golgi proteins, elevated abundance of ER stress markers, as well as changes of Ca2+ and pH levels in the KO testis. However, further confirmation is required because the analyses were performed with whole testis samples in spite of the differences in the germ cell composition in WT and KO testis. In addition, the causal relationships between the reported anomalies await thorough interrogation.

      Strengths:

      By using PD21 testes, the revised assays have consolidated that depletion of TMC7 leads to a reduced level of Ca2+ and an elevated level of ROS in the male germ cells. The immunohistochemistry analyses have clearly indicated the reduced abundance of GM130, P115, and GRASP65 in the knockout testis.

      Weaknesses:

      The Discussion section contains sentences reiterating the Introduction and Results of this manuscript (e.g., Lines 79-85 and 231-236; Lines 175-179 and 259-263). Those read repetitive and can be removed.

      We thank the reviewer for this important comment. We have modified the text according to your suggestion.

      Future studies are required to decipher how TMC7 stabilizes Golgi structure, coordinates vesicle transport, and maintains the germ cell homeostasis.

      Thanks. We appreciate this constructive suggestion. We fully agree with the reviewer that future studies are required to decipher how TMC7 stabilizes Golgi structure, coordinates vesicle transport, and maintains germ cell homeostasis.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      1. In Fig S6d, the bar of Tmc7-/- is broken in the middle for P-EIF2.

      Thanks. We have remade Fig S6d according to your suggestion in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      None. The reviewers have adequately answered my points. Many thanks!

      We thank the reviewer for accepting our revisions as sufficient.

      Reviewer #3 (Recommendations For The Authors):

      In the revised manuscript, the authors have addressed most of my concerns.

      We are pleased that we were able to adequately address the reviewer’s concerns. We appreciate your suggestions to further improve our study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Summary:

      In this paper, the authors performed molecular dynamics (MD) simulations to investigate the molecular basis of the association of alpha-synuclein chains under molecular crowding and salt conditions. Aggregation of alpha-synuclein is linked to the pathogenesis of Parkinson's disease, and the liquid-liquid phase separation (LLPS) is considered to play an important role in the nucleation step of the alpha-synuclein aggregation. This paper re-tuned the Martini3 coarse-grained force field parameters, which allows long-timescale MD simulations of intrinsically disordered proteins with explicit solvent under diverse environmental perturbation. Their MD simulations showed that alpha-synuclein does not have a high LLPS-forming propensity, but the molecular crowding and salt addition tend to enhance the tendency of droplet formation and therefore modulate the alpha-synuclein aggregation. The MD simulation results also revealed important intra- and inter-molecule conformational features of the alpha-synuclein chains in the formed droplets and the key interactions responsible for the stability of the droplets. These MD simulation data add biophysical insights into the molecular mechanism underlying the association of alpha-synuclein chains, which is important for understanding the pathogenesis of Parkinson's disease.

      Strengths:

      (1) The re-parameterized Martini 3 coarse-grained force field enables the large-scale MD simulations of the intrinsically disordered proteins with explicit solvent, which will be useful for a more realistic description of the molecular basis of LLPS.

      (2) This paper showed that molecular crowding and salt contribute to the modulation of the LLPS through different means. The molecular crowding minimally affects surface tension, but adding salt increases surface tension. It is also interesting to show that the aggregation pathway involves the disruption of the intra-chain interactions arising from C-terminal regions, which potentially facilitates the formation of inter-chain interactions.

      We thank the reviewer for pointing out the strengths of our study.

      Weaknesses:

      (1) Although the authors emphasized the advantage of the Martini3 force field for its explicit description of solvent, the whole paper did not discuss the water's role in the aggregation and LLPS.

      We thank the reviewer for pointing this out. We agree that we have not explored or discussed the role of water in αS aggregation or LLPS, and we plan to examine it in detail in a separate study. However, we have updated the “Discussion” section with the following lines to convey to readers the importance of water in the aggregation and LLPS of αS.

      Page 24: “The significance of the solvent in alpha-synuclein (αS) aggregation remains underexplored. Recent studies [26, 55] underscore the pivotal role of water as a solvent in LLPS. It suggests that comprehending the solvent’s role, particularly water, is essential for attaining a deeper grasp of the thermodynamic and physical aspects of αS LLPS and aggregation. By delving into the solvent’s contribution, researchers can uncover additional factors influencing αS aggregation. Such insights hold the potential to advance our comprehension of protein aggregation phenomena, crucial for devising strategies to address diseases linked to protein misfolding and aggregation, notably Parkinson’s disease. Future investigations focusing on elucidating the interplay between αS, solvent (especially water), and other environmental elements could yield valuable insights into the mechanisms underlying LLPS and aggregation. Ultimately, this could aid in the development of therapeutic interventions or preventive measures for Parkinson’s and related diseases.”

      (2) This paper discussed the effects of crowders and salt on the surface tension of the droplets. The calculation of the surface tension relies on the droplet shape. However, for the formed clusters in the MD simulations, the typical size is <10, which may be too small to rigorously define the droplet shape. As shown in previous work cited by this paper [Benayad et al., J. Chem. Theory Comput. 2021, 17, 525−537], the calculated surface tension becomes stable when the chain number is larger than 100.

      We appreciate the insightful feedback from the reviewer. However, we would like to emphasize that the αS droplets exhibit highly liquid-like behavior, characterized by frequent exchange of chains between the dense and dilute phases, alongside a slow aggregation process. In the study by Benayad et al. (2020, JCTC) [ref. 30], FUS-LCD was the protein of choice, at concentrations in the mM range. FUS-LCD is known to undergo very rapid LLPS at concentrations below 100 μM, whereas αS has a critical concentration for LLPS of 500 μM and aggregates more slowly than FUS. Moreover, the diffusion constant of αS inside newly formed droplets (before any liquid-to-solid phase transition has occurred) has been estimated to be 0.23-0.58 μm²/s (Ray et al., 2020, Nat. Comm.), while that of FUS-LCD inside LLPS droplets has been estimated to be 0.17 μm²/s (Murthy et al., 2023, Nat. Struct. and Mol. Biol.). These values indicate that αS forms droplets that are less viscous than those formed by FUS-LCD. This dynamic nature impedes the formation of large droplets in the simulations, making it challenging to rigorously calculate surface tension from the interfacial width, which in turn requires computing g(r) between water and the droplet.

      Furthermore, our primary aim in calculating surface tension was not to determine its absolute value. Rather, we aimed to compare the surface tensions obtained for the three distinct environments explored in this study, and to compare their distributions rather than only their mean values. The distributions shown in Figure 4a show a clear trend, which we have stated in the article.

      (3) In this work, the Martini 3 force field was modified by rescaling the LJ parameters \epsilon and \sigma with a common factor \lambda. It has not been very clearly described in the manuscript why these two different parameters can be rescaled by a common factor and why it is necessary to separately tune these two parameters, instead of just tuning the coefficient \epsilon as did in a previous work [Larsen et al., PLoS Comput Biol 16: e1007870].

      We thank the reviewer for the comment. We think that the distance of the first hydration layer should also have an impact on aggregation/LLPS, so here we scale both epsilon and sigma. A higher epsilon for water-protein interactions means that more energy is required to remove water molecules (dehydration) when a chain goes from the dilute to the dense phase. A higher sigma, on the other hand, means that the hydration shell sits at a larger distance, making dehydration easier. Moreover, tuning both (whether by the same or by different parameters) required changing the overall protein-water interaction by only 1%, a considerably smaller change in force field parameters than tuning epsilon alone, which required a 6-10% change from its original values. We therefore think that optimizing both epsilon and sigma is one way of tuning water-protein interactions that requires minimal retuning of Martini 3. Whether a single scaling parameter is good enough requires further exploration and is outside the scope of the current study. More importantly, using separate scaling factors would introduce another free parameter into the system, and the fewer the free parameters, the better. For this study, a single parameter sufficed, as depicted in Figure 9. To inform readers why we chose to scale both sigma and epsilon, we have added the following to the main text:

      Page 25-26: “Increasing the ϵ value of water-protein interactions results in a higher energy demand for removing water molecules (dehydration) as a chain transitions from the dilute to the dense phase. Conversely, a higher σ value implies that the hydration shell will be at a greater distance, facilitating dehydration if a chain moves into the dilute phase. Therefore, adjusting water-protein interactions based on the protein’s single-chain behavior may not significantly influence the protein’s phase behavior. Furthermore, fine-tuning both ϵ and σ parameters only requires a minimal change in the overall protein-water interaction (1%). As a result, this adjustment minimally alters the force field parameters.”
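
      As a minimal illustration of what a common scaling factor does to a 12-6 Lennard-Jones pair interaction, the sketch below multiplies both ϵ and σ by the same λ and reports the position and depth of the potential minimum. It is not the code used in the study: the function names, the convention that λ multiplies both parameters directly, and the numerical values of ϵ and σ are assumptions made for illustration and are not Martini 3 parameters.

```python
def lj_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def scale_lj(epsilon, sigma, lam):
    """Scale both LJ parameters by the same factor lam (assumed convention)."""
    return lam * epsilon, lam * sigma


def lj_minimum(epsilon, sigma):
    """Return (r_min, E_min) of the 12-6 potential: r_min = 2^(1/6) * sigma."""
    return 2.0 ** (1.0 / 6.0) * sigma, -epsilon


# Hypothetical water-protein bead pair; placeholder values, not Martini 3 ones.
eps0, sig0 = 2.0, 0.47  # kJ/mol, nm
eps1, sig1 = scale_lj(eps0, sig0, 1.01)

r0, e0 = lj_minimum(eps0, sig0)
r1, e1 = lj_minimum(eps1, sig1)
print(f"unscaled: r_min = {r0:.4f} nm, E_min = {e0:.4f} kJ/mol")
print(f"scaled:   r_min = {r1:.4f} nm, E_min = {e1:.4f} kJ/mol")
```

      Under this assumed convention, λ = 1.01 shifts both the well depth and the position of the minimum by 1%.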

      (4) Both the sizes and volume fractions of the crowders can affect the protein association. It will be interesting to perform MD simulations by adding crowders with various sizes and volume fractions. In addition, in this work, the crowders were modelled by fullerenes, which contribute to protein aggregation mainly by entropic means as discussed in the manuscript. It is not very clear how the crowder effect is sensitive to the chemical nature of the crowders (e.g., inert crowders with excluded volume effect or crowders with non-specific attractive interactions with proteins, etc) and therefore the force field parameters.

      We thank the reviewer for suggesting a potential future direction. In this investigation, our main focus was to simulate inert crowders only, so that only the entropic effect of the crowders is explored. Although this study focuses on the factors that enable αS to form aggregates or undergo LLPS under different environmental conditions, it would be interesting to explore systematically the mechanism of action of crowders of varying shapes, sizes and interactions. We have therefore added the following lines to the “Discussion” section to let readers know that this is a prospect for future investigation.

      Page 22: “Under physiological conditions, crowding effects emerge prominently. While crowders are commonly perceived to be inert, as has been considered in this investigation, the morphology, dimensions, and chemical interactions of crowding agents with αS in both dilute and dense phases may potentially exert considerable influence on its LLPS. Hence, a comprehensive understanding through systematic exploration is another avenue that warrants extensive investigation.”

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure S1. The title of the figure and the description in the figure caption are inconsistent?

      We thank the reviewer for the comment and we have updated the article with the correct caption.

      (2) Page 14, line 3, the authors may want to provide more descriptions of the "ms1", "ms2", and "ms3" for better understanding.

      We are grateful to the reviewer for pointing this out. We have added a line describing in brief what “ms1”, “ms2” and “ms3” represent. It reads “Subsequent to the investigation, we utilize three representative conformations, each corresponding to one of the macrostates. We designate these macrostates as 1 (ms1), 2 (ms2), and 3 (ms3) (Figure S7)” (Page 28)

      (3) Page 20, the authors may want to briefly explain how the normalized Shannon entropy was calculated.

      We thank the reviewer for pointing this out. This is plain Shannon Entropy and the word “normalized” should not have been there. To avoid confusion we have provided the equation we have used to calculate the Shannon entropy (Eq 8) (Page 21).
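
      For readers unfamiliar with the quantity, the sketch below computes a plain (un-normalized) Shannon entropy from a set of state populations. Eq 8 is not reproduced in this response, so the form used here (natural logarithm, H = -Σ p ln p) and the function name are assumptions for illustration, not the authors' implementation.

```python
import math


def shannon_entropy(weights):
    """Plain Shannon entropy H = -sum(p * ln p), in nats.

    `weights` may be raw counts or probabilities; they are converted to
    probabilities that sum to one. This is NOT the "normalized" entropy,
    which would additionally divide H by ln(number of states).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    entropy = 0.0
    for w in weights:
        if w < 0:
            raise ValueError("weights must be non-negative")
        if w > 0:  # terms with p = 0 contribute nothing
            p = w / total
            entropy -= p * math.log(p)
    return entropy


# Hypothetical populations of four states.
print(shannon_entropy([10, 10, 10, 10]))
print(shannon_entropy([37, 1, 1, 1]))
```

      A uniform distribution over N states gives the maximum value ln N, and a distribution concentrated in a single state gives zero.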

      Reviewer #2 (Public Review):

      In the manuscript "Modulation of α-Synuclein Aggregation Amid Diverse Environmental Perturbation", Wasim et al describe coarse-grained molecular dynamics (cgMD) simulations of α-Synuclein (αS) at several concentrations and in the presence of molecular crowding agents or high salt. They begin by bench-marking their cgMD against all-atom simulations by Shaw. They then carry 2.4-4.3 µs cgMD simulations under the above-noted conditions and analyze the data in terms of protein structure, interaction network analysis, and extrapolated fluid mechanics properties. This is an interesting study because a molecular scale understanding of protein droplets is currently lacking, but I have a number of concerns about how it is currently executed and presented.

      We thank the reviewer for finding our study interesting.

      (1) It is not clear whether the simulations have reached a steady state. If they have not, it invalidates many of their analysis methods and conclusions.

      We have used the last 1 μs (1.5-2.5 μs) of each simulation for further analysis in this study. To assess whether the simulations have reached a steady state, we plot the time profile of the protein concentration in the dilute phase for all three cases.

      Author response image 1.

      Except for the scenario of only αS (Figures a and b), the rest show very steady concentrations across various sections of the trajectory (Figures c-f). The larger sudden fluctuations observed in Figures a and b arise because αS alone undergoes very slow spontaneous aggregation and because the dense phase itself is very fluxional: the transfer of a few chains between the dense and dilute phases registers as a large fluctuation in the protein concentration in the dilute phase. For the other two scenarios (Figures c-f), aggregation is accelerated by the presence of crowders/salt, which causes larger aggregates to form. The addition or removal of one or two chains therefore does not significantly affect the concentration, and we do not see such sudden large jumps. In summary, the large jumps seen in Figures a and b are due to the slow, fluxional aggregation of pure αS and to finite-size effects. However, as these are still only fluctuations, we posit that the systems have reached steady states. This claim is further supported by the following figure, in which the time profiles of a few useful system-wide macroscopic properties show no change between 1.5-2.5 µs.

      We also have added a brief discussion in the Methods section (Page 29-30) with these figures in the Supplementary Information.

      Author response image 2.

      “In this study, we utilized the final 1 µs from each simulation for further analysis. To ascertain whether the simulations have achieved a steady state, we plotted the time profile of protein concentration in the dilute phase for all three cases. Except for minor intermittent fluctuation involving only αS in neat water (Figures S8a and S8b), the remaining cases exhibit notably stable concentrations throughout various segments of the trajectory (Figures S8 c-f). The relatively higher fluctuations observed in Figures S8a and b stem from the slow, spontaneous aggregation of αS alone, compounded by the inherently ambiguous nature of the dense phase.

      Consequently, the addition or removal of a few chains from the dense to the dilute phase results in significant fluctuations in protein concentration within the dilute phase. Conversely, in the other two scenarios (Figures S8c-f), aggregation is expedited by the presence of crowders/salt, leading to the formation of larger aggregates. Consequently, the addition or removal of one or two chains has negligible impact on concentration, thereby mitigating sudden large jumps. In summary, the conspicuous jumps depicted in Figures S8a and b arise from the gradual, fluctuating aggregation of pure αS and finite size effects. However, since these remain within the realm of fluctuations, we assert that the systems have indeed reached steady states. This assertion is bolstered by the subsequent figure, where the time profile of several pertinent system-wide macroscopic properties reveals no discernible change between 1.5-2.5 µs (Figures S9).”
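
      One simple way to quantify whether a time profile is fluctuating around a constant value is to compare block averages and to compare the fitted linear drift over the analysis window with the spread of the data. The sketch below is an illustration only and is not the analysis used in the study; the function names, the synthetic data and the choice of four blocks are assumptions.

```python
import statistics


def block_means(series, n_blocks):
    """Mean of each of n_blocks equal-length blocks (trailing samples dropped)."""
    if n_blocks <= 0 or len(series) < n_blocks:
        raise ValueError("need at least one sample per block")
    size = len(series) // n_blocks
    return [statistics.fmean(series[i * size:(i + 1) * size])
            for i in range(n_blocks)]


def linear_slope(times, values):
    """Ordinary least-squares slope of values against times."""
    t_mean = statistics.fmean(times)
    v_mean = statistics.fmean(values)
    den = sum((t - t_mean) ** 2 for t in times)
    if den == 0:
        raise ValueError("times must not all be equal")
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return num / den


def drift_ratio(times, values):
    """|fitted drift over the window| divided by the standard deviation."""
    spread = statistics.pstdev(values)
    if spread == 0:
        return 0.0
    return abs(linear_slope(times, values) * (times[-1] - times[0])) / spread


# Synthetic, hypothetical dilute-phase concentration sampled over 1.5-2.5 µs.
times = [1.5 + 0.01 * i for i in range(101)]
conc = [10.0 + (0.5 if i % 2 else -0.5) for i in range(101)]
print(block_means(conc, 4))
print(drift_ratio(times, conc))
```

      A drift ratio well below one, together with block means that agree within the fluctuations, is consistent with a steady state; what counts as acceptable is a judgment call that depends on the observable.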

      (2) The benchmarking used to validate their cgMD methods is very minimal and fails to utilize a large amount of available all-atom simulation and experimental data.

      We disagree with the reviewer on this point. We have cited multiple previous studies [26, 27] that chose Rg as the metric for benchmarking coarse-grained models and used a reference (experimental or otherwise) to tune Martini force fields. Notable works in which Rg was used as a benchmark when generating new coarse-grained force fields include those by Dignon et al. (PLoS Comp. Biol.) [ref. 25], Regy et al. (Protein Science, 2021) [ref. 26], Joseph et al. (Nature Computational Science, 2021) [ref. 27] and Tesei et al. (Open Research Europe, 2022) [ref. 28]. From a polymer physics perspective, tuning water-protein interactions simply changes the solvent characteristics for the biopolymer, and Rg has generally been considered a suitable metric for coarse-grained models. Moreover, we try to match the distribution of Rg rather than only its mean value. This suggests that, at the single-molecule level, cgMD simulations at the optimum water-protein interactions allow the protein to sample the conformations present in the reference ensemble. We use the extensively sampled 70 μs all-atom data from DE Shaw Research to obtain the reference Rg distribution. We also perform a cross-validation by comparing the fraction of bound states in all-atom and cgMD dimer simulations, which agree well with each other at the optimum water-protein interactions. To help readers understand the rationale behind choosing Rg, we have added a section to the Methods (Page 25) that explains why Rg is plausibly a good metric for tuning water-protein interactions in Martini 3, at least when dealing with IDPs.
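
      To make the idea of matching an Rg distribution concrete, the sketch below computes the radius of gyration of a single conformation and scores the agreement between two sets of Rg values by the overlap of their normalized histograms. It is an illustration under stated assumptions, not the procedure used in the study: the function names, the overlap score, the bin settings and the sample values are all hypothetical.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one conformation (3D coordinates)."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total
           for k in range(3)]
    sq = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3))
             for m, c in zip(masses, coords))
    return math.sqrt(sq / total)


def histogram(values, lo, hi, n_bins):
    """Normalized histogram on [lo, hi); values outside the range are ignored."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins


def overlap(p, q):
    """Histogram overlap: 1.0 for identical distributions, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(p, q))


# Hypothetical Rg samples (nm) from a reference and a coarse-grained ensemble.
rg_reference = [2.8, 3.1, 3.4, 3.6, 3.9, 4.2]
rg_cg = [2.9, 3.2, 3.3, 3.7, 4.0, 4.4]
score = overlap(histogram(rg_reference, 2.0, 5.0, 6),
                histogram(rg_cg, 2.0, 5.0, 6))
print(score)
```

      In such a scheme, the water-protein scaling would be varied and the value giving the highest overlap with the reference distribution retained; other distribution distances could be used in place of the overlap.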

      Our optimized model is further supported by the FRET experiments of Ray et al. [6], who found that interchain NAC-NAC interactions drive LLPS. Residue-level contact maps obtained from our simulations also show decreased intrachain NAC-NAC interactions together with increased interchain NAC-NAC interactions inside the droplet. This agrees well with the experimental observations and further validates the metrics we used to optimize the water-protein interactions. The comparison with the FRET data of Ray et al. was not present earlier, and we have added the following lines to the updated draft.

      Page 17: “Thus we observed that increased inter-chain NAC-NAC regions facilitate the formation of αS droplets which also have previously been seen from FRET experiments on αS LLPS droplets [6].”

      (3) They also miss opportunities to compare their simulations to experimental data on aSyn protein droplets.

      We thank the reviewer for pointing this out. We have tried to compare the results from our simulations to existing experimental FRET data on αS. Please see the previous response where we have described our comparison with FRET observations.

      (4) Aspects such as network analysis are not contextualized by comparison to other protein condensed phases.

      A proper comparison with other protein condensed phases would require the position phase space of such condensates, which is not readily available. We therefore tried to explain the analysis in a simpler manner, to paint a picture of how αS forms an interconnected network inside the droplet phase.

      (5) Data are not made available, which is an emerging standard in the field.

      We thank the reviewer for mentioning this. We have provided the trajectories between 1.5-2.5 μs, which we used for the analysis presented in the article, via a Zenodo repository, along with other relevant files related to the simulations (https://zenodo.org/records/10926368).

      Firstly, it is not clear that these systems are equilibrated or at a steady state (since protein droplets are not really equilibrium systems). The authors do not present any data showing time courses that indicate the system to be reaching a steady state. This is problematic for several of their data analysis procedures, but particularly in determining free energy of transfer between the condensed and dilute phases based on partitioning.

      We have addressed this concern as stated previously in the response. We have updated the article accordingly.

      Secondly, the benchmarking that they perform against the 73 µs all-atom simulation of aSyn monomer by Shaw and coworkers provides only very crude validation of their cgMD models based on reproducing Rg for the monomer. The authors should make more extensive comparisons to the specific conformations observed in the DE Shaw work. Shaw makes the entire trajectory publicly available. There are also a wealth of experimental data that could be used for validation with more molecular detail. See for example, NMR and FRET data used to benchmark Monte Carlo simulations of aSyn monomer (as well as extensive comparisons to the Shaw MD trajectory) in Ferrie at al: A Unified De Novo Approach for Predicting the Structures of Ordered and Disordered Proteins, J. Phys. Chem. B 124 5538-5548 (2020)

      DOI:10.1021/acs.jpcb.0c02924

      I note that NMR measurements of aSyn in liquid droplets are available from Vendruscolo: Observation of an α-synuclein liquid droplet state and its maturation into Lewy body-like assemblies, Journal of Molecular Cell Biology, Volume 13, Issue 4, April 2021, Pages 282-294, https://doi.org/10.1093/jmcb/mjaa075.

      In addition, there are FRET studies by Maji: Spectrally Resolved FRET Microscopy of α-Synuclein Phase-Separated Liquid Droplets, Methods Mol Biol 2023:2551:425-447. doi: 10.1007/978-1-0716-2597-2_27.

      So the authors are missing opportunities to better validate the simulations and place their structural understanding in greater context. This is just based on my own quick search, so I am sure that additional and possibly better experimental comparisons can be found.

      We have performed a comparison with the existing FRET measurements by Ray et al. (2020), as discussed in a previous response, and have updated the article accordingly. The DOI (10.1007/978-1-0716-2597-2_27) provided by the reviewer, however, points to a book on methods for characterizing protein aggregates and does not contain any information on observations from FRET experiments. The other DOI (https://doi.org/10.1093/jmcb/mjaa075), for the article from the Vendruscolo group, does not contain information directly relevant to this study. Moreover, NMR measurements cannot be predicted from cgMD, since full atomic resolution is lost upon coarse-graining of the protein. A past literature survey by the authors found very little scientific literature on the molecular-level characterization of αS LLPS droplets.

      Thirdly, the small word network analysis is interesting, but hard to contextualize. For instance, the 8 Å cutoff used seems arbitrary. How does changing the cutoff affect the value of S determined? Also, how does the value of S compare to other condensed phases like crystal packing or amyloid forms of aSyn?

      The 8 Å cutoff is indeed arbitrary, since distance-based clustering always requires a cutoff that is decided empirically. However, 8 Å is quite large compared to other cutoffs used for distance-based clustering. For example, in ref. 26, 5 Å was used as the cutoff for identifying protein clusters. Larger cutoffs will lead to sparser network structures. However, we used the same cutoff for all distance-based clustering, which makes the resulting networks comparable. Our aim was to compare the networks formed by αS under different environmental conditions.

      Fourthly, I see no statement on data availability. The emerging standard in the computational field is to make all data publicly available through Github or some similar mechanism.

      We thank the reviewer for pointing this out. We have provided the raw data between 1.5-2.5 μs for each scenario, along with other relevant files, via a Zenodo repository (https://zenodo.org/records/10926368).

      Finally, on page 16, they discuss the interactions of aSyn(95-110), but the sequence that they give is too long (seeming to contain repeated characters, but also not accurate). aSyn(95-110) = VKKDQLGKNEEGAPQE. Presumably this is just a typo, but potentially raises concerns about the simulations (since without available data, one cannot check that the sequence is accurate) and data analysis elsewhere.

      This is indeed a typographical error. We have updated the article with the correct sequence. The validity of the simulations can be verified from the data we have shared via the Zenodo repository (https://zenodo.org/records/10926368).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary:

      In this manuscript, Fister et al. investigate how amputational and burn wounds affect sensory axonal damage and regeneration in a zebrafish model system. The authors discovered that burn injury results in increased peripheral axon damage and impaired regeneration. Convincing experiments show altered axonal morphology and increased Ca2+ fluxes as a result of burn damage. Further experimental proof supports that early removal of the burnt tissue by amputation rescues axonal damage. Burn damage was also shown to markedly increase keratinocyte migration and increase localized ROS production as measured by the dye Pfbsf. These responses could be inhibited by Arp 2/3 inhibition and isotonic treatment.

      Strengths: 

      The authors use state-of-the-art methods to study and compare transection and burn-induced tissue damage. Multiple experimental approaches (morphology, Ca2+ fluxing, cell membrane labeling) confirm axonal damage and impaired regeneration time. Furthermore, the results are also accompanied by functional response tests of touch sensitivity. This is the first study to extend the role of tissue-damage-related osmotic exposure beyond wound closure and leukocyte migration to a novel layer of pathology: axonal damage and regeneration. 

      Weaknesses: 

      The conclusions of the paper claiming a link between burn-induced epithelial cell migration, spatial redox signaling, and sensory axon regeneration are mainly based on correlative observations. Arp 2/3 inhibition impairs cell migration but has no significant effect on axon regeneration and restoration of touch sensitivity. 

      We agree with the reviewer. We have tried many experiments to address this question. The data show that Arp 2/3 inhibition with CK666 is an effective way to inhibit initial keratinocyte migration. However, later migration still proceeds. Interestingly, inhibiting only the early migration is sufficient to restore localized ROS production in the wound area in the first hour post-burn, even if it is not sufficient to prevent ROS accumulation over time. There is also a trend toward improved sensory neuron function late after this early treatment, although it is not statistically significant. We think it is likely that both migration and tissue-scale ROS influence the regeneration defect of sensory neurons after burn. The data using isotonic solution support this conclusion. We have tried many other ways to limit keratinocyte migration, including depletion of talin and expression of a dominant-negative Rac in basal epithelial cells, but these treatments were not compatible with survival of the fish after burn.

      Pharmacological or genetic approaches should be used to prove the role of ROS production by directly targeting the known H2O2 source in the system: DUOX. 

      We agree that pharmacologic or genetic approaches to directly manipulate ROS production would provide substantial support for the hypothesis that ROS, along with keratinocyte migration, is a main factor contributing to poor burn outcomes. To address this, we first tried using a morpholino to deplete DUOX. However, the combination of DUOX morpholino and burn injury was lethal to larvae. We also inhibited ROS production pharmacologically using DPI (Diphenyleneiodonium). With this treatment, ROS is inhibited for only the first hour post-burn, as treatment is lethal for longer periods of time. Burned larvae have marginally improved axon density and touch sensitivity, suggesting the importance of ROS in burn outcomes; however, the improvement was not statistically significant. It is likely that a larger effect would be observed with longer treatment, but treatment for more than 1 hour was toxic. We have added a supplemental figure with these new DPI data.

      While the authors provide clear and compelling proof that osmotic responses lie at the heart of the burn-induced axonal damage responses, they did not consider the option of further exploring any biology related to osmotic cell swelling. Could osmotic ATP release maybe play a role through excitotoxicity? Could cPLA2 activation-dependent eicosanoid production relate to the process? Pharmacological tests using purinergic receptor inhibition or blockage of eicosanoid production could answer these questions. 

      We agree that the role of osmotic cell swelling in the burn response is an interesting avenue for future study. However, we make use of isotonic treatment in this study specifically for its effect on keratinocyte migration and broad-scale wound healing. As a result, we feel that pursuing the biology of this swelling phenomenon is outside the scope of this paper.

      The authors provide elegant experiments showing that early removal of the burnt tissue can rescue damage-induced axonal damage, which could also be interpreted in an osmotic manner: tail fin transections could close faster than burn wounds, allowing for lower hypotonic exposure time. Axonal damage and slow regeneration in tail fin burn wounds could be a direct consequence of extended exposure time to hypotonic water. 

      We have done experiments using FM dye to test how long it takes burn and transection wounds to close (shown below). In these experiments, dye entry into wounded tissue is used as a readout of wound closure. Dye is only able to enter wounded tissue when the epithelial barrier is disrupted. Our data reveal that transections take approximately 10 minutes to fully close, while burns take approximately 20 minutes to close.

      Author response image 1.

      To test if this difference in wound closure time would have an effect on axon outcomes, we repeated, but slightly modified, the dual-wound experiment. We increased the amount of time the burn condition was exposed to hypotonic conditions by 10 additional minutes (by transecting burned tissue at 15 minutes post burn, shortly before closure) and compared axon outcomes to the 5 mpw control transection. These results show there was no difference in axon regeneration or function when secondary transection was performed at 5 or 15 minutes post burn, suggesting that increased exposure to hypotonic solution is not the reason for defects in axon outcomes after burn injury.

      Author response image 2.

      Reviewer #2 (Public Review): 

      This is an interesting study in which the authors show that a thermal injury leads to extensive sensory axon damage and impaired regrowth compared to a mechanical transection injury. This correlates with increased keratinocyte migration. That migration is inhibited by CK666 drug treatment and isotonic medium. Both restrict ROS signalling to the wound edge. In addition, the isotonic medium also rescues the regrowth of sensory axons and recovery of sensory function. The findings may have implications for understanding non-optimal re-innervation of burn wounds in mammals. 

      The interpretation of results is generally cautious and controls are robust. 

      Here are some suggestions for additional discussion: 

      The study compares burn injury which produces a diffuse injury to a mechanical cut injury which produces focal damage. It would help the reader to give a definition of wound edge in the burn situation. Is the thermally injured tissue completely dead and is resorbed or do axons have to grow into damaged tissue? The two-cut model suggests the latter. Also giving timescales would help, e.g. when do axons grow in relation to keratinocyte movement? An introductory cartoon might help. 

      We thank the reviewer for these insightful comments and questions. The burn wound is defined as the area that is directly damaged as a result of increased heat (labeled by FM dye entry), and the burn wound edge as the first line of healthy cells adjacent to the burned cells. These definitions have been added to the text to clarify the areas referenced. Recent experiments lead us to believe the wound area is composed almost completely of dead cells, but we are currently working to discover the fate of these dead cells as well as the wound adjacent cells that migrate to the wound edge after burn. As a result, we do not know whether axons grow into damaged tissue or if the damaged tissue is extruded, but we do see growth cone formation within a few hours after wounding suggesting the axons are actively trying to regenerate after a burn.

      Could treatment with CK666 or isotonic solution influence sensory axons directly, or through other non-keratinocyte cell types, such as immune cells? 

      We have done experiments looking at the density of caudal fin innervation in CK666, isotonic, or DPI treated fins. The axon density is unchanged in all these treatments compared to control treated larvae, so we do not believe these treatments affect axon health homeostatically. These data have been added to supplemental figure 3. Additionally, one of the benefits of the larval zebrafish burn model is the simplicity of the system – the epidermis is primarily composed of sensory axons, mesenchymal cells and keratinocytes. The burn environment is proinflammatory so it does promote immune cell recruitment, but we do not believe the immune cells are interacting directly with sensory axons besides clearing axonal debris. Previous papers by our lab have shown that peak immune cell recruitment occurs at 6 hpw, but they localize to the damaged tissue in the burn area and not the wound edge.

      Reviewer #3 (Public Review): 

      Fister and colleagues use regeneration of the larval zebrafish caudal fin to compare the effects of two modes of tissue damage-transection and burn-on cutaneous sensory axon regeneration. The authors found that restoration of sensory axon density and function is delayed following burn injury compared to transection. 

      The authors hypothesized that thermal injury triggers signals within the wound microenvironment that impair sensory neuron regeneration. The authors identify differences in the responses of epithelial keratinocytes to the two modes of injury: keratinocytes migrate in response to burn but not transection. Inhibiting keratinocyte migration with the small-molecule inhibitor of Arp2/3 (CK666) resulted in decreased production of reactive oxygen species (ROS) at early, but not late, time points. Preventing keratinocyte migration by wounding in isotonic media resulted in increased sensory function 24 hours after burn. 

      Strengths of the study include the beautiful imaging and rigorous statistical approaches used by the authors. The ability to assess both axon density and axon function during regeneration is quite powerful. The touch assay adds a unique component to the paper and strengthens the argument that burns are more damaging to sensory structures and that different treatments help to ameliorate this. 

      A weakness of the study is the lack of genetic and cell-autonomous manipulations. Additional comparisons between transection and burns, in particular with manipulations that specifically modulate ROS generation or cell migration without potentially confounding effects on other cell types or processes would help to strengthen the manuscript.

      The use of genetic and cell-autonomous approaches would strengthen our study; however, we were unable to do this because these approaches were lethal. Basal epithelial migration is necessary for embryonic development. We attempted to circumvent this by generating larvae transiently expressing a dominant-negative form of Rac, a protein crucial to the migratory process. The chimeric expression of the dominant-negative Rac was either damaging to the larvae or the mosaicism was too low to observe any effects on the migration phenotype.

      We also attempted a genetic approach to manipulate ROS production, as discussed above. We found that the DUOX morpholino was lethal to burned larvae. Finally, we attempted pharmacological inhibition of ROS production using the inhibitor DPI (Diphenyleneiodonium). With this treatment, burned larvae have marginally improved axon density and touch sensitivity, suggesting that dampening ROS may improve outcome. The DPI data have been added to the manuscript.

      In terms of framing their results, the authors refer to "sensory neurons" and "sensory axons" throughout the text - it should be made clear what type of neuron(s)/axon(s) are being visualized/assayed. Along these lines, a broader discussion of how burn injuries affect sensory function in other systems - and how the authors' results might inform our understanding of these injury responses - would be beneficial to the reader. 

      In summary, the authors have established a tractable vertebrate system to investigate different sensory axon wound healing outcomes in vivo that may ultimately allow for the identification of improved treatment strategies for human burn patients. Although the study implicates differences in keratinocyte migration and associated ROS production in sensory axon wound healing outcomes, the links between these processes could be more rigorously established. 

      The inconsistency between “neuron” and “axon” has been noted and the text has been corrected accordingly. “Neuron” is used when referring to the cell as a whole, while “axon” is used when referring to the sensory processes in the caudal fin. We added information about burn in the introduction as suggested: “While epithelial tissue is well adapted to repair from mechanical damage, burn wounds heal poorly. Thermal injury results in chronic pain and lack of sensation in the affected tissue, suggesting that an abnormal sensory neuron response contributes to burn wound pathophysiology.”

      We thank the reviewers for their comments.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      Suggested experiments: 

      (1) ROS measurements with the dye Pfbsf should be validated with more established ROS probes such as HyPer. 

      Pfbsf has been used previously as a readout of ROS production, and its use is documented in zebrafish (Maeda et al., Angew Chem Int Ed Engl, 2004, and Niethammer et al., Nature, 2009). These sources have been added as references when introducing Pfbsf to provide context for its use. The probe was validated and compared to HyPer in Niethammer’s 2009 paper. In our hands, both probes give similar results with tail transection.

      (2) To better support claims on ROS and H2O2 playing a central role in mediating axonal damage, the authors should consider pharmacological approaches such as rescue experiments with H2O2 and experiments using inhibitors such as DPI or apocynin.

      While the above reagents and drugs have limitations and non-specific side effects, more convincing proof could result from genetic approaches including experiments on DUOX knockdown or knockout lines.

      To further dissect the role of ROS in the burn response, we conducted experiments using DPI, a potent ROS inhibitor that is well-documented in the literature. We found that treatment with 20 µM DPI (1 hour pretreatment, 1 hour post-burn) marginally improved axon density when quantified at 24 hpw. Any higher dose, in combination with a burn, proved lethal. Longer treatment with DPI was also not tolerated.

      In addition to experiments with DPI, we attempted to burn larvae that were injected with DUOX morpholino. The combined use of burn and DUOX MO was lethal. We have dampened the conclusions and included the new DPI data in the revised manuscript.

      Minor corrections: 

      (1) A phrase/expression in the abstract is confusing: isotonic treatment does not "induce osmotic regulation". Cells exposed to hypo- or hypertonicity will respond by regulatory volume decrease or increase, respectively. Isotonic treatment maintains homeostasis.

      We appreciate this point and agree with the distinction. Revisions have been made in the text accordingly.

      (2) Figures 4E and 5E would be better to show as an average of multiple experiments with statistical significance. 

      The purpose of Figures 4E and 5E is to demonstrate changes in fluorescence intensity and localization of ROS using the representative time series shown in 4D and 5D. The figure legend has been updated accordingly.

      Reviewer #2 (Recommendations For The Authors): 

      Figure 3D How can one distinguish between the two cellular elements that randomly meet or that there is actual coordination? Can the interactions be quantified? It is also unclear what the authors mean by "sensory neuron movement". The authors show that the neuronal cell bodies stay in their position, so only the axons change position. Do they do this by growth, i.e. the neuronal growth cones follow the keratinocytes or do keratinocytes displace the axon shafts? 

      We have included supplemental movies that address this question in the new uploaded document. Figure 3D consists of still images taken from supplemental movie 2, which is a timelapse of keratinocytes/axons moving together after a burn injury. This movie clearly shows keratinocytes and their ensheathed axons moving simultaneously, so keratinocytes are mechanically pulling sensory axon shafts with them. We have revised the text to say axon movement, not sensory neuron movement.

      Over the time course of axonal movement (1 hour post-burn), it is not possible that neuronal growth cones contribute to movement, as this is too slow – previous work by other labs has shown that it takes several hours for axons to fully regenerate into amputated tissue, with movement not even noticeable until about 3 hours post-wound (Rieger and Sagasti, PLOS Biology, 2011).

      Regarding the second point, “neuron” vs. “axon” is an inconsistency in the text that has been corrected. “Neuron” is used when referring to the cell as a whole, “axon” is used when referring to the processes that innervate the caudal fin. The axons are physically pulled along with keratinocytes as they migrate after burn application. From our observations, growth cones appear closer to the wound site after the movement has stopped.

      Figure 4G It is surprising that the visual differences in the distribution of values are not statistically significant. 

      The distribution of values in 4G was large and that is why there is no statistically significant difference – we were also surprised at this result. We did all statistics with a statistician and this included rigorous criteria for significance.

      Figure 4H The images seem to show a difference, whereas the quantification does not. I suggest choosing more representative images. 

      Figure 4H has been updated to include a more representative image of axon patterning with CK666 treatment.

      Figure 6A The text states that axon damage in the control and isotonic condition is comparable, yet in the image, it appears that the damage in the isotonic treatment at 0 hpw is more distal. 

      This is a good observation that we consistently see in isotonic-treated fish after burn. Axon damage localizes more proximally in isotonic-treated samples because the keratinocytes distal to the notochord are likely dead, and the axons innervating those cells are likely immediately destroyed upon burn application. As a result, the distal axons are not present to express GCaMP. We believe isotonic treatment allows keratinocytes to live slightly longer, so axon damage is therefore prevented for longer. This is also the focus of continuing work to further understand the burn microenvironment.

      Finally, the materials section could mention bias mitigation measures, e.g. withholding the treatment condition from the experimenter in the touch test. 

      We minimized bias in experiments whenever possible, and the conservative statistical measures that were applied to our data further reduce the likelihood of false significance.

      Reviewer #3 (Recommendations For The Authors): 

      - Line numbers would have facilitated reviewer feedback. 

      - Supplementary movies were missing in the submission. 

      The lack of supplementary movies upon submission was a mistake and the movies have been uploaded along with the revised manuscript.

      Introduction: 

      - Pg. 3: "In response to tissue damage, sensory neurons undergo rapid and localized axonal degeneration 4,5." Not sure reference 4 (Reyes et al) is appropriate here as this study was not in the context of tissue damage. 

      We have revised this section as suggested by the reviewer.

      Results: 

      - The expected expression pattern/localization of several transgenes was unclear. Please clearly state what cell type(s) each should label. For example, pg. 5 - "We next sought to further investigate sensory neuron function in burned tissue. For this, we assessed wound-induced axonal damage using zebrafish larvae that express the calcium probe GCaMP." Where is GCaMP expressed? 

      The manuscript has been updated to include expression patterns for the included transgenes – in this mentioned case, GCaMP is expressed in neurons under the pan-neuronal Elavl3 promoter.

      - Introducing the GCaMP labeling could use some clarification. Pg. 5 - "As shown previously by other groups, GCaMP labels degenerating neurons in real time35." This is confusing. Do the authors mean that GCaMP increases immediately prior to Wallerian degeneration as shown by Vargas et al. (PMID: 26558774)? 

      Sustained elevated calcium levels are associated with axon damage. Previous work from other labs has shown that calcium influx follows axon injury (Ziv and Spira, EJN 1993, Adalbert et al., Neuroscience 2012). In these experiments, the presence of GCaMP-positive puncta indicates axon damage. We have revised the manuscript to address this critique.

      The Elavl3-GCaMP5 transgenic line reports increases in neuronal calcium levels. However, given the parameters used for imaging in our study (20x magnification, 100 ms exposure, and collection speed every 30 seconds for timelapses), we believe that only sufficiently large increases in calcium that are indicative of cell damage, and not physiological function, are being visualized.

      - Figure 1E - Are these panels images of the same fish? Please specify in the legend. 

      Figure 1E shows one transected and one burned larva, each live-imaged over the course of six hours. The legend has been updated to include this information.

      - Figure 1F - How was the damage area measured? Consider doing this measurement over time to match Figure 1E. 

      Axon damage area measurements were performed similarly to axon density measurements – maximum intensity z-projected confocal images of the caudal fin were generated using FIJI. For all experiments, the caudal fin area posterior to the notochord was outlined using the Polygon tool and measured to obtain a total surface area ROI. Axon fragments inside the outlined area were manually thresholded so all fragments posterior to the notochord were labeled and no saturated pixels were present, and an area measurement of these thresholded pixels was taken. We have added a section describing these measurements in the Methods section under “Axon damage quantification.”

      - Pg. 5 - When introducing the ngn1 MO - please state the expected phenotype and cite the appropriate background literature.

      The ngn1 morpholino was cited in the Methods section with the appropriate literature (Cornell and Eisen, Development, 2002), from which we got the morpholino sequence. We thank the reviewer for pointing out the need for more introduction and clarification in the main text, so the ngn1 morpholino has been discussed in greater depth and cited in the main text as well using the same citation.

      - The two-wound model is an elegant approach but could be more clearly described in the main text. 

      An improved explanation of the two-wound experiment has been added to the text.

      - For Figure 3, it would be helpful to have a schematic of the anatomy illustrating the relative positions of axons and epidermal cell types. 

      - Figure 3C - should an additional control here be transected? Given that the krt4:lifeact transgene labels both layers of the epidermis, how were the superficial and basal keratinocytes separated? Interpretation of this section should be carefully worded. The authors state that "...suggesting that the superficial keratinocytes are being pulled by the motile basal keratinocytes" (pg.7 ) but isn't another possibility that the superficial cells are stationary? 

      It is correct that the krt4:lifeact transgene labels both layers of keratinocytes, which together span 20-30 microns. These layers were separated from the same z-stack collected by confocal imaging. The first z-slice and last z-slice of the same stack were separated using FIJI and pseudocolored to appear as different colors. This clarification has been added to the Methods.

      Prior observations with the krt4:lifeact and krt4:utrch (figure 3A) transgenic lines reveal that both keratinocyte layers will move distally after burn application.

      - Pg. 7 - "The axons of sensory neurons are ensheathed within actin-rich channels running through basal keratinocytes 50,51." ref 51 is a C. elegans paper which does not have basal keratinocytes.

      This was in error. The correct reference has replaced reference 51 (O’Brien, J Comp. Neurol., 2012), in which electron microscopy is used to document the development of two layers of epithelial cells that also ensheath sensory neurons in a protective manner similar to glial cells in the central nervous system.

      - Figures S1E and F - the authors state that RB and DRG soma don't move. However, it was unclear from the figure panels and legend whether the authors imaged neurons that actually innervate the caudal fin (rather than some other region of the animal). Please clarify. For comparison, Fig S1F needs a pre-injury image to be meaningful. 

      The imaged cell bodies were those in the posterior trunk region, which are responsible for innervating the posterior sections of the fish including the caudal fin. From our observations, there was no movement of neuronal cell bodies after the burn.

      - Figure 5 title - can the authors clarify what aspect of this figure relates to "sustained epidermal damage" 

      The figure 5 title has been updated in response to the reviewer comments.

      - Figure 6 - is touch sensitivity really "restored" as the authors suggest? Alternatively, sensitivity may never be lost in isotonic treatment. Or the loss may be delayed? 

      We have modified the text accordingly by updating our phrasing – “restored” has been replaced with “improved” to indicate benefit over time.

      - Can the authors further disentangle the effects of keratinocyte migration, ROS, and isotonic treatment on axon regeneration? For example, would the addition of CK666 to the Isotonic +1 hpw treatment improve axon regeneration? Can the authors directly manipulate ROS signaling (e.g., through exogenous addition of H2O2 or duox1 MO) to alter regeneration outcomes in their wounding assays? 

      See the comments above.

      - Figure 6 title - consider removing or clarifying the word "excessive" here 

      The title has been revised according to the reviewer suggestion.

      - hpw vs hpb were used inconsistently throughout the text 

      The manuscript has been revised to use “hpw” when referring to the timeframe after injury application.

      Methods: 

      - Zebrafish transgenics are missing allele names 

      References: 

      - Many mistakes were noted in this section e.g., journal names missing, wrong authors, typos, DOIs misformatted 

      The references section has been corrected to use formatting consistent with APA citation and eLife preferred guidelines.

    1. Author response:

      Generals:

      We deeply appreciate the efforts of the Senior and Reviewing Editors, and we thank the three reviewers for their careful reading of the MS and their constructive comments, which are very helpful for improving our MS. We agree that we should extend our pharmacological analyses, including clarification of the penetrance of the gap junction inhibitor(s) and of the effectiveness and specificity of the drugs. We plan to test at least the L-type calcium channel blocker nifedipine. Concerning the reproducibility of the phenotypes, we did repeat the experiments multiple times for each of the analyses. In the current version we showed a series of representative data for simplicity, with an explanation in the text that the experiments were conducted multiple times; in the revised version we will improve the presentation so that readers and reviewers can be convinced of the reproducibility of the data. We will also try to test other markers to examine the cell types constituting the gut contractile organoid.

      Specifics:

      Our provisional responses to “The weakness” raised by the reviewers are as follows:

      Reviewer #1:

      Please see the responses shown above (“Generals”).

      Reviewer #2:

      In addition to the responses in “Generals”, our response also includes the following: We will look into the wavelength between contractions/rhythm of the organoid. We agree that our organoids derived from embryonic hindgut (E15) might not necessarily recapitulate cell function in the adult. However, it is well accepted in the field of developmental biology that studies with embryonic tissue/cells make a huge contribution to unveiling how complicated physiological cell functions are underpinned. Nevertheless, we will take care in the revised version that the MS does not send misleading messages. Recent advances have also shown that 3D organoids can, to some extent, “replace/substitute for” a complicated in vivo specimen when a particular cellular function is the focus of study.

      Reviewer #3:

      We appreciate the strong support of our findings.

      (1) We plan to perform positive control experiments, for example, to test whether the drugs we use interfere with cardiac muscle function.

      (2) We plan to do a wash-out experiment to confirm that 10 µM blebbistatin does not kill the cells. Thank you for this suggestion.

      (3) We plan to conduct tetrodotoxin treatment. Since experiments with such toxic reagents are not encouraged by our institute, we will perform the experiments with the minimum necessary amount.

      (4) We plan to address this point properly.

      (5) It is to be expected that blebbistatin would stop gut movement in an explanted hindgut, and it is also well established that gut contractions (movements) are concomitant with Ca2+ transients. It would indeed be interesting to see how GJ inhibitors affect such in vivo gut movement. However, as all the reviewers and the Reviewing Editor pointed out, the sensitivity (concentration) and penetrance of the drug are important points of concern, so we think that the in vivo analyses will be a next step in the near future.

      (6) We have indeed noticed that contraction frequency is reduced after organoidal fusion. It seems as if cells communicate with each other to decide which rhythm they need to adjust to. Furthermore, contraction frequency tends to slow down when the organoid becomes larger in size. This might be attributed to a delay in conductance between cells over a growing distance. We plan to either quantify these potentially interesting phenomena or make a concise speculation in the revised version.

      (7)-(10) Thank you for these comments. We will fix them.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In the article by Dearlove et al., the authors present evidence in strong support of nucleotide ubiquitylation by DTX3L, suggesting it is a promiscuous E3 ligase with capacity to ubiquitylate ADP ribose and nucleotides. The authors include data to identify the likely site of attachment and the requirements for nucleotide modification. 

      While this discovery potentially reveals a whole new mechanism by which nucleotide function can be regulated in cells, there are some weaknesses that should be considered. Is there any evidence of nucleotide ubiquitylation occurring cells? It seems possible, but evidence in support of this would strengthen the manuscript. The NMR data could also be strengthened as the binding interface is not reported or mapped onto the structure/model, this seems of considerable interest given that highly related proteins do have the same activity. 

      The paper is for the most part well-written and is potentially highly significant, but it could be strengthened as follows:

      (1) The authors start out by showing DTX3L binding to nucleotides and ubiquitylation of ssRNA/DNA. While ubiquitylation is subsequently dissected and ascribed to the RD domains, the binding data is not followed up. Does the RD protein alone bind to the nucleotides? Further analysis of nucleotide binding is also relevant to the Discussion where the role of the KH domains is considered, but the binding properties of these alone have not been analysed. 

      We thank the reviewer for the suggestion. We have tested DTX3L RD for ssDNA binding using NMR (see Figure 4A and Figure S2), which showed that DTX3L RD binds ssDNA. We also tested the DTX3L KH domains for RNA/ssDNA binding using an FP experiment. However, the FP experiment did not show significant changes upon titrating RNA/ssDNA. It seems that the KH domains alone are not sufficient to bind RNA/ssDNA and that both the KH and RD domains are required for binding. Understanding how DTX3L binds RNA/ssDNA is the subject of ongoing research in the lab. We will revise the Discussion on the KH domains.

      (2) With regard to the E3 ligase activity, can the authors account for the apparent decreased ubiquitylation activity of the 232-C protein in Figure 1/S1 compared to FL and RD? 

      We will address this question in the revision.

      (3) Was it possible to positively identify the link between Ub and ssDNA/RNA using mass spectrometry? This would overcome issues associated with labels blocking binding rather than modification. 

      We have tried to use mass spectrometry to detect the linkage between Ub and ssDNA/RNA, but were unable to do so. We suspect that the oxyester linkage might be labile, posing a challenge for mass spectrometry techniques. Similarly, a recent preprint from the Ahel lab, which utilises LC-MS, detects the Ub-NMP product rather than the linkage (https://www.biorxiv.org/content/10.1101/2024.04.19.590267v1.full.pdf).

      (4) Furthermore, can a targeted MS approach be used to show that nucleotides are ubiquitylated in cells? 

      This will require future development and improvement of the MS approach, specifically the isolation of labile oxyester-linked products from cells and the optimisation of the MS detection method.

      (5) Do the authors have the assignments (even partial?) for DTX3L RD? In Figure 4 it would be helpful to identify the peaks that correspond to the residues at the proposed binding site. Also do the shifts map to a defined surface or do they suggest an extended site, particularly for the ssDNA.

      We only collected HSQC spectra, which were insufficient for assignments. We have performed a competition experiment using ADPr and labelled ssDNA, showing that ADPr competes against the ubiquitination of ssDNA (Figure 4D). We will provide an additional experiment showing that ssDNA with a blocked 3’-OH can compete against ubiquitination of ADPr. These data, together with our NMR analysis, will further strengthen the evidence that ssDNA and ADPr compete for the same binding pocket in DTX3L RD. Understanding how DTX3L RD binds ssDNA/RNA is the subject of ongoing research in the lab.

      (6) Does sequence analysis help explain the specificity of activity for the family of proteins? 

      We will perform a sequence alignment of the RD domains of the DTX proteins and discuss this point in the revision.

      (7) While including a summary mechanism (Figure 5I) is helpful, the schematic included does not necessarily make it easier for the reader to appreciate the key findings of the manuscript or to account for the specificity of activity observed. While this figure could be modified, it might also be helpful to highlight the range of substrates that DTX3L can modify - nucleotide, ADPr, ADPr on nucleotides etc. 

      We will modify this Figure as suggested.

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript by Dearlove et al. entitled "DTX3L ubiquitin ligase ubiquitinates single-stranded nucleic acids" reports a novel activity of a DELTEX E3 ligase family member, DTX3L, which can conjugate ubiquitin to the 3' hydroxyl of single-stranded oligonucleotides via an ester linkage. The findings that unmodified oligonucleotides can act as substrates for direct ubiquitylation and the identification of DTX3 as the enzyme capable of performing such oligonucleotide modification are novel, intriguing, and impactful because they represent a significant expansion of our view of the ubiquitin biology. The authors perform a detailed and diligent biochemical characterization of this novel activity, and key claims made in the article are well supported by experimental data. However, the studies leave room for some healthy skepticism about the physiological significance of the unique activity of DTX3 and DTX3L described by the authors because DTX3/DTX3L can also robustly attach ubiquitin to the ADP ribose moiety of NAD or ADP-ribosylated substrates. The study could be strengthened by a more direct and quantitative comparison between ubiquitylation of unmodified oligonucleotides by DTX3/DTX3L with the ubiquitylation of ADP-ribose, the activity that DTX3 and DTX3L share with the other members of the DELTEX family. 

      Strengths: 

      The manuscript reports a novel and exciting observation that ubiquitin can be directly attached to the 3' hydroxyl of unmodified, single-stranded oligonucleotides by DTX3L. The study builds on the extensive expertise and the impactful previous studies by the Huang laboratory of the DELTEX family of E3 ubiquitin ligases. The authors perform a detailed and diligent biochemical characterization of this novel activity, and all claims made in the article are well supported by experimental data. The manuscript is clearly written and easy to read, which further elevates the overall quality of submitted work. The findings are impactful and will help illuminate multiple avenues for future follow-up investigations that may help establish how this novel biochemical activity observed in vitro may contribute to the biological function of DTX3L. The authors demonstrate that the activity is unique to the DTX3/DTX3L members of the DELTEX family and show that the enzyme requires at least two single-stranded nucleotides at the 3' end of the oligonucleotide substrate and that the adenine nucleotide is preferred in the 3' position. Most notably, the authors describe a chimeric construct containing RING domain of DTX3L fused to the DTC domain DTX2, which displays robust NAD ubiquitylation, but lacks the ability to ubiquitylate unmodified oligonucleotides. This construct will be invaluable in the future cell-based studies of DTX3L biology that may help establish the physiological relevance of 3' ubiquitylation of nucleic acids. 

      Weaknesses: 

      The main weakness of the study is in the lack of direct evidence that the ubiquitylation of unmodified oligonucleotides reported by the authors plays any role in the biological function of DTX3L. The study leaves plenty of room for natural skepticism regarding the physiological relevance of the reported activity, because, akin to other DELTEX family members, DTX3 and DTX3L can also catalyze attachment of ubiquitin to NAD, ADP ribose and ADP-ribosylated substrates. Unfortunately, the study does not offer any quantitative comparison of the two distinct activities of the enzyme, which leaves plenty of room for doubt. One is left wondering, whether ubiquitylation of unmodified oligonucleotides is just a minor and artifactual side activity owing to the high concentration of the oligonucleotide substrates and E2~Ub conjugates present in the in-vitro conditions and the somewhat lower specificity of the DTX3 and DTX3L DTC domains (compared to DTX2 and other DELTEX family members) for ADP ribose over other adenine-containing substrates such as unmodified oligonucleotides, ADP/ATP/dADP/dATP, etc. The intriguing coincidence that DTX3L, which is the only DTX protein capable of ubiquitylating unmodified oligonucleotides, is also the only family member that contains nucleic acid interacting domains in the N-terminus, is suggestive but not compelling. A recently published DTX3L study by a competing laboratory (PMID: 38000390), which is not cited in the manuscript, suggests that ADP-ribose-modified nucleic acids could be the physiologically relevant substrates of DTX3L. That competing hypothesis appears more convincing than ubiquitylation of unmodified oligonucleotides because experiments in that study demonstrate that ubiquitylation of ADP-ribosylated oligos is quite robust in comparison to ubiquitylation of unmodified oligos, which is undetectable. 
      It is possible that the unmodified oligonucleotides in the competing study did not have adenine in the 3' position, which may explain the apparent discrepancy between the two studies. In summary, a quantitative comparison of ubiquitylation of ADP ribose vs. unmodified oligonucleotides could strengthen the study.

      We thank the reviewer for the constructive feedback. We agree that evidence for the biological function is lacking. While we have tried to detect Ub-ssDNA/RNA from cells, we found that isolating and detecting labile oxyester-linked Ub-ssDNA/RNA products remains challenging due to (1) low levels of Ub-ssDNA/RNA products, (2) the presence of DUBs and nucleases that rapidly remove the products during the experiments, and (3) our lack of a suitable MS approach to detect the product. For these reasons, we feel that discovering the biological function will require future effort and expertise and is beyond the scope of our current manuscript.

      In the manuscript (PMID: 38000390), the authors used PARP10 to catalyse ADP-ribosylation onto 5’-phosphorylated ssDNA/RNA. They used the following sequences, which lack a 3’-adenosine; this could explain the lack of ubiquitination.

      E15_5′P_RNA [Phos]GUGGCGCGGAGACUU

      E15_5′P_DNA [Phos]GTGGCGCGGAGACTT

      We will perform the experiment using these sequences to verify this. We have cited this manuscript, but for some reason PubMed has updated its publication date from mid 2023 to Jan 2024. We will update the EndNote reference in the revised manuscript.

      We agree that it is crucial to compare ubiquitination of oligonucleotides and ADPr by DTX3L to find its preferred substrate. We have challenged oligonucleotide ubiquitination by adding excess ADPr and found that ADPr efficiently competes with oligonucleotide (Figure 4D). We will perform more thorough competition experiments by titrating with increasing molar excess of either ADPr or ssDNA to examine the effect on the ubiquitination of ssDNA and ADPr, respectively.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors):

      Inclusion of other catalase, peroxidase or superoxide dismutase gene promoters (with ChIP-seq screen shots) and whether they contain sntB binding sites is important to provide other potential downstream pathways controlling oxidative stress mediated regulation of development and aflatoxin metabolism. This can be presented as supplementary material.

      or

      Some more examples of ChIP-seq peaks in the promoters of nsdC, nsdD, sclR, steA, wetA, veA, fluG, sod2, catA, catC would strengthen the paper for the reliability of the ChIP-seq data. Currently, visualisation of the ChIP-seq data is limited to the catC gene promoter, where background ChIP-seq signals are very high (Figure 5F).

      The binding regions and motifs of SntB on the catA, catB, sod1, and sod2 genes are shown in Figure S7 and described in lines 531-536 and 881-884. The background of the ChIP-seq signals is high, but the enrichment level in the ip-sntB-HA samples is significant compared to IP-WT.

      Figure 5F, letters are too small, and difficult to read. The same is true for Figure 4. Letters should be enlarged for the readers to read it without problem.

      Thanks. We have revised Figure 5F and Figure 4. Please see these figures.

      Reviewer #2 (Recommendations For The Authors):

      The authors fully addressed my concerns and made appropriate changes in the manuscript. The quality of the manuscript is now improved.

      Thanks. We would like to express our sincere gratitude for your affirmation and thoughtful feedback. Your positive comments have been extremely encouraging and have strengthened our confidence in our work. Your time and effort in reviewing our submission are greatly appreciated.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Only one PITAR siRNA was tested in majority of the experiments, which compromises the validity of the results.

      We thank the reviewer for this comment. We have now used two siRNAs to demonstrate PITAR functions in various assays. In the revised manuscript, we carried out additional experiments with two siRNAs, and the results are presented in Figures 2C, D, F, G, H, I, and J; Figures 5A, B, Supplementary Figure 2B, C, D, E, and F.

      (2) Some results are inconsistent. For example, Fig 2G indicates that PITAR siRNA caused G1 arrest. However, PITAR overexpression in the same cell line did not show any effect on cell cycle progression in Fig 5I.

      PITAR silencing produced a robust G1 arrest, whereas PITAR overexpression did not, for the following reason. Since glioma cells overexpress PITAR (which keeps p53 suppressed), silencing PITAR (which elevates p53 levels) in glioma cells produces a robust phenotype in the cell cycle profile (in the form of increased G1 arrest). In contrast, the overexpression of PITAR in glioma cells fails to produce robust changes in the cell cycle profile because glioma cells already have high levels of PITAR.

      (3) The conclusion that PITAR inactivates p53 through regulating TRIM28, which is highlighted in the title of the manuscript, is not supported by convincing results. Although the authors showed that a PITAR siRNA increased while PITAR overexpression decreased p53 level, the siRNA only marginally increased the stability of p53 (Fig 5E). The p53 ubiquitination level was barely affected by PITAR overexpression in Fig 5F.

      We disagree that PITAR silencing only marginally increased the stability of p53. In the cycloheximide experiment in Figure 5E, the half-life of p53 is increased by 60 % (50 mins to 120 mins), which is quite significant in altering the DNA damage response by p53. Further, we also want to point out that the other arm of p53 degradation by Mdm2 remains intact under these conditions. We also provide an improved p53 ubiquitination western blot in the revised version (Figure 5F).

      (4) To convincingly demonstrate that PITAR regulates p53 through TRIM28, the authors need to show that this regulation is impaired/compromised in TRIM28-knockout conditions. The authors only showed that TRIM28 overexpression suppressed PITAR siRNA-induced increase of p53, which is not sufficient.

      We thank the reviewer. In the revised manuscript, we demonstrate that PITAR overexpression fails to inhibit p53 in TRIM28 silenced cells (Supplementary Figure 5G; Figure 5K, L, M, N).

      (5) Note that only one cell line was investigated in Fig 5.

      In the revised manuscript, the impact of PITAR silencing and PITAR overexpression on p53 functions is demonstrated for one more glioma cell line (Supplementary Figure 5B, C, D, and E).

      (6) Another major weakness of this manuscript is that the authors did not provide any evidence indicating that the glioblastoma-promoting activities of PITAR were mediated by its regulation of p53 or TRIM28 (Fig 6 and Fig 7). Thus, the regulation of glioblastoma growth and the regulation of TRIM28/p53 appear to be disconnected.

      We would like to respectfully disagree with the reviewer on this particular point. In the first version of the manuscript, we did provide the following evidence that the glioblastoma-promoting activities of PITAR are mediated by its regulation of p53 or TRIM28.

      (1) To show the importance of p53:

      We show that PITAR silencing failed to inhibit the colony growth of p53-silenced U87 glioma cells (U87/shp53#1). We also show that while PITAR silencing decreased TRIM28 RNA levels in U87/shNT and U87/shp53#1 glioma cells, it failed to increase CDKN1A and MDM2 (p53 targets) at the RNA level in U87/shp53#1 cells unlike in U87/siNT cells (Supplementary Figure 6 Panels A, B, C, and D). 

      (2) To show the importance of TRIM28 and p53:

      The importance of p53 is also demonstrated in the context of patient-derived GSC lines. We demonstrate that PITAR silencing-induced reduction in the neurosphere growth (WT p53 containing patient-derived GSC line) is accompanied by a reduction in TRIM28 RNA and an increase in the CDKN1A RNA without a change in p53 RNA levels (Supplementary Figure 7 Panels A, B, C, D, and E). We also demonstrate that PITAR overexpression-induced neurosphere growth is accompanied by an increase in the TRIM28 RNA, and a decrease in CDKN1A RNA without a change in p53 RNA levels (Supplementary Figure 7 Panels F, G, H, and I). However, PITAR silencing failed to decrease neurosphere growth in mutant p53 containing GSC line (MGG8) (Supplementary Figure 7 Panels J, K, L, M, N, and F).

      (3) We show that the TRIM28 protein level is drastically reduced in small tumors formed by U87/siPITAR cells (Supplementary Figure 7 Panel E).

      (4) We show that glioma tumors formed by U87/PITAR OE cells express high levels of TRIM28 protein but reduced levels of p21 protein (Supplementary Figure 7 Panel B).

      Further, we did additional experiments to prove the importance of TRIM28.

      In the revised manuscript, we have carried out an additional experiment to prove the requirement of TRIM28 for tumor-promoting functions of PITAR overexpression. Earlier, we have shown that exogenous overexpression of PITAR promotes glioma tumor growth and imparts resistance to Temozolomide chemotherapy (Figure 7F and G; Supplementary Figure 9A and B). In the revised manuscript, we show that the tumor growth-promoting function of PITAR overexpression requires TRIM28. U87-Luc/PITAR OE cells formed a larger tumor compared to U87-Luc/VC cells (Figure 7H, and I; compare red line with blue line). U87-Luc/shTRIM28 cells formed very small-sized tumors (Figure 7H, and I; compare green line with blue line). Further, PITAR overexpression (U87-Luc/PITAR OE) was less efficient in promoting glioma tumor growth in TRIM28 silenced cells (Figure 7H, and I; compare pink line with red line). Thus, we prove that, as a whole, TRIM28 mediates the tumor growth-promoting functions of PITAR.

      (7) It is not clear what kind of message the authors tried to deliver in Fig 7F/G. Based on the authors' hypothesis, DNA-damaging agents like TMZ would induce PITAR to inactivate p53, which would compromise TMZ's anti-cancer activity. However, the data show that TMZ was very effective in the inhibition of U87 growth. The authors may need to test whether PITAR downregulation, which would increase p53 activity, have any effects on TMZ-insensitive tumors. Such results are more therapeutically relevant.

      Reviewer #1 rightly pointed out that TMZ induces PITAR expression, which should compromise TMZ's anti-cancer activity.

      We demonstrate the same as below:

      Figure 7F and G demonstrate the following two facts: (1) PITAR overexpression increases glioma tumor growth (Figure 7G, compare the red line with the blue line), and (2) PITAR-overexpressing glioma tumors are resistant to TMZ chemotherapy (Figure 7G, compare the pink line with the green line).

      In addition, Figure 7F and G also demonstrate that TMZ treatment of tumors formed by U87/VC glioma cells inhibited but did not completely eliminate tumor growth (compare the pink line with the blue line). We believe that the inability of TMZ to eliminate tumor growth completely is due to the chemoresistance imparted by DNA damage-induced PITAR.

      Further, in Figure 2I, we indeed show that PITAR-silenced cells are more sensitive to TMZ and Adriamycin chemotherapy.

      (8) Lastly, the model presented in Fig 7H is confusing. It is not clear what the exact role of PITAR in the DNA damage response based on this model. If DNA damage would induce PITAR expression, this would lead to inactivation of p53 as revealed by this manuscript. However, DNA damage is known to activate p53. Do the authors want to imply that PITAR induction by DNA damage would help to bring down the p53 level at the end of DNA damage response? The presented data do not support this role unfortunately.

      We respect the views and questions raised by the reviewer.

      We would like to explain the importance of our model below.

      Yes, it is true that DNA damage induces p53. We show here that DNA damage also induces PITAR in a p53-independent manner, which, in turn, inhibits p53. Here is our explanation. Even though DNA damage activates p53, there exists an autoregulatory negative feedback loop that controls the extent and duration of the p53 response to DNA damage (Wu et al., 1993; Haupt et al., 1997; Kubbutat, Jones and Vousden, 1997; Zhang et al., 2009). It is proposed that the p53-Mdm2 feedback loop generates a “digital clock” that releases well-timed quanta of p53 until the damage is repaired or the cell dies (Lahav et al., 2004). In addition, it has also been shown that TRIM28, through its association with Mdm2, also contributes to p53 inactivation (Wang et al., 2005b; Czerwińska, Mazurek, and Wiznerowicz, 2017).

      Based on the above reports and our current work, we propose that DNA damage-induced PITAR, through its ability to increase TRIM28 levels, contributes to the control of the DNA damage response of p53 along with Mdm2. The difference is as follows: since Mdm2 is also a transcriptional target of p53, the p53-Mdm2 axis is an autoregulatory negative feedback loop that controls the DNA damage response by p53. In contrast, PITAR is not a transcriptional target of p53, and DNA damage-induced activation of PITAR is p53-independent. Hence, the PITAR-TRIM28 axis, in controlling the DNA damage response of p53, creates an incoherent feedforward regulatory network. The experimental evidence provided in the revised manuscript is as follows: 1) We have already shown (in the first version of the manuscript) that exogenous overexpression of PITAR significantly inhibits DNA damage-induced p53 (Figures 6A, B, C, and D). 2) In the revised manuscript, we show that the DNA damage response of p53 (duration and extent of p53 activation after a pulse of ionizing radiation) in PITAR-silenced cells follows similar kinetics in terms of duration, but the extent of p53 activation is much stronger (Supplementary Figures 8H, I, J, and K). This is because the TRIM28 component of the TRIM28/Mdm2 axis is compromised, as PITAR silencing reduces TRIM28 levels. 3) We also demonstrate that DNA damage-induced TRIM28 is dependent on PITAR (Figure 6K; Supplementary Figure 5G).

      Reviewer #1(Recommendations For The Authors):

      (1) Fig 7A, what is the explanation for the observation that tumors disappeared in most of the mice in the siPITAR group? Did the authors check if apoptosis was induced here?

      We agree that the lack of tumor growth in the siPITAR group is likely due to the induction of apoptosis. We would like to point out that our in vitro experiments do demonstrate that PITAR silencing induces apoptosis (Figure 2H and Supplementary Figure 2F).

      (2) The authors need to explain why Fig 6 used a cell line different from other experiments. It would be better to check other cell lines.

      The purpose of RG5 and MGG8 is as follows. 1) We wanted to establish the growth-promoting functions of PITAR in patient-derived GSC lines. 2) We also wanted to show the importance of WT p53 for the growth-promoting functions of PITAR.

      However, in the revised manuscript we moved this portion under the subsection “PITAR inhibits p53 protein levels by its association with TRIM28 mRNA”.

      Further, the experiments related to the DNA damage-induced activation of PITAR in a p53-independent manner, and its impact on the DNA damage response by p53, have been moved to a new section entitled “PITAR is induced by DNA damage in a p53-independent manner, which in turn diminishes the DNA damage response by p53”.

      (3) It would be more convincing if the authors could test more p53 target genes in addition to p21.

      We thank the reviewer for this comment and the specific suggestions for checking additional p53 targets. In the revised manuscript, we have checked the MDM2 transcript levels in Supplementary Figure 6D. 

      Reviewer #2 (Recommendations For The Authors):

      (1) In the text, they mentioned " Figure 4J". There is no Figure 4J in Figure 4. It may be Figure 4K.

      We thank reviewer #2. We corrected this information in the revised manuscript.

      (2) The molecular weight markers in Western blots were missed in several Figure panels, including Figure 4J, Figure 5K, and Supple. Figure 3B, Supple. Figure 5G, H, Supple. Figures 6A and 7A.

      We thank reviewer #2, and we have included the molecular weight markers in all the mentioned figures.

    1. Author response:

      The following is the authors’ response to the current reviews. 

      eLife assessment:

      This useful modeling study explores how the biophysical properties of interneuron subtypes in the basolateral amygdala enable them to produce nested oscillations whose interactions facilitate functions such as spike-timing-dependent plasticity. The strength of evidence is currently viewed as incomplete because of insufficient grounding in prior experimental results and insufficient consideration of alternative explanations. This work will be of interest to investigators studying circuit mechanisms of fear conditioning as well as rhythms in the basolateral amygdala.

      We disagree with the overall assessment of our paper. The current reviews published below focus on two kinds of perceived inadequacies. Reviewer 1 (R1) was concerned that the fear conditioning paradigm used in the model is not compatible with some of the experiments we are modeling. The reviewer helpfully suggested in the Recommendations for the Authors some papers, which R1 believed exposed this incompatibility. In our reading, those data are indeed compatible with our hypotheses, as we will explain in our reply. Furthermore, the point raised by R1 is an issue for the entire field. We will suggest a solution to that issue based on published data.

      Reviewer 2 (R2) said that there is no evidence that the BLA is capable of producing, by itself, the rhythms that have been observed during fear conditioning in BLA and, furthermore, that the paper we cited to support such evidence, in fact, refutes our argument. We believe that the reasoning used by reviewer 2 is wrong and that the framework of R2 for what counts as evidence is inadequate. We spell out our arguments below in the reply to the reviewers.

      Finally, we believe this work is of interest far beyond investigators studying fear conditioning. The work shows how rhythms can create the timing necessary for spike-timing-dependent plasticity using multiple time scales that come from multiple different kinds of interneurons found both in BLA and, more broadly, in cortex. Thus, the work is relevant for all kinds of associative learning, not just fear conditioning. Furthermore, it is one of the first papers to show how rhythms can be central in mechanisms of higher-order cognition.

      Reviewer #1

      We thank Reviewer 1 for their kind remarks about our first set of responses and their understanding of the importance of the work. There was only one remaining point to be addressed:

      Deficient in this study is the construction of the afferent drive to the network, which does elicit activities that are consistent with those observed to similar stimuli. It still remains to be demonstrated that their mechanism promotes plasticity for training protocols that emulate the kinds of activities observed in the BLA during fear conditioning.

      It is true that some fear conditioning protocols involve non-overlapping US and CS, raising the question of how plasticity happens or whether behavioral effects may happen without plasticity. This is an issue for the entire field (Sun et al., F1000Research, 2020). Several papers (Quirk, Repa and LeDoux, 1995; Herry et al, 2007; Bordi and Ledoux 1992) show that the pips in auditory fear conditioning increase the activity of some BLA neurons: after an initial transient, the overall spike rate is still higher than baseline activity. The question remains as to whether the spiking is sustained long enough and at a high enough rate for STDP to take place when US is presented sometime after the stop of the CS.

      Experimental recordings cannot speak to the rate of spiking of BLA neurons during US due to recording interference from the shock. However, evidence seems to suggest that ECS activity should increase during the US due to the release of acetylcholine (ACh) from neurons in the basal forebrain (BF) (Rajebhosale et al., 2024). Pyramidal cells of the BLA robustly express M1 muscarinic ACh receptors (Muller et al., 2013; McDonald and Mott, 2021) and M1 receptors target spines receiving glutamatergic input (McDonald et al., 2019). Thus, ACh from BF should elicit a long-lasting depolarization in pyramidal cells. Indeed, the pairing of ACh with even low levels of spiking of BLA neurons results in a membrane depolarization that can last 7 – 10 s (Unal et al., 2015). This implies that the release of ACh can affect the consequences of the CS in successive trials. This should include higher spiking rates and more sustained activity in the ECS neurons after the first presentation of US, thus ensuring a concomitant activation of ECS and fear (F) neurons necessary for STDP to take place. Hence, we suggest that the problem raised by R1 may be solved by considering the role of ACh release by BF. To the best of our knowledge, there is nothing in the literature that contradicts this potential solution. The model we have may be considered a “minimal” model that puts in by hand the higher frequency due to the cholinergic drive without explicitly modeling it. As R1 says, it is important for us to give the motivation for that higher frequency; in the next revision, we will be explicit about how the needed firing rate can come about without an overlap of CS and US in any given trial.

      Reviewer #2

      The authors of this study have investigated how oscillations may promote fear learning using a network model. They distinguished three types of rhythmic activities and implemented an STDP rule to the network aiming to understand the mechanisms underlying fear learning in the BLA.

      After the revision, the fundamental question, namely, whether the BLA networks can or cannot intrinsically generate any theta rhythms, is still unanswered. The author added this sentence to the revised version: "A recent experimental paper, (Antonoudiou et al., 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone." In the cited paper, the authors studied gamma oscillations, and when they applied 10 uM Gabazine to the BLA slices observed rhythmic oscillations at theta frequencies. 10 uM Gabazine does not reduce the GABA-A receptor-mediated inhibition but eliminates it, resulting in rhythmic populations burst driven solely by excitatory cells. Thus, the results by Antonoudiou et al., 2022 contrast with, and do not support, the present study, which claims that rhythmic oscillations in the BLA depend on the function of interneurons. Thus, there is still no convincing evidence that BLA circuits can intrinsically generate theta oscillations in intact brain or acute slices. If one extrapolates from the hippocampal studies, then this is not surprising, as the hippocampal theta depends on extra-hippocampal inputs, including, but not limited to the entorhinal afferents and medial septal projections (see Buzsaki, 2002). Similarly, respiratory related 4 Hz oscillations are also driven by extrinsic inputs. Therefore, at present, it is unclear which kind of physiologically relevant theta rhythm in the BLA networks has been modelled.

      Reviewer 2 (R2) says “the fundamental question, namely, whether the BLA networks can or cannot intrinsically generate any theta rhythms, is still unanswered.” In our revision, we cited (Antonoudiou et al., 2022), who showed that BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings. R2 pointed out that this paper produces such theta under conditions in which the inhibition is totally removed. R2 then states that the resulting rhythmic population bursts at theta are “driven solely by excitatory cells. Thus, the results by (Antonoudiou et al., 2022) contrast with, and do not support, the present study, which claims that rhythmic oscillations in the BLA depend on the function of interneurons. Thus, there is still no convincing evidence that BLA circuits can intrinsically generate theta oscillations in intact brain or acute slices.”

      This reasoning of R2 is faulty. With all GABAergic currents omitted, the LFP is composed of excitatory currents and intrinsic currents. Our model of the LFP includes all synaptic and membrane currents. In our model, the high theta comes from the spiking activity of the SOM cells, which increase their activity if the inhibition from VIP cells is removed. We are including a new simulation, which models the activity of the slice in the presence of kainate (as done in Antonoudiou et al., 2022), providing additional excitation to the network. If the BLA starts at high excitation, our model produces an ongoing gamma in the VIP cells that suppresses SOM cells and allows a PING gamma to form between PV and F cells; with Gabazine (modeled as the removal of all the GABAergic synapses), this PING is no longer possible and so the gamma rhythm disappears. As expected, the simulation shows that the model produces theta with Gabazine; the model also shows that a PING rhythm is produced without Gabazine, and that this rhythm goes away with Gabazine because PING requires feedback inhibition (see Author response image 1). Thus, the theta increase with Gabazine in the (Antonoudiou et al., 2022) paper can be reproduced in our model, so that paper does support the model.

      Author response image 1.

      Spectral properties of the BLA network without (black) versus with Gabazine (magenta). Power spectra of the LFP proxy, which is the linear sum of AMPA, GABA (only present in the absence of Gabazine), D-, NaP-, and H-currents. Both power spectra are represented as mean and standard deviation across 10 network realizations. Bottom: inset between 35 and 50 Hz.
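
      As an illustration of the kind of analysis summarized in this caption, the sketch below computes a power spectrum for each of several LFP-proxy traces and then the mean and standard deviation across realizations. It is a minimal sketch under stated assumptions, not the code used for the simulations: the function names, the plain discrete Fourier transform, and the normalization are all hypothetical choices made for illustration.

```python
import cmath
import math
import statistics


def power_spectrum(signal, dt):
    """Return (frequency_hz, power) pairs for a real-valued trace.

    Uses a plain O(n^2) discrete Fourier transform for clarity; the
    normalization (|X_k|^2 / n) is an illustrative assumption.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    spectrum = []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        spectrum.append((k / (n * dt), abs(coeff) ** 2 / n))
    return spectrum


def mean_and_std(power_lists):
    """Mean and population standard deviation at each frequency bin.

    power_lists holds one list of power values per network realization;
    all lists are assumed to have the same length.
    """
    means = [statistics.mean(column) for column in zip(*power_lists)]
    stds = [statistics.pstdev(column) for column in zip(*power_lists)]
    return means, stds
```

      In this sketch, each network realization would contribute one trace to power_spectrum, and the resulting power values would be passed together to mean_and_std to obtain the curve and its spread.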

      Nevertheless, we agree that this paper alone is not sufficient evidence that the BLA can produce a low theta. We have recently learned of a new paper (Bratsch-Prince et al., 2024) that is directly related to the issue of whether the BLA by itself can produce low theta, and in what circumstances. In this study, intrinsic BLA theta is produced in slices with ACh stimulation (without needing external glutamate input) which, in vivo, would be produced by the basal forebrain (Rajebhosale et al., eLife, 2024) in response to salient stimuli. The low-theta depends on muscarinic activation of CCK interneurons, a group of interneurons that overlaps with the VIP neurons in our model (Krabbe 2017; Mascagni and McDonald, 2003).

      We suspect that the low theta produced in (Bratsch-Prince et al., 2024) is the same as the low theta in our model. We do not explicitly include ACh modulation of BLA in our paper, but in current work with experimentalists, we aim to show that ACh is essential to the theta by activating the BLA VIP cells. In our re-revised version, we will discuss Bratsch-Prince et al., 2024 and its connection to our hypothesis that the theta oscillations can be produced within the BLA.

      Note that we have already included a paragraph stating explicitly that our hypothesis in no way contradicts the idea that inputs to the BLA may include theta oscillations. Indeed, the following paragraphs in the revised paper describe the complexity of trying to understand the origin of brain rhythms in vivo. R2 did not appear to take this complexity, and the possible involvement of neuromodulation, into account in their current position that the theta rhythms cannot be produced intrinsically in the BLA.

      From revised paper: “Where the rhythms originate, and by what mechanisms. A recent experimental paper, (Antonoudiou et al. 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone. They draw this conclusion in mice by removing the hippocampus, which can volume conduct to BLA, and noticing that other nearby brain structures did not display any oscillatory activity. Our model also supports the idea that intrinsic mechanisms in the BLA can support the generation of the low theta, high theta, and gamma rhythms.

      Although the BLA can produce these rhythms, this does not rule out that other brain structures also produce the same rhythms through different mechanisms, and these can be transmitted to the BLA. Specifically, it is known that the olfactory bulb produces and transmits the respiratory-related low theta (4 Hz) oscillations to the dorsomedial prefrontal cortex, where it organizes neural activity (Bagur et al., 2021). Thus, the respiratory-related low theta may be captured by BLA LFP because of volume conduction or through BLA extensive communications with the prefrontal cortex. Furthermore, high theta oscillations are known to be produced by the hippocampus during various brain functions and behavioral states, including during spatial exploration (Vanderwolf, 1969) and memory formation/retrieval (Raghavachari et al., 2001), which are both involved in fear conditioning. Similarly to the low theta rhythm, the hippocampal high theta can manifest in the BLA. It remains to understand how these other rhythms may interact with the ones described in our paper.”

      We believe our current paper is important to show how detailed biophysical modeling can unearth the functional implications of physiological details (such as the biophysical bases of rhythms), which are often (indeed, usually) ignored in models, and why rhythms may be essential to some cognitive processes (including STDP). Indeed, for evaluating our paper it is necessary to go back to the purpose of a model, especially one such as ours, which is “hypothesis/data driven”. The hypotheses of the model serve to illuminate the functional roles of the physiological details, giving meaning to the data. Of course, the hypotheses must be plausible, and we think that the discussion above easily clears that bar. Hypotheses should also be checked experimentally, and a model that explains the implications of a hypothesis, such as ours, provides motivation for doing the hard work of experimental testing. We think that R1 understands this and has been very helpful.

      —————

      The following is the authors’ response to the original reviews.

      eLife assessment

      This useful modeling study explores how the biophysical properties of interneuron subtypes in the basolateral amygdala enable them to produce nested oscillations whose interactions facilitate functions such as spike-timing-dependent plasticity. The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered. This work will be of interest to investigators studying circuit mechanisms of fear conditioning as well as rhythms in the basolateral amygdala. 

      Most of our comments below are intended to rebut the sentence: “The strength of evidence is currently viewed as incomplete because the relevance to plasticity induced by fear conditioning is viewed as insufficiently grounded in existing training protocols and prior experimental results, and alternative explanations are not sufficiently considered”. 

      We believe this work will be interesting to investigators interested in dynamics associated with plasticity, which goes beyond fear learning. It will also be of interest because of its emphasis on the interactions of multiple kinds of interneurons that produce dynamics used in plasticity, in the cortex (which has similar interneurons) as well as BLA. We note that the model has sufficiently detailed physiology to make many predictions that can be tested experimentally. Details are below in the answer to reviewers.

      Reviewer #1 (Public Comments):  

      (1) … the weakness is that their attempt to align with the experimental literature (specifically Krabbe et al. 2019) is performed inconsistently. Some connections between cell types were excluded without adequate justification (e.g. SOM+ to PV+). 

      In order to constrain our model, we focused on what is reported in (Krabbe et al., 2019) in terms of functional connectivity instead of structural connectivity. Thus, we included only those connections for which there was strong functional connectivity. For example, the SOM to PV connection is shown to be small (Krabbe et al., 2019, Supp. Fig. 4, panel t). We also omitted PV to SOM, PV to VIP, SOM to VIP, VIP to excitatory projection neurons; all of these are shown in (Krabbe et al. 2019, Fig. 3 (panel l), and Supp. Fig. 4 (panels m,t)) to have weak functional connectivity, at least in the context of fear conditioning. 

      We reply with more details below to the Recommendations for the Authors, including new text.

      (2) The construction of the afferent drive to the network does not reflect the stimulus presentations that are given in fear conditioning tasks. For instance, the authors only used a single training trial, the conditioning stimulus was tonic instead of pulsed, the unconditioned stimulus duration was artificially extended in time, and its delivery overlapped with the neutral stimulus, instead of following its offset. These deviations undercut the applicability of their findings.  

      Regarding the use of a single long presentation of US rather than multiple presentations (i.e., multiple trials): in early versions of this paper, we did indeed use multiple presentations. We were told by experimental colleagues that the learning could be achieved in a single trial. We note that, if there are multiple presentations in our modeling, nothing changes; once the association between CS and US is learned, the conductance of the synapse is stable. Also, our model does not need a long period of US if there are multiple presentations.  

      We agree that, in order to implement the fear conditioning paradigm in our in-silico network, we made several assumptions about the nature of the CS and US inputs affecting the neurons in the BLA and the duration of these inputs. A Poisson spike train to the BLA is a signal that contains no structure that could influence the timing of the BLA output; hence, we used this as our CS input signal. We also note that the CS input can be of many forms in general fear conditioning (e.g., tone, light, odor), and we wished to de-emphasize the specific nature of the CS. The reference mentioned in the Recommendations for authors, (Quirk, Armony, and LeDoux 1997), uses pulses 2 seconds long. At the end of fear conditioning, the response to those pulses is brief. However, in the early stages of conditioning, the response goes on for as long as the figure shows. The authors do show the number of cells responding decreases from early to late training, which perhaps reflects increasing specificity over training. This feature is not currently in our model, but we look forward to thinking about how it might be incorporated. Regarding the CS pulsed protocol used in (Krabbe et al., 2019), it has been shown that intense inputs (6 kHz and 12 kHz inputs) can lead to metabotropic effects that last much longer than the actual input (200 ms duration) (Whittington et al., Nature, 1995). Thus, the effective input to the BLA may indeed be more like Poisson.

      Our model requires the effect of the CS and US inputs on the BLA neuron activity to overlap in time in order to instantiate fear learning. Despite paradigms involving both overlapping (delay conditioning, where US co-terminates with CS (Lindquist et al., 2004), or immediately follows CS (e.g., Krabbe et al., 2019)) and non-overlapping (trace conditioning) CS/US inputs existing in the literature, we hypothesized that concomitant activity in CS- and US-encoding neuron activity should be crucial in both cases. This may be mediated by the memory effect, as suggested in the Discussion of our paper, or by metabotropic effects as suggested above, or by the contribution from other brain regions. We will emphasize in our revision that the overlap in time, however instantiated, is a hypothesis of our model. It is hard to see how plasticity can occur without some memory trace of US. This is a consequence of our larger hypothesis that fear learning uses spike-timing-dependent plasticity; such a hypothesis about plasticity is common in the modeling literature.

      We reply with more details below to the Recommendations for the Authors, including new text.

      Reviewer #1 (Recommendations For The Authors): 

      Major points: 

      (1) This paper draws extensively from Krabbe et al. 2019, but it does not do so consistently. The paper would be strengthened if it tried to better match the circuit properties and activations.

      Specifically: 

      a. Krabbe found that PV interneurons were comparably activated by the US (see Supp Fig 1). Your model does not include that. The basis for the Krabbe 2019 claim that PV US responses are weaker is that they have a slightly larger proportion of cells inhibited by the US, but this is not especially compelling. In addition, their Fig 2 showed that VIP and SOM cells receive afferents from the same set of upstream regions. 

      b. The model excluded PV-SOM connections, but this does not agree with Krabbe et al. 2019, Table 2. PV cells % connectivity and IPSC amplitudes were comparable to those from VIP interneurons. 

      c. ECS to PV synapses are not included. This seems unlikely given the dense connectivity between PV interneurons and principal neurons in cortical circuits and the BLA (Woodruff and Sah 2007 give 38% connection probability in BLA). 

      We thank the Reviewer for raising these points, which allow us to clarify how we constrained our model and to do more simulations. Specifically: 

      a. (Wolff et al., Nature, 2014), cited by (Krabbe et al. 2018), reported that PV and SOM interneurons are on average inhibited by the US during the fear conditioning. However, we agree that (Krabbe et al., 2019) added to this by specifying that PV interneurons respond to both CS+ and US, although the fraction of US-inhibited PV interneurons is larger. As noted by the Reviewer, in the model we initially considered the PV interneurons responding only to CS+ (identified as “CS” in our manuscript). For the current revision, we ran new simulations in which the PV interneuron receives the US input, instead of CS+. It turned out that this did not affect the results, as shown in the figure below: all the network realizations learn the association between CS and fear. In the model, the PING rhythm between PV and F is the crucial component for establishing fine timing between ECS and F, which is necessary for learning. Having PV responding to the same input as F, i.e., US, facilitates their entrainment in PING and, thus, successful learning. 

      As for afferents of VIP and SOM from upstream regions, (Krabbe et al., 2019) reports that “[…] BLA SOM interneurons receive a different array of afferent innervation compared to that of VIP and PV interneurons, which might contribute to the differential activity patterns observed during fear learning.” Thus, in the model, we are agnostic about inputs to SOM interneurons; we modeled them to fire spontaneously at high theta.

      To address these points in the manuscript, we added some new text in what follows:

      (1) New Section “An alternative network configuration characterized by US input to PV, instead of CS, also learns the association between CS and fear” in the Supplementary information:

      “We constrained the BLA network in Fig. 2 with CS input to the PV interneuron, as reported in (Krabbe et al., 2018). However, (Krabbe et al., 2019) notes that a class of PV interneurons may be responding to US rather than CS. Fig. S3 presents the results obtained with this variation in the model (see Fig. 3 A,B for comparison) and shows that all the network realizations learn the association between CS and fear. In the model, the PING rhythm between PV and F is the crucial component for establishing fine timing between ECS and F, which is necessary for learning. Having PV responding to the same input as F, i.e., US, facilitates their entrainment in PING and, thus, successful fear learning.

      We model the VIP interneuron as affected by US; in addition, (Krabbe et al. 2019) reports that a substantial proportion of them is mildly activated by CS. Replacing the US by CS does not change the input to VIP cells, which is modeled by the same constant applied current. Thus, the VIP CS-induced activity is a bursting activity at low theta, similar to the one elicited by US in Fig. 2.”

      (2) Section “With the depression-dominated plasticity rule, all interneuron types are needed to provide potentiation during fear learning” in Results: “Finally, since (Krabbe et al., 2019) reported that a fraction of PV interneurons are affected by US, we have also run the simulations for the single-neuron network with the PV interneuron affected by US instead of CS. In this case as well, all the network realizations are learners (see Fig. S3).”

      (3) Section “Conditioned and unconditioned stimuli” in Materials and Methods: “To make Fig. S3, we also considered a variation of the model with PV interneurons affected by US, instead of CS, as reported in (Krabbe et al. 2019).”

      b. Re the SOM to PV connection: As reported in the reply to the public reviews, we considered the prominent functional connections reported in (Krabbe et al., 2019), instead of structural connections. That is, we included only those connections for which there was strong functional connectivity. For example, the SOM to PV connection is shown to be small (Supp. Fig. 4, panel t, in (Krabbe et al., 2019)). We also omitted PV to SOM, PV to VIP, SOM to VIP, and VIP to excitatory projection neurons; all of these are shown in (Krabbe et al. 2019, Fig. 3 (panel l), and Supp. Fig. 4 (panels m,t)) to have weak functional connectivity, at least in the context of fear conditioning.

      In order to clarify this point, in Section “Network connectivity and synaptic currents” in Materials and Methods, we now say:

      “We modeled the network connectivity as presented in Fig. 2B, derived from the prominent functional, instead of structural, connections reported in (Krabbe et al., 2019).”

      c. Re the ECS to PV synapses: We thank the Reviewer for the reference provided; as the Reviewer says, the ECS to PV synapses are not included. Upon adding this connection in our network, we found that, unlike the connection suggested in part a above, introducing these synapses would, in fact, change the outcome. Thus, the omission of this connection must be considered an implied hypothesis. Including those synapses with a significant strength would alter the PING rhythm created by the interactions between F and PV, which is crucial for ECS and F fine timing. Thanks very much for showing us that this needs to be said. Our hypothesis does not contradict the dense connections mentioned by the Reviewer; such dense connectivity does not mean that all pyramidal cells connect to all interneurons. This hypothesis may be taken as a prediction of the model.

      The absence of this connection is now discussed at the end of a new Section of the Discussion entitled “Assumptions and predictions of the model”, which reads as follows:

      “Finally, the model assumes the absence of significantly strong connections from the excitatory projection cells ECS to PV interneurons, unlike the ones from F to PV. Including those synapses would alter the PING rhythm created by the interactions between F and PV, which is crucial for ECS and F fine timing. We note that in (Woodruff and Sah, 2007) only 38% of the pyramidal cells are connected to PV cells. The functional identity of the connected pyramidal cells is unknown. Our model suggests that successful fear conditioning requires F to PV connections and that ECS to PV must be weak or absent.”

      (2) Krabbe et al. 2019 and Davis et al. 2017 were referenced for the construction of the conditioned and unconditioned stimulus pairing protocol. The Davis citation is not applicable here because that study was a contextual, not cued, fear conditioning paradigm. Regarding Krabbe, the pairing protocol was radically different from what the authors used. Their conditioned stimulus was a train of tone pips presented at 0.9 Hz, which lasted 30 s, after which the unconditioned stimulus was presented after tone offset. The authors should determine how their network behaves when this protocol is used. Also, note that basolateral amygdala responses to tone stimuli are primarily brief onset responses (e.g. Quirk, Armony, and LeDoux 1997), and not the tonic activation used in the model.  

      We replied to this point in our responses to the Reviewer’s Public Comments as follows:

      “We agree that, in order to implement the fear conditioning paradigm in our in-silico network, we made several assumptions about the nature of the CS and US inputs affecting the neurons in the BLA and the duration of these inputs. A Poisson spike train to the BLA is a signal that contains no structure that could influence the timing of the BLA output; hence, we used this as our CS input signal. We also note that the CS input can be of many forms in general fear conditioning (e.g., tone, light, odor), and we wished to de-emphasize the specific nature of the CS. The reference mentioned in the Recommendations for authors, (Quirk, Armony, and LeDoux 1997), uses pulses 2 seconds long. At the end of fear conditioning, the response to those pulses is brief. However, in the early stages of conditioning, the response goes on for as long as the figure shows. The authors do show the number of cells responding decreases from early to late training, which perhaps reflects increasing specificity over training. This feature is not currently in our model, but we look forward to thinking about how it might be incorporated. Regarding the CS pulsed protocol used in (Krabbe et al., 2019), it has been shown that intense inputs (6 kHz and 12 kHz inputs) can lead to metabotropic effects that last much longer than the actual input (200 ms duration) (Whittington et al., Nature, 1995). Thus, the effective input to the BLA may indeed be more like Poisson.”

      Current answer to the Reviewer:

      There are several distinct issues raised by the Reviewer in the more detailed critique. We respectfully disagree that the model is not applicable to context-dependent fear learning where the context acts as a CS, though we should have been more explicit. Specifically, our CS input can describe both the cue and the context. We included the following text in the Results section “Interneuron rhythms provide the fine timing needed for depression-dominated STDP to make the association between CS and fear”:

      “In our simulations, the CS input describes either the context or the cue in contextual and cued fear conditioning, respectively. For the context, the input may come from the hippocampus or other non-sensory regions, but this does not affect its role as input in the model.”

      The second major issue is whether the specific training protocols used in the cited papers need to be exactly reproduced in the signals received by the elements of our model; we note that there are many transformations that can occur between the sensory input and the signals received by the BLA. In the case of auditory fear conditioning, a series of pips, rather than individual pips, is considered the CS (e.g., (Stujenske et al., 2014; Krabbe et al. 2019)). Our understanding is that a single pip does not elicit a fear response; a series of pips is required for fear learning. This indicates that it is not the neural code of a single pip that matters, but rather the signal entering the amygdala that incorporates any history-dependent signaling that could lead to spiking throughout the sequence of pips. Also, as mentioned above, intense inputs at frequencies of about 6 kHz and 12 kHz can lead to metabotropic effects that last much longer than each brief pip (~200 ms), thus possibly producing continuous activity in neurons encoding the input. Thus, we believe that our use of the Poisson spike train is reasonable.

      However, we are aware that the activity of neurons encoding CS can be modulated by the pips: neurons encoding auditory CS display a higher firing rate when each pip is presented and a Poisson-like spike train between pips (Herry et al., Journal of Neuroscience, 2007). Here we confirm that potentiation is present even in the presence of the fast transient response elicited by the pips. We said in the original manuscript that there is learning for a Poisson spike train CS input at ~50 Hz; this describes the neuronal activity in between pips. For the revision, we asked whether learning is preserved when CS is characterized by higher frequencies, which would describe the CS during and right after each pip. We show in the new Fig. S4 that potentiation is ensured for a range of CS frequencies. The figure shows the learning speed as a function of CS and US frequencies. For all the CS frequencies considered, i) there is learning, ii) learning speed increases with CS frequency. Thus, potentiation is present even when pips elicit a faster transient response.

      To better specify this in the manuscript, we added the following sentences in the Results section “With the depression-dominated plasticity rule, all interneuron types are needed to provide potentiation during fear learning”:

      “We note that the CS and US inputs modeled as independent Poisson spike trains represent stimuli with no structure. Although we have not explicitly modeled pulsating pips, as common in auditory fear conditioning (e.g., (Stujenske 2014; Krabbe 2019)), we show in Fig. S4 that potentiation can be achieved over a relatively wide range of gamma frequencies. This indicates that overall potentiation is ensured if the gamma frequency transiently increases after the pip.”

      We added the section “The full network potentiates for a range of CS frequencies” and Fig. S4 in the Supplementary Information.

      We included in Materials and Methods “Conditioned and unconditioned stimuli” the following sentences:

      “Finally, for Fig.S4, we considered a range of frequencies for the CS stimulus. To generate the three Poisson spike trains with average frequencies from 48 to 64 Hz in Fig. S4, we set 𝜆 = 800, 1000, 1200.”
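
      The idea of a structureless Poisson input at a given average frequency can be sketched as follows. This is a minimal illustration, not the authors' generator: the response does not state how 𝜆 maps onto the 48 to 64 Hz average frequencies, so the sketch is parameterized directly by a rate in Hz, and the function name, the intermediate 56 Hz rate, and the 20 s duration are assumptions made for the example.

```python
import numpy as np


def poisson_spike_train(rate_hz, duration_s, rng):
    """Return sorted spike times (s) of a homogeneous Poisson process.

    The spike count is Poisson distributed with mean rate_hz * duration_s;
    given the count, spike times are independent and uniform on [0, duration_s).
    """
    n_spikes = rng.poisson(rate_hz * duration_s)
    return np.sort(rng.uniform(0.0, duration_s, size=n_spikes))


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)  # fixed seed so the run is repeatable
    # Illustrative rates spanning the quoted 48-64 Hz range (56 Hz is assumed).
    for rate_hz in (48.0, 56.0, 64.0):
        spikes = poisson_spike_train(rate_hz, duration_s=20.0, rng=rng)
        print(f"target {rate_hz:.0f} Hz, empirical {spikes.size / 20.0:.1f} Hz")
```

      Because the spike times carry no temporal structure beyond the mean rate, a train of this kind cannot by itself impose timing on the downstream cells, which is the property the response relies on.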

      Finally, to address the comment about the need for CS and US overlapping in time to instantiate fear association, we added the following text in the Results section “Assumptions and predictions of the model”:

      “Finally, our model requires the effect of the CS and US inputs on the BLA neuron activity to overlap in time in order to instantiate fear learning. Although paradigms involving both overlapping (delay conditioning, where US co-terminates with CS (e.g., (Lindquist et al., 2004)), or immediately follows CS (e.g., Krabbe et al., 2019)) and non-overlapping (trace conditioning) CS/US inputs exist, we hypothesized that concomitant activity in CS- and US-encoding neuron activity should be crucial in both cases. This may be mediated by the memory effect due to metabotropic effects (Whittington et al., Nature, 1995) as suggested above, or by the contribution from other brain regions (see section “Involvement of other brain structures” in the Discussion). The fact that plasticity occurs with US memory trace is a consequence of our larger hypothesis that fear learning uses spike-timing-dependent plasticity; such a hypothesis about plasticity is common in the modeling literature.”

      (3) As best as I could tell, only a single training trial was used in this study. Fair enough, especially given that fear learning can occur with a single trial. However, most studies of amygdala fear conditioning have multiple trials (~5 or more). How does the model perform when multiple trials are given?  

      The association between CS and fear acquired after one trial, i.e., through a potentiated ECS to F connection, is preserved in the presence of multiple trials. Indeed, the association would be weakened or erased (through depression of the ECS to F connection) only if ECS and F did not display good fine timing, i.e., if F did not fire right after ECS most of the time. However, the implemented circuit supports the role of interneurons in providing the correct fine timing, thus preventing the acquired association from being erased.

      In the second paragraph of the Results section “With the depression-dominated plasticity rule, all interneuron types are needed to provide potentiation during fear learning”, we made the above point by adding the following text:

      “We note that once the association between CS and fear is acquired, subsequent presentations of CS and US do not weaken or erase it: the interneurons ensure the correct timing and pauses in ECS and F activity, which are conducive for potentiation.”
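
      The timing argument can be illustrated with a generic pair-based STDP rule. The sketch below is an assumption-laden toy, not the plasticity rule of the manuscript: the all-to-all pairing, the amplitudes, and the time constants are hypothetical, chosen only so that depression dominates for uncorrelated spiking (a_minus * tau_minus > a_plus * tau_plus). It contrasts a post-synaptic cell that fires a few milliseconds after each pre-synaptic spike, with pauses in between, against two independent Poisson trains.

```python
import numpy as np


def stdp_weight_change(pre_times, post_times,
                       a_plus=1.0, a_minus=1.0,
                       tau_plus=0.010, tau_minus=0.030):
    """Net weight change from an all-to-all, pair-based STDP rule.

    Pre-before-post pairs potentiate; post-before-pre pairs depress.
    With a_minus * tau_minus > a_plus * tau_plus the rule is
    depression-dominated: uncorrelated spiking depresses on average.
    All parameter values are illustrative assumptions.
    """
    pre = np.asarray(pre_times, dtype=float)
    post = np.asarray(post_times, dtype=float)
    dt = post[:, None] - pre[None, :]  # post minus pre, for every spike pair
    potentiation = a_plus * np.exp(-dt[dt > 0] / tau_plus).sum()
    depression = a_minus * np.exp(dt[dt < 0] / tau_minus).sum()
    return potentiation - depression


def uncorrelated_train(rate_hz, duration_s, rng):
    """Poisson spike times (s) with no relation to any other train."""
    n_spikes = rng.poisson(rate_hz * duration_s)
    return np.sort(rng.uniform(0.0, duration_s, size=n_spikes))


if __name__ == "__main__":
    # Fine timing: the post cell fires 3 ms after each pre spike, then both pause.
    pre = np.arange(20) * 0.25
    print("fine timing:", stdp_weight_change(pre, pre + 0.003))

    # No timing: two independent 50 Hz Poisson trains over 5 s.
    rng = np.random.default_rng(seed=0)
    a = uncorrelated_train(50.0, 5.0, rng)
    b = uncorrelated_train(50.0, 5.0, rng)
    print("no timing:  ", stdp_weight_change(a, b))
```

      Under these assumed parameters the finely timed pairs give a net positive change and the uncorrelated trains a net negative one, which is the sense in which repeated trials preserve the association only as long as the fine timing is maintained.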

      (4) The LFP calculations are problematic. First, it is unclear how they were done. Did the authors just take the transmembrane currents they included and sum them, or were they scaled by distance from the 'electrode' and extracellular conductivity (as one would derive from the Laplace equation)? Presumably, the spatial arrangement of model neurons was neglected so distance was not a factor. 

      Second, if this is the case, then the argument for excluding GABAergic conductances seems flawed. If the spatial arrangement of neurons is relevant to whether to include or exclude GABAergic conductances, then wouldn't a simulation without any spatial structure not be subject to the concern of laminar vs. nuclear arrangement? 

      Moreover, to the best I can tell, the literature the authors use to justify the exclusion of GABAergic currents does not make the case for a lack of GABAergic contribution in non-laminar structures. Instead, those studies only argue that in a non-laminar structure, AMPA currents are detectable, not that GABA cannot be detected. Thus, the authors should either include the GABAergic currents when calculating their simulated LFP, or provide a substantially better argument or citation for their exclusion.

      We thank the Reviewer for pointing this out; this comment helped us rethink how to model the LFP. The origin of the LFP signal in BLA has not been fully determined, but factors thought to be important include differences in the spatial extension of the arborization in excitatory and inhibitory neurons, in the number of synaptic boutons, and spatial distributions of somata and synapses (Lindén et al 2011; Łęski 2013; Mazzoni et al. 2015). In the first version of the manuscript, we excluded the GABAergic currents because it is typically assumed that they add very little to the extracellular field as the inhibitory reversal potential is close to the resting membrane potential. For the revision, we re-ran the simulations during pre and post fear conditioning and we modeled the LFP as the sum of the AMPA, GABA and NaP-/H-/D- currents. With this new version of the LFP, we added a new Fig. 6 showing that there is a significant increase in the low theta power, but not in the high theta power, with fear learning (Fig. 6 C, D, E). This increase in the low theta power was mainly due to the AMPA currents created by the newly established connection from ECS to F, which allowed F to be active after fear conditioning in response to CS. 

      However, as the Reviewer mentioned, our network has no spatial extent: neurons are modeled as point cells. Thus, our current model does not include the features necessary to model some central aspects of the LFP. Despite that, our model does clearly demonstrate how rhythmic activity in the spike timing of neurons within the network changes due to fear learning (Fig. 6B). The spiking outputs of the network are key components of the inputs to the LFP, and thus we expect the rhythms in the spiking to be reflected in more complex descriptions of the LFP. But we also discovered that different LFP proxies provide different changes in rhythmic activity comparing pre- and post-fear learning; although we have no principled way to choose a LFP proxy, we believe that the rhythmic firing is the essential finding of the model.

      We have added the following to the manuscript:

      (1) In the new version of Fig. 6, we present the power spectra of the network spiking activity (panel B), along with the power spectra of the LFP proxy that includes the GABA, AMPA, and NaP-/H-/D- currents (panels C, D, E). 

      (2) We modified the conclusion of the Results section entitled “Increased low-theta frequency is a biomarker of fear learning” by saying:

      “In this section, we explore how plasticity in the fear circuit affects the network dynamics, comparing after fear conditioning to before. We first show that fear conditioning leads to an increase in low theta frequency power of the network spiking activity compared to the pre-conditioned level (Fig. 6 A,B); there is no change in the high theta power. We also show that the LFP, modeled as the linear sum of all the AMPA, GABA, NaP-, D-, and H- currents in the network, similarly reveals a low theta power increase and no significant variation in the high theta power (Fig. 6 C,D,E). These results reproduce the experimental findings in (Davis et al., 2017), and Fig. 6 F,G show that the low theta increase is due to added excitation provided by the new learned pathway. The additional unresponsive ECS and F cells in the network were included to ensure we had not biased the LFP towards excitation. Nevertheless, although both the AMPA and GABA currents contribute to the power increase in the low theta frequency range (Fig. 6F), the AMPA currents show a dramatic power increase relative to the baseline (the average power ratio of AMPA and GABA post- vs pre-conditioning across 20 network realizations is 3 × 10³ and 4.6, respectively). This points to the AMPA currents as the major contributor to the low theta power increase. Specifically, the newly potentiated AMPA synapse from ECS to F ensures F is active after fear conditioning, thus generating strong currents in the PV cells to which it has strong connections (Fig. 6G). Finally, the increase in power is in the low theta range because ECS and F are allowed to spike only during the active phase of the low theta spiking VIP neurons. We have also explored another proxy for the LFP (see Supplementary Information and Fig. S6).”
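
      A post- vs pre-conditioning power ratio of the kind quoted above can be computed along the following lines. This is a hedged sketch on synthetic signals, not the authors' spectral pipeline: the band edges (2 to 6 Hz for low theta, 6 to 12 Hz for high theta), the sampling rate, and the test signals are assumptions for illustration, and the periodogram scaling is arbitrary because it cancels in the ratio.

```python
import numpy as np

LOW_THETA = (2.0, 6.0)    # assumed band edges in Hz
HIGH_THETA = (6.0, 12.0)  # assumed band edges in Hz


def band_power(signal, fs, f_lo, f_hi):
    """Power of `signal` in [f_lo, f_hi) Hz from an FFT periodogram."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC offset before transforming
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * x.size)
    in_band = (freqs >= f_lo) & (freqs < f_hi)
    return psd[in_band].sum() * (freqs[1] - freqs[0])


def power_ratio(post, pre, fs, band):
    """Ratio of post- to pre-conditioning power within one frequency band."""
    return band_power(post, fs, *band) / band_power(pre, fs, *band)


if __name__ == "__main__":
    fs = 1000.0
    t = np.arange(10_000) / fs  # 10 s of synthetic signal
    high = np.sin(2 * np.pi * 8.0 * t)               # unchanged 8 Hz component
    pre = 0.1 * np.sin(2 * np.pi * 4.0 * t) + high   # weak 4 Hz before learning
    post = 1.0 * np.sin(2 * np.pi * 4.0 * t) + high  # strong 4 Hz after learning
    print("low theta ratio: ", power_ratio(post, pre, fs, LOW_THETA))
    print("high theta ratio:", power_ratio(post, pre, fs, HIGH_THETA))
```

      In this toy case only the 4 Hz amplitude changes, so the low theta ratio is large while the high theta ratio stays near one, mirroring the qualitative pattern the response describes.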

      In the Supplementary Information, we included a figure and some text in the new section entitled “A higher low theta power increase emerges in LFP approximated with the sum of the absolute values of the currents compared to their linear sum”:

      “Given that our BLA network comprises a few neurons described as single-compartment cells with no spatial extension and location, the LFP cannot be computed directly from our model’s read-outs. In the main text, we choose as an LFP proxy the linear sum of the AMPA, GABA, and NaP-/H-/D-currents. We note that if the LFP is modeled as the sum of the absolute value of the currents, as suggested by (Mazzoni et al. 2008; Mazzoni et al. 2015), an even higher low theta power increase arises after fear conditioning compared to the linear sum. Differences in the power spectra also arise if other LFP proxies (e.g., only AMPA currents, only GABA currents) are considered. A principled description of an LFP proxy would require modeling the three-dimensional BLA anatomy, including that of the interneurons VIP and SOM; this is outside the scope of the current paper. (See (Feng et al. 2019) for a related project in the BLA.)”
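
      One reason the two proxies can differ is that currents of opposite sign partially cancel in a signed sum but add in a sum of absolute values. The sketch below illustrates only that arithmetic on synthetic traces; the sign convention (inward AMPA negative, outward GABA positive), the amplitudes, and the shared 4 Hz modulation are assumptions, and it makes no claim about the size of the effect in the authors' network.

```python
import numpy as np


def lfp_linear_sum(currents):
    """LFP proxy: signed sum of all current traces (rows of `currents`)."""
    return np.sum(currents, axis=0)


def lfp_abs_sum(currents):
    """LFP proxy: sum of the absolute values of all current traces."""
    return np.sum(np.abs(currents), axis=0)


if __name__ == "__main__":
    fs = 1000.0
    t = np.arange(5_000) / fs                  # 5 s, i.e. 20 cycles at 4 Hz
    rhythm = 1.0 + np.cos(2 * np.pi * 4.0 * t)  # non-negative 4 Hz modulation
    ampa = -1.0 * rhythm                       # assumed inward (negative) current
    gaba = 0.6 * rhythm                        # assumed outward (positive) current
    currents = np.vstack([ampa, gaba])

    # The variance of each proxy equals its total oscillatory (non-DC) power.
    print("linear sum power:", np.var(lfp_linear_sum(currents)))
    print("abs sum power:   ", np.var(lfp_abs_sum(currents)))
```

      With these assumed traces the signed sum oscillates with amplitude 0.4 and the absolute sum with amplitude 1.6, so the absolute-value proxy carries more rhythmic power for the same underlying currents.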

      (3) We updated the Materials and Methods section “Local field potentials and spectral analysis” to explain how we compute the LFP in the revised manuscript: 

      “We considered as an LFP proxy the linear sum of all the AMPA, GABA, NaP, D, and H currents in the network. The D-current is in the VIP interneurons, and NaP-current and H-current are in SOM interneurons.”

      Although it is beyond the scope of the current work, an exploration of the most accurate proxy of the LFP in the amygdala is warranted. Such a study could be accomplished by adopting a similar approach as in (Mazzoni et al., 2015), where several LFP proxies based on a point-neuron leaky integrate-and-fire neuronal network were compared with a “ground-truth” LFP obtained in an analogous realistic three-dimensional network model.

      To explicitly mention this issue in the paper, we added a paragraph in the “Limitations and caveats” section in the Discussion, which reads as follows:

      “LFPs recorded in the experiments are thought to be mainly created by transmembrane currents in neurons located around the electrode and depend on several factors, including the morphology of the arborization of contributing neurons and the location of AMPA and GABA boutons (Katzner et al. 2009; Lindén et al 2011; Łęski 2013; Mazzoni et al. 2015). Since our model has no spatial extension, we used an LFP proxy; this proxy was shown to reflect the rhythmic output of the network, which we believe to be the essential result (for more details see Results “Increased low-theta frequency is a biomarker of fear learning”, and Supplementary Information “A higher low theta power increase emerges in LFP approximated with the sum of the absolute values of the currents compared to their linear sum”).”

      (4) We have removed the section “Plasticity between fear neuron and VIP slows down overall potentiation” in Results and sections “Plasticity between the fear neuron (F) and VIP slows down overall potentiation” and “Plastic F to VIP connections further increase low-theta frequency power after fear conditioning” in the Supplementary Information. This material is extraneous since we are using a new proxy for LFP.

      Minor points: 

      (1) In Figure 3C, the y-axis tick label for 0.037 is written as "0.37."

      We thank the reviewer for finding this typo; we fixed it.

      (2) Figure 5B is unclear. It seems to suggest that the added ECS and F neurons did not respond to either the CS or UCS. Is this true? If so, why include them in the model? How would their inclusion change the model behavior? 

      It is correct that the added ECS and F neurons did not respond to the CS or US (UCS); they are constructed to be firing at 11 Hz in the absence of any connections from other cells. These cells were included to be part of our computation of the LFP. Specifically, adding in those cells would make the LFP take inhibition into account more, and we wanted to make sure that we were not biasing our computation away from the effects of inhibition. As shown in the paper (Fig. 6B), even with inhibition onto these non-responsive cells, the LFP has the properties claimed in the paper concerning the changes in the low theta and high theta power, because the LFP is dominated by new excitation rather than the inhibition.

      First, in the Results section “Network with multiple heterogeneous neurons can establish the association between CS and fear”, we commented on the added ECS and F neurons that do not respond to either CS or US by saying the following:

      “The ECS cells not receiving CS are inhibited by ongoing PV activity during the disinhibition window (Fig. 5B); they are constructed to be firing at 11 Hz in the absence of any connections from other cells. The lack of activity in those cells during fear conditioning implies that there is no plasticity from those ECS cells to the active F. Those cells are included for the calculation of the LFP (see below in “Increased low-theta frequency is a biomarker of fear learning”.)”

      Furthermore, we added the following sentence in the Results section “Increased low-theta frequency is a biomarker of fear learning”:

      “The additional unresponsive ECS and F cells in the network were included to ensure we had not biased the LFP towards excitation.”

      (3) Applied currents are given as current densities, but these are difficult to compare with current levels observed from whole-cell patch clamp recordings. Can the currents be given as absolute levels, in pA/nA?

      In principle, it is possible to connect current densities with absolute levels, as requested. However, we note that the number of cells in models is orders of magnitude smaller than the number being modeled. It is common in modeling to adjust physiological parameters to achieve the qualitative properties that are important to the model, rather than trying to exactly match particular recordings.

      We added to the Methods a description of why we chose units per unit area rather than absolute units:

      “All the currents are expressed in units per area, rather than absolute units, to avoid making assumptions about the size of the neuron surface.”
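
      For readers who want a rough comparison with patch-clamp values, a current density can be converted to an absolute current once a membrane area is assumed. The sketch below is purely illustrative: the µA/cm² unit, the spherical single-compartment geometry, and the 20 µm diameter are assumptions not taken from the manuscript, which deliberately avoids fixing a cell size.

```python
import math


def density_to_absolute_pA(density_uA_per_cm2, soma_diameter_um):
    """Convert a current density (uA/cm^2) to an absolute current (pA).

    Assumes a spherical, single-compartment cell whose membrane area is
    4 * pi * r^2; both the geometry and the diameter are hypothetical.
    """
    radius_cm = 0.5 * soma_diameter_um * 1e-4  # 1 um = 1e-4 cm
    area_cm2 = 4.0 * math.pi * radius_cm ** 2
    current_uA = density_uA_per_cm2 * area_cm2
    return current_uA * 1e6  # 1 uA = 1e6 pA


if __name__ == "__main__":
    # 1 uA/cm^2 applied to an assumed 20 um diameter sphere.
    print(f"{density_to_absolute_pA(1.0, 20.0):.2f} pA")
```

      The result scales with the square of the assumed diameter, which shows why any absolute value depends directly on the assumption about neuron surface that the authors chose not to make.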

      (4) Regarding: "We note that the presence of SOM cells is crucial for plasticity in our model since they help to produce the necessary pauses in the excitatory projection cell activity. However, the high theta rhythm they produce is not crucial to the plasticity: in our model, high theta or higher frequency rhythms in SOM cells are all conducive to associative fear learning. This opens the possibility that the high theta rhythm in the BLA mostly originates in the prefrontal cortex and/or the hippocampus (Stujenske et al., 2014, 2022)." The chain of reasoning in the above statement is unclear. The second sentence seems to be saying contradictory things. 

      We agree that the sentence was confusing; thank you for pointing it out. We have revised the paragraph to make our point clearer. The central points are: 1) having the SOM cells in the BLA is critical to the plasticity in the model, and 2) these cells may or may not be the source of the high theta observed in the BLA during fear learning.

      We deleted from the discussion the text reported by the Reviewer, and we added the following one to make this point clearer:

      “We note that the presence of SOM cells is crucial for plasticity in our model since they help to produce the necessary pauses in the excitatory projection cell activity. The BLA SOM cells do not necessarily have to be the only source of the high theta observed in the BLA during fear learning; the high theta detected in the LFP of the BLA also originates from the prefrontal cortex and/or the hippocampus (Stujenske et al., 2014, 2022).”

      (5) Regarding: "This suggests low theta power change is not just an epiphenomenon but rather a biomarker of successful fear conditioning." Not sure this is the right framing for the above statement. The power of the theta signal in the LFP reflects the strengthening of connections, but it itself does not have an impact on network activity. Moreover, whether something is epiphenomenal is not relevant to the question of whether it can serve as a successful biomarker. A biomarker just needs to be indicative, not causal. 

      We intended to say why the low theta power change is a biomarker in the sense of the Reviewer. That is: experiments have shown that, with learning, the low theta power increases. The modeling shows in addition that, when learning does not take place, the low theta power does not increase. That means that the low theta power increases if and only if there is learning, i.e., the change in low theta power is a biomarker. To make our meaning clearer, we have changed the quoted sentences to read: 

      “This suggests that the low theta power change is a biomarker of successful fear conditioning: it occurs when there is learning and does not occur when there is no learning.”

      Reviewer #2 (Public Comments): 

      We thank the Reviewer for raising these interesting points. Below are our public replies and the changes we made to the manuscript to address the Reviewer’s objections.

      (1) Gamma oscillations are generated locally; thus, it is appropriate to model in any cortical structure. However, the generation of theta rhythms is based on the interplay of many brain areas therefore local circuits may not be sufficient to model these oscillations.

      Moreover, to generate the classical theta, a laminal structure arrangement is needed (where neurons form layers like in the hippocampus and cortex)(Buzsaki, 2002), which is clearly not present in the BLA. To date, I am not aware of any study which has demonstrated that theta is generated in the BLA. All studies that recorded theta in the BLA performed the recordings referenced to a ground electrode far away from the BLA, an approach that can easily pick up volume conducted theta rhythm generated e.g., in the hippocampus or other layered cortical structure. To clarify whether theta rhythm can be generated locally, one should have conducted recordings referenced to a local channel (see Lalla et al., 2017 eNeuro). In summary, at present, there is no evidence that theta can be generated locally within the BLA. Though, there can be BLA neurons, firing of which shows theta rhythmicity, e.g., driven by hippocampal afferents at theta rhythm, this does not mean that theta rhythm per se can be generated within the BLA as the structure of the BLA does not support generation of rhythmic current dipoles. This questions the rationale of using theta as a proxy for BLA network function which does not necessarily reflect the population activity of local principal neurons in contrast to that seen in the hippocampus.

      In both modeling and experiments, a laminar structure does not seem to be needed to produce a theta rhythm. A recent experimental paper (Antonoudiou et al., 2022) suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone. The authors draw this conclusion from ex vivo mouse slices. The currents that generate these rhythms are in the BLA, since the hippocampus was removed to eliminate hippocampal volume conduction and other nearby brain structures did not display any oscillatory activity. Also, in the modeling literature, there are multiple examples of the production of theta rhythms in small networks not involving layers; these papers explain the mechanisms producing theta from non-laminated structures (Dudman et al., 2009; Kispersky et al., 2010; Chartove et al., 2020). We are not aware of any model description of the mechanisms of theta that requires layers.

      We added the following text in the introduction of the manuscript to make this point clearer:  “A recent rodent experimental study (Antonoudiou et al. 2022) suggests that BLA can intrinsically generate theta oscillations (3-12 Hz).”

      (2) The authors distinguished low and high theta. This may be misleading, as the low theta they refer to is basically a respiratory-driven rhythm typically present during an attentive state (Karalis and Sirota, 2022; Bagur et al., 2021, etc.). Thus, it would be more appropriate to use breathing-driven oscillations instead of low theta. Again, this rhythm is not generated by the BLA circuits, but by volume conducted into this region. Yet, the firing of BLA neurons can still be entrained by this oscillation. I think it is important to emphasize the difference.

      Many rhythms of the nervous system can be generated in multiple parts of the brain by multiple mechanisms. We do not dispute that low theta appears in the context of respiration; however, this does not mean that other rhythms with the same frequencies are driven by respiration. Indeed, in the response to question 1 above, we showed that theta can appear in the BLA without inputs from other regions. In our paper, the low theta is generated in the BLA by VIP neurons. Using intrinsic currents known to exist in VIP neurons (Porter et al., 1998), modeling has shown that such neurons can intrinsically produce a low theta rhythm. This is also shown in the current paper. This example is part of a substantial literature showing that there are multiple mechanisms for any given frequency band. 

      To elaborate more on this in the manuscript, we added the following new section in the discussion:

      “Where the rhythms originate, and by what mechanisms. A recent experimental paper, (Antonoudiou et al. 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone. They draw this conclusion in mice by removing the hippocampus, which can volume conduct to BLA, and noticing that other nearby brain structures did not display any oscillatory activity. Our model also supports the idea that intrinsic mechanisms in the BLA can support the generation of the low theta, high theta, and gamma rhythms. 

      Although the BLA can produce these rhythms, this does not rule out that other brain structures also produce the same rhythms through different mechanisms, and these can be transmitted to the BLA. Specifically, it is known that the olfactory bulb produces and transmits the respiratory-related low theta (4 Hz) oscillations to the dorsomedial prefrontal cortex, where it organizes neural activity (Bagur et al., 2021). Thus, the respiratory-related low theta may be captured by BLA LFP because of volume conduction or through BLA extensive communications with the prefrontal cortex. Furthermore, high theta oscillations are known to be produced by the hippocampus during various brain functions and behavioral states, including during spatial exploration (Vanderwolf, 1969) and memory formation/retrieval (Raghavachari et al., 2001), which are both involved in fear conditioning. Similarly to the low theta rhythm, the hippocampal high theta can manifest in the BLA. It remains to understand how these other rhythms may interact with the ones described in our paper.”

      We also note that the presence of D-currents in the BLA VIP interneurons should be confirmed experimentally, and that the ability of VIP interneurons to generate the BLA low theta rhythm constitutes a prediction of our computational model. These points are specified in the first paragraph in the Discussion entitled “Assumptions and predictions of the model”:

      “The interneuron descriptions in the model were constrained by the electrophysiological properties reported in response to hyperpolarizing currents (Sosulina et al., 2010). Specifically, we modeled the three subtypes of VIP, SOM, and PV interneurons displaying bursting behavior, regular spiking with early spike-frequency adaptation, and regular spiking without spike-frequency adaptation, respectively. Focusing on VIP interneurons, we were able to model the bursting behavior by including the D-type potassium current. This current is thought to exist in the VIP interneurons in the cortex (Porter et al., 1998), but whether this current is also found in the VIP interneurons of the BLA is still unknown. Similarly, we endowed the SOM interneurons with NaP- and H-currents, as in the OLM cells in the hippocampus. Due to these currents, the VIP and SOM cells are able to show low- and high-theta oscillations, respectively. The presence of these currents and the neurons’ ability to exhibit oscillations in the theta range during fear conditioning and at baseline in BLA, which are assumptions of our model, should be tested experimentally.”

      (3) The authors implemented three interneuron types in their model, ignoring a large fraction of GABAergic cells present in the BLA (Vereczki et al., 2021). Recently, the microcircuit organization of the BLA has been more thoroughly uncovered, including connectivity details for PV+ interneurons, firing features of neurochemically identified interneurons (instead of mRNA expression-based identification, Sosulina et al., 2010), synaptic properties between distinct interneuron types as well as principal cells and interneurons using paired recordings. These recent findings would be vital to incorporate into the model instead of using results obtained in the hippocampus and neocortex. I am not sure that a realistic model can be achieved by excluding many interneuron types.

      The interneurons and connectivity that we used were inspired by the functional connectivity reported in (Krabbe et al., 2019) (see above answer to Reviewer #1). As reported in (Vereczki et al., 2021), there are multiple categories and subcategories of interneurons; that paper does not report on which ones are essential for fear conditioning. We did use all the highly represented categories of the interneurons, except NPY-containing neurogliaform cells.

      The Reviewer says “I am not sure that a realistic model can be achieved by excluding many interneuron types”. We agree with the Reviewer that omitting other interneuron subtypes and a description of more specific connectivity (soma-, dendrite-, and axon-targeting connections) may limit the ability of our model to describe all the details in the BLA. However, this work represents a first effort towards a biophysically detailed description of the BLA rhythms and their function. As in any modeling approach, assumptions about what to describe and test are determined by the scientific question; details postulated to be less relevant are omitted to obtain clarity. The interneuron subtypes we modeled, especially VIP+ and PV+, have been reported to have a crucial role in fear conditioning (Krabbe et al., 2019). Other interneurons, e.g. cholecystokinin and SOM+, have been suggested as essential in fear extinction. Thus, in the follow-up of this work to explain fear extinction, we will introduce other cell types and connectivity. In the current work, we have achieved our goals of explaining the origin of the experimentally found rhythms and their roles in the production of plasticity underlying fear learning. Of course, a more detailed model may reveal flaws in this explanation, but this is science that has not yet been done.

      We elaborate more on this in a new section in the Discussion entitled “Assumptions and predictions of the model”. The paragraph related to this point reads as follows:

      “Our model, which is a first effort towards a biophysically detailed description of the BLA rhythms and their functions, does not include the neuron morphology, many other cell types, conductances, and connections that are known to exist in the BLA; models such as ours are often called “minimal models” and constitute the majority of biologically detailed models. Such minimal models are used to maximize the insight that can be gained by omitting details whose influence on the answers to the questions addressed in the model are believed not to be qualitatively important. We note that the absence of these omitted features constitutes hypotheses of the model: we hypothesize that the absence of these features does not materially affect the conclusions of the model about the questions we are investigating. Of course, such hypotheses can be refuted by further work showing the importance of some omitted features for these questions and may be critical for other questions. Our results hold when there is some degree of heterogeneity of cells of the same type, showing that homogeneity is not a necessary condition.”

      (4) The authors set the reversal potential of GABA-A receptor-mediated currents to -80 mV. What was the rationale for choosing this value? The reversal potential of IPSCs has been found to be -54 mV in fast-spiking (i.e., parvalbumin) interneurons and around -72 mV in principal cells (Martina et al., 2001, Veres et al., 2017).

      A GABA-A reversal potential around -80 mV is common in the modeling literature (Jensen et al., 2005; Traub et al., 2005; Kumar et al., 2011; Chartove et al., 2020). Other computational works of the amygdala, e.g. (Kim et al., 2016), consider GABA-A reversal potential at -75 mV based on the cortex (Durstewitz et al., 2000). The papers cited by the reviewer have a GABA-A reversal potential of -72 mV for synapses onto pyramidal cells; this is sufficiently close to our model that it is not likely to make a difference. For synapses onto PV+ cells, the papers cited by the reviewer suggest that the GABA-A reversal potential is -54 mV; such a reversal potential would lead these synapses to be excitatory instead of inhibitory. However, it is known (Krabbe et al., 2019; Supp. Fig. 4b) that such synapses are in fact inhibitory. Thus, we wonder if the measurements of Martina and Veres were made in a condition very different from that of Krabbe. For all these reasons, we consider a GABA-A reversal potential around -80 mV in amygdala to be a reasonable assumption.

      In section “Network connectivity and synaptic currents” in “Materials and Methods” we provided references to motivate our choice of considering a GABA-A reversal potential around -80 mV:

      “The GABAa current reversal potential (E_i) is set to −80 mV, as is common in the modeling literature (Jensen et al., 2005; Traub et al., 2005; Kumar et al., 2011; Chartove et al., 2020).”
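
      The argument about reversal potentials turns on the sign of the driving force, which a short calculation makes concrete. The sketch below assumes the standard conductance-based form I_syn = g * s * (V - E) and a hypothetical membrane potential of -65 mV; neither the function name nor that voltage comes from the manuscript. It also ignores shunting and spike threshold, both of which matter for whether a depolarizing GABA-A current is functionally excitatory.

```python
def gaba_push_mV(v_mV, e_rev_mV):
    """Direction in which an open GABA-A conductance pushes the membrane.

    With I_syn = g * s * (V - E) entering the voltage equation as -I_syn,
    the sign of (E - V) gives the direction: negative is hyperpolarizing,
    positive is depolarizing.
    """
    return e_rev_mV - v_mV


V_ASSUMED = -65.0  # hypothetical membrane potential (mV), for illustration only

# Reversal potentials discussed in the response: model value and measured values.
for e_rev in (-80.0, -72.0, -54.0):
    push = gaba_push_mV(V_ASSUMED, e_rev)
    label = "hyperpolarizing" if push < 0 else "depolarizing"
    print(f"E = {e_rev:.0f} mV: {label} ({push:+.0f} mV driving force)")
```

      Under this assumed voltage, -80 mV and -72 mV both give a hyperpolarizing push that differs only in magnitude, whereas -54 mV reverses the sign.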

      (5) Proposing neuropeptide VIP as a key factor for learning is interesting. Though, it is not clear why this peptide is more important in fear learning in comparison to SST and CCK, which are also abundant in the BLA and can effectively regulate the circuit operation in cortical areas.

      Other peptides seem to be important in overall modulation of fear, but VIP is especially important in the first part of fear learning, the subject of our paper. Regarding SST: we hypothesize that SST interneurons are critical in fear extinction and preventing fear generalization, but not to initial fear learning. The peptide of the CCK neurons, which overlap with VIP cells, has been proposed to promote the switch between fear and safety states after fear extinction (Krabbe et al., 2018). Thus, these other peptides are likely more important for other aspects of fear learning.

      In the Discussion, we have added:

      “We hypothesize that SST peptide is critical in fear extinction and preventing fear generalization, but not to initial fear learning. Also, the CCK peptide has been proposed to promote the switch between fear and safety states after fear extinction (Krabbe et al., 2018).”

      Reviewer #2 (Recommendations For The Authors): 

      We note that Reviewer #2’s Recommendations For The Authors have the same content as the Public Comments. Thus, the changes to the manuscript described above also address the private critiques listed below.

      (1) As the breathing-driven rhythm is a global phenomenon accompanying fear state, one might restrict the analysis to this oscillation. The rationale beyond this restriction is that the 'high' theta in the BLA has an unknown origin (since it can originate from the ventral hippocampus, piriform cortex etc.). 

      In response to point 4 made by Reviewer 1 (Recommendations for the Authors) (p. 13), referring to high theta in the BLA, we previously wrote: 1) having the SOM cells in the BLA is critical to the plasticity in the model, and 2) these cells may or may not be the source of the high theta observed in the BLA during fear learning.

      In the Public Critiques, Reviewer 2 relates the respiratory rhythm to the low theta. We answered this point in point 2 of the Reviewer’s Public Comments (at p. 15).

      (2) I would include more interneurons in the network model incorporating recent findings. 

      This point was answered in our response to point 3 of the Reviewer’s Public Comments.

      (3) The reversal potential for GABA-A receptor-mediated currents would be good to set to measured values. In addition, I would use AMPA conductance values that have been measured in the BLA. 

      We addressed this objection in our response to point 4 of the Reviewer’s Public Comments.

      Reviewer #3 (Public comments):

      Weaknesses: 

      (1) The main weakness of the approach is the lack of experimental data from the BLA to constrain the biophysical models. This forces the authors to use models based on other brain regions and leaves open the question of whether the model really faithfully represents the basolateral amygdala circuitry. 

      (2) Furthermore, the authors chose to use model neurons without a representation of the morphology. However, given that PV+ and SOM+ cells are known to preferentially target different parts of pyramidal cells and given that the model relies on a strong inhibition from SOM to silence pyramidal cells, the question arises whether SOM inhibition at the apical dendrite in a model representing pyramidal cell morphology would still be sufficient to provide enough inhibition to silence pyramidal firing.

      (3) Lastly, the fear learning relies on the presentation of the unconditioned stimulus over a long period of time (40 seconds). The authors justify this long-lasting input as reflecting not only the stimulus itself but as a memory of the US that is present over this extended time period. However, the experimental evidence for this presented in the paper is only very weak.

      We are repeating here the answers we gave in response to the public comments, adding further relevant points.

      (1) Our neurons were constrained by electrophysiology properties in response to hyperpolarizing currents in the BLA (Sosulina et al., 2010). We can reproduce these electrophysiological properties by using specific membrane currents known to be present in similar neurons in other brain regions (D-current in VIP interneurons in the cortex, and NaP- and H-currents in OLM/SOM cells in the hippocampus). Also, though a much more detailed description of BLA interneurons was given in (Vereczki et al., 2021), it is not clear that this level of detail is relevant to the questions that we were asking, especially since the experiments described were not done in the context of fear learning.

      (2) It is true that we did not include the morphology, which undoubtedly makes a difference to some aspects of the circuit dynamics. Furthermore, it is correct that the model relies on a strong inhibition from SOM and PV to silence the excitatory projection neurons. We agree that the placement of the SOM inhibition on the pyramidal neurons can make a difference to some aspects of the circuit behavior. We are assuming that the inhibition from the SOM cells can suppress the pyramidal cells’ firing, which can be seen as a hypothesis of our model. It is well known that VIP cells disinhibit pyramidal cells through inhibition of SOM and PV cells (Krabbe et al., 2019); hence, this hypothesis is generally believed. This choice of parameters comes from using simplified models: it is standard in modeling to adjust parameters to compensate for simplifications.

      Re points 1) and 2), in a new paragraph (“Assumptions and predictions of the model”) in the Discussion reported in response to Reviewer #2 (public comments)’s point 3, we stated that modeling requires the omission of many details to bring out the significance of other details.

      (3) 40 seconds is the temporal interval we decided to use to present the results. In the Results, we also showed that there is learning over a shorter interval of time (15 seconds) where CS and US/memory of US should both be present. Thus, our model requires 15 seconds over a single or multiple trials for associative learning to be established. We included references to additional experimental papers to support our reasoning in the last paragraph of section “Assumptions and predictions of the model” in the Discussion, also reported in response to Reviewer #1 point 2 (Recommendations for the Authors). We said there that some form of memory or overlap in the activity of the excitatory projection neurons is necessary for spike-timing-dependent plasticity.

      The authors achieved the aim of constructing a biophysically detailed model of the BLA not only capable of fear learning but also showing spectral signatures seen in vivo. The presented results support the conclusions with the exception of a potential alternative circuit mechanism demonstrating fear learning based on a classical Hebbian (i.e. non-depression-dominated) plasticity rule, which would not require the intricate interplay between the inhibitory interneurons. This alternative circuit is mentioned but a more detailed comparison between it and the proposed circuitry is warranted.

      Our model accounts for the multiple rhythms observed in the context of fear learning, as well as the known involvement of multiple kinds of interneurons. We did not say explicitly enough why our complicated model may be functionally important in ways that cannot be fulfilled with a simpler model with the non-depression-dominated Hebbian rule. To explain this, we have added the following in the manuscript discussion: 

      “Although fear learning can occur without the depression-dominated rule, we hypothesize that it is necessary for other aspects of fear learning and regulation. That is, in pathological cases, there can be overgeneralization of learning. We hypothesize that the modulation created by the involvement of these interneurons is normally used to prevent such overgeneralization. However, this is beyond the scope of the present paper.”

      We have also written an extra paragraph about generalization in the Discussion “Synaptic plasticity in our model”:

      “With the classical Hebbian plasticity rule, we show that learning can occur without the involvement of the VIP and SOM cells. Although fear learning can occur without the depression-dominated rule, we hypothesize that the latter is necessary for other aspects of fear learning and regulation. Generalization of learning can be pathological, and we hypothesize that the modulation created by the involvement of VIP and SOM interneurons is normally used to prevent such overgeneralization. However, in some circumstances, it may be desirable to account for many possible threats, and then a classical Hebbian plasticity rule could be useful. We note that the involvement or not of the VIP-SOM circuit has been implicated when there are multiple strategies for solving a task (Piet et al., 2024). In our situation, the nature of the task (including reward structure) may determine whether the learning rule is depression-dominated and therefore whether the VIP-SOM circuit plays an important role.”

      Reviewer #3 (Recommendations For The Authors): 

      We thank the Reviewer for all the recommendations. We replied to each of them below.

      In general, there are some inconsistencies in the naming (e.g. sometimes you write PV sometimes PV+,...), please use consistent abbreviations throughout the manuscript. You also introduce some of the abbreviations multiple times. 

      We modified the manuscript to remove all the inconsistencies in the naming. 

      Introduction: 

      - In the last section you speak about one recent study but actually cite two articles. 

      We removed the reference to (Perrenoud and Cardin, 2023), which is a commentary on the Veit et al. article.

      Results: 

      - 'Brain rhythms are thought to be encoded and propagated largely by interneurons' What do you mean by encoded here? 

      We agree with the Reviewer that the verb “to encode” is not accurate. We modified the sentence as follows:

      “Brain rhythms are thought to be generated and propagated largely by interneurons”.

      - The section 'Interneurons interact to modulate fear neuron output' could be clearer. Start with describing the elements of the circuit, then the rhythms in the baseline. 

      We reorganized the section as follows:

      “Interneurons interact to modulate fear neuron output. Our BLA network consists of interneurons, detailed in the previous section, and excitatory projection neurons (Fig. 2A). Both the fear-encoding neuron (F), an excitatory projection neuron, and the VIP interneuron are activated by the noxious stimulus US (Krabbe et al., 2019). As shown in Fig. 2A (top, right), VIP disinhibits F by inhibiting both SOM and PV, as suggested in (Krabbe et al., 2019). We do not include connections from PV to SOM and VIP, nor connections from SOM to PV and VIP, since those connections have been shown to be significantly weaker than the ones included (Krabbe et al., 2019). The simplest network we consider is made of one neuron for each cell type. We introduce a larger network with some heterogeneity in the last two sections of the Results.

      Fig. 2A (bottom) shows a typical dynamic of the network before and after the US input onset, with US modeled as a Poisson spike train at ~50 Hz; the network produces all the rhythms originating from the interneurons alone or through their interactions with the excitatory projection neurons (shown in Fig. 1). Specifically, since VIP is active at low theta during both rest and upon the injection of US, it then modulates F at low theta cycles via SOM and PV. In the baseline condition, the VIP interneuron has short gamma bursts nested in low theta rhythm. With US onset, VIP increases its burst duration and the frequency of low theta rhythm. These longer bursts make the SOM cell silent for long periods of each low theta cycle, providing F with windows of disinhibition and contributing to the abrupt increase in activity right after the US onset. Finally, in Fig. 2A, PV lacks any external input and fires only when excited by F. Thanks to their reciprocal interactions, PV forms a PING rhythm with F, as depicted in Fig. 1C.”
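
      An input "modeled as a Poisson spike train at ~50 Hz" can be generated in a few lines. The following is a minimal sketch using only the Python standard library; the function name, the seed, and the choice of drawing exponential inter-spike intervals are assumptions for illustration and are not taken from the authors' code.

```python
import random


def poisson_spike_train(rate_hz, duration_s, seed=None):
    """Return sorted spike times (s) of a homogeneous Poisson process.

    Inter-spike intervals of a Poisson process are exponentially
    distributed with mean 1 / rate_hz, so spikes are accumulated by
    summing exponential draws until the duration is exceeded.
    """
    rng = random.Random(seed)
    t = 0.0
    spikes = []
    while True:
        t += rng.expovariate(rate_hz)
        if t >= duration_s:
            return spikes
        spikes.append(t)


# Illustrative run: 50 Hz over 40 s, the US duration discussed in the response.
us_spikes = poisson_spike_train(50.0, 40.0, seed=0)
print(len(us_spikes), "spikes; mean rate", len(us_spikes) / 40.0, "Hz")
```

      The realized rate fluctuates around the nominal 50 Hz from run to run, which is the source of trial-to-trial variability in such an input.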

      - Figure 3C: The lower dashed line has the tick label '0.37' which should read '0.037'. 

      We fixed it.

      - The section describing the network with multiple neurons could be clearer, especially, it is not really clear how these different ECS and F neurons receive their input. 

      We answered the same objection in the reply to Reviewer #1 in point 2 under “minor issues.”

      Discussion: 

      - The paragraph 'It has also been suggested that ventral tegmental area has a role in fear expression (Lesas et al.,2023). Furthermore, it has been reported that the prelimbic cortex (PL) modulates the BLA SOM cells during fear retrieval, and the latter cells are crucial to discriminate non-threatening cues when desynchronized by the PL inputs (Stujenske et al., 2022).' is merely stating facts but I don't see how they relate to the presented work. 

      We thank the Reviewer for pointing out that this was confusing. What we meant to emphasize was that later stages of fear conditioning and extinction appear to require more than the BLA. We specifically mention the discrimination of non-threatening cues at the end of the paragraph, which now reads as follows:

      “Other brain structures may be involved in later stages of fear responsiveness, such as fear extinction and prevention of generalization. It has been reported that the prelimbic cortex (PL) modulates the BLA SOM cells during fear retrieval, and the latter cells are crucial to discriminate non-threatening cues when desynchronized by the PL inputs (Stujenske et al., 2022). Brain structures such as the prefrontal cortex and hippocampus have been documented to play a crucial role also in fear extinction, the paradigm following fear conditioning aimed at decrementing the conditioned fearful response through repeated presentations of the CS alone. As reported by several studies, fear extinction suppresses the fear memory through the acquisition of a distinct memory, instead of through the erasure of the fear memory itself (Harris et al., 2000; Bouton, 2002; Trouche et al., 2013; Thompson et al., 2018). Davis et al., 2017 found a high theta rhythm following fear extinction that was associated with the suppression of threat in rodents. Our model can be extended to include structures in the prefrontal cortex and the hippocampus to further investigate the role of rhythms in the context of discrimination of non-threatening cues and extinction. We hypothesize that a different population of PV interneurons plays a crucial role in mediating competition between fearful memories, associated with a low theta rhythm, and safety memories, associated with a high theta rhythm; supporting experimental evidence is in (Lucas et al., 2016; Davis et al., 2017; Chen et al., 2022).”

      - The comparison to other models of the BLA is quite short and seems a bit superficial. A more in-depth comparison seems warranted. 

      We thank the reviewer for suggesting that a more in-depth comparison between our and other models in the literature would improve the manuscript. We rewrote entirely the first paragraph of that section. The new content reads as follows:

      “Comparison with other models. Many computational models that study fear conditioning have been proposed in recent years; the list includes biophysically detailed models (e.g., Li 2009; Kim et al., 2013a), firing rate models (e.g., Krasne 2011; Ball 2012; Vlachos 2011), and connectionist models (e.g., Moustafa 2013; Armony 1997; Edeline 1992) (for a review see (Nair et al., 2016)). Both firing rate models and connectionist models use an abstract description of the interacting neurons or regions. The omission of biophysical details prevents such models from addressing questions concerning the roles of dynamics and biophysical details in fear conditioning, which is the aim of our model. There are also biophysically detailed models (Li 2009; Kim 2013; Kim 2016; Feng 2019), which differ from ours in both the physiology included in the model and the description of how plastic changes take place. One main difference in the physiology is that we differentiated among types of interneurons, since the fine timing produced for the latter was key to our use of rhythms to produce spike-time dependent plasticity. The origin of the gamma rhythm (but not the other rhythms) was investigated in Feng et al 2019, but none of these papers connected the rhythms to plasticity.

      The most interesting difference between our work and that in (Li 2009; Kim 2013; Kim 2016) is the modeling of plasticity. We use spike-timing-dependent plasticity (STDP) rules. The models in (Li 2009; Kim 2013; Kim 2016) were more mechanistic about how the plasticity takes place, starting with the known involvement of calcium in plasticity. Using a hypothesis about back propagation of spikes, these papers together arrive at a theory that is consistent with STDP and other instantiations of plasticity (Shouval 2002a; Shouval 2002b). For the purposes of our paper, this level of detail, though very interesting, was not necessary for our conclusions. By contrast, in order for the rhythms and the interneurons to have the dynamic roles they play in the model, we needed to restrict our STDP rule to ones that are depression-dominated. Our reading of (Shouval 2002) suggests to us that such subrules are possible outcomes of the general theory. Thus, there is no contradiction between the models, just a difference in focus; our focus was on the importance of the much-documented rhythms (Seidenbecher et al., 2003; Courtin et al., 2014b; Stujenske et al., 2014; Davis et al., 2017) in providing the correct spike timing. We showed in the Supplementary Information (“Classical Hebbian plasticity rule, unlike the depression-dominated one, shows potentiation even with no strict pre and postsynaptic spike timing”) that if the STDP rule is not depression-dominated, the rhythms may not be necessary. We hypothesize that the necessity of strict timing enforced by the depression-dominated rule may foster the most appropriate association with fear at the expense of less relevant associations.”
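
      To make the distinction between a depression-dominated rule and a potentiation-dominated one concrete, the following is a minimal sketch of a generic pair-based STDP rule. It is not the rule used in the manuscript: the exponential form, the amplitudes `a_plus` and `a_minus`, the 20 ms time constants, and the two sets of spike lags are all illustrative assumptions. The potentiation-dominated setting only stands in for the "classical Hebbian" rule named in the response.

```python
import math


def stdp_dw(dt_ms, a_plus, a_minus, tau_plus_ms=20.0, tau_minus_ms=20.0):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre.

    Generic pair-based STDP; all parameter values are assumptions.
    """
    if dt_ms > 0:  # pre fires before post: potentiation
        return a_plus * math.exp(-dt_ms / tau_plus_ms)
    if dt_ms < 0:  # post fires before pre: depression
        return -a_minus * math.exp(dt_ms / tau_minus_ms)
    return 0.0


def net_change(lags_ms, a_plus, a_minus):
    """Sum the weight changes over a list of pre/post spike lags."""
    return sum(stdp_dw(dt, a_plus, a_minus) for dt in lags_ms)


# Strict pre-before-post timing (hypothetical: every pair has a +5 ms lag).
tight_lags = [5.0] * 20
# No strict timing (hypothetical: lags spread symmetrically over +/- 50 ms).
loose_lags = [float(dt) for dt in range(-50, 51, 5)]

# Depression-dominated: depression amplitude exceeds potentiation amplitude.
dep_tight = net_change(tight_lags, a_plus=0.01, a_minus=0.02)
dep_loose = net_change(loose_lags, a_plus=0.01, a_minus=0.02)
# Potentiation-dominated stand-in for a classical Hebbian rule.
heb_loose = net_change(loose_lags, a_plus=0.02, a_minus=0.01)

print(dep_tight > 0, dep_loose < 0, heb_loose > 0)
```

      Under these assumed values, the depression-dominated rule potentiates only when the lags are tightly pre-before-post and depresses otherwise, whereas the potentiation-dominated variant yields net potentiation even for loosely timed spikes, which matches the qualitative argument made in the response.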

      - The paragraph 'This could happen among some cells responding to weaker sensory inputs that do not lead to pre-post timing with fear neurons. This timing could be modified by the "triconditional rule", as suggested in (Grewe et al., 2017).' is not very clear. What exactly is 'this' in the first sentence referring to? If you mention the 'tri-conditional rule' here, please briefly explain it and how it would solve the issue at hand here.  

      We apologize that the sentence reported was not sufficiently clear. “This” refers to “depression”. We meant that, in our model, depression during fear conditioning happens every time there is no pre-post timing between neurons encoding the neutral stimuli and fear cells; poor pre-post timing can characterize the activity of neurons responding to weaker sensory inputs and does not lead to associative learning. We modified that paragraph as follows:

      “The study in (Grewe et al., 2017) suggests that associative learning resulting from fear conditioning induces both potentiation and depression among coactive excitatory neurons; coactivity was determined by calcium signaling and thus did not allow measurements of fine timing between spikes. In our model, we show how potentiation between coactive cells occurs when strict pre-post spike timing and appropriate pauses in the spiking activity arise. Depression happens when one or both of these components are not present. Thus, in our model, depression represents the absence of successful fear association and does not take part in the reshaping of the ensemble encoding the association, as instead suggested in (Grewe et al., 2017). A possible follow-up of our work involves investigating how fear ensembles form and modify through fear conditioning and later stages. This follow-up work may involve using a tri-conditional rule, as suggested in (Grewe et al. 2017), in which the potential role of neuromodulators is taken into account in addition to the pre- and postsynaptic neuron activity; this may lead to both potentiation and depression in establishing an associative memory.”

      - In the limitations and caveats section you mention that the small size of the network implies that they represent a synchronous population. What are the potential implications for the proposed rhythm-dependent mechanism? What are your expectations for larger networks? 

      We apologize if we were not adequately clear. We are guessing that the Reviewer thought we meant the entire population was synchronous, which it is not. We meant that, when we use a single cell to represent a subpopulation of cells of that type, that subpopulation is effectively synchronous. For larger networks in which each subtype is represented by many cells, there can be heterogeneity within each subtype. We have shown in the paper that the basic results still hold under some heterogeneity; however, they may fail if the heterogeneity is too large.

      We mention this in a new section named “Assumptions and predictions of the model”, added in response to point 3 made by Reviewer #2.

      - The discussion is also missing a section on predictions/new experiments that can be derived from the model. How can the model be confirmed, what experiments/results would break the model? 

      To answer this question, we put in a new section in the Discussion entitled “Assumptions and predictions of the model”. The first paragraph of this section is in the reply to Reviewer #2 point 2; the second paragraph is in the reply to Reviewer #2 point 3; the last paragraph is in the Reply to Reviewer #1 point c; the rest of the section reads as follows:

      “Our study suggests that all the interneurons are necessary for associative learning provided that the STDP rule is depression-dominated. This prediction could be tested experimentally by selectively silencing each interneuron subtype in the BLA: if the associative learning is hampered by silencing any of the interneuron subtypes, this validates our study. Finally, the model prediction could be tested indirectly by acquiring more information about the plasticity rule involved in the BLA during associative learning. We found that all the interneurons are necessary to establish fear learning only in the case of a depression-dominated rule. This rule ensures that fine timing and pauses are always required for potentiation: interneurons provide both fine timing and pauses to pyramidal cells, making them crucial components of the fear circuit. 

      The modeling of the interneurons assumes the involvement of various intrinsic currents; the inclusion of those currents can be considered hypotheses of the model. Our model predicts that blockade of D-current in VIP interneurons (or silencing VIP interneurons) will both diminish low theta and prevent fear learning. Finally, the model assumes the absence of significantly strong connections from the excitatory projection cells ECS to PV interneurons, unlike the ones from F to PV. Including those synapses would alter the PING rhythm created by the interactions between F and PV, which is crucial for fine timing between ECS and F needed for LTP.”

    1. We would like to thank you and the reviewers for your thoughtful comments, which helped us improve the manuscript. We carefully followed the reviewers’ recommendations and provide a detailed point-by-point account of our responses to the comments. 

      Please find below the important changes in the updated manuscript.

      (1) We changed the title according to the comments provided by reviewer #1.

      (2) We edited the introduction, results, and discussion to improve the link between the objectives of the study, the findings, and their discussion, as reviewer #2 recommended.

      (3) We clarified the link between camouflage and fitness, which is now presented as a hypothesis, as reviewer #1 suggested.

      (4) We added new analyses and figures in the main text and in the supplementary materials to better emphasize sex differences in landing force, foraging strategies and hunting success, following reviewer #1’s suggestion.

      (5) Following reviewer #2’s comments, we edited the results, adding key information about methods to help the reader understand the findings without reading the Methods section.

      (6) We added important details about the model selection approach along with a discussion of the low R-square values reported in our analyses on hunting success, as reviewer #2 suggested.

      eLife assessment 

      This fundamental work substantially advances our understanding of animals' foraging behaviour, by monitoring the movement and body posture of barn owls in high resolution, in addition to assessing their foraging success. With a large dataset, the evidence supporting the main conclusions is convincing. This work provides new evidence for motion-induced sound camouflage and has broad implications for understanding predator-prey interactions. 

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this paper, Schalcher et al. examined how barn owls' landing force affects their hunting success during two hunting strategies: strike hunting and sit-and-wait hunting. They tracked tens of barn owls that raised their nestlings in nest boxes and utilized high-resolution GPS and acceleration loggers to monitor their movements. In addition, camcorders were placed near their nest boxes and used to record the prey they brought to the nest, thus measuring their foraging success. 

      This study generated a unique dataset and provided new insights into the foraging behavior of barn owls. The researchers discovered that the landing force during hunting strikes was significantly higher compared to the sit-and-wait strategy. Additionally, they found a positive relationship between landing force and foraging success during hunting strikes, whereas, during the sit-and-wait strategy, there was a negative relationship between the two. This suggests that barn owls avoid detection by generating a lower landing force and producing less noise. Furthermore, the researchers observed that environmental characteristics affect barn owls' landing force during sit-and-wait hunting. They found a greater landing force when landing on buildings, a lower landing force when landing on trees, and the lowest landing force when landing on poles. The landing force also decreased as the time to the next hunting attempt decreased. These findings collectively suggest that barn owls reduce their landing force as an acoustic camouflage to avoid detection by their prey. 

      The main strength of this work is the researchers' comprehensive approach, examining different aspects of foraging behavior, including high-resolution movement, foraging success, and the influence of the environment on this behavior, supported by impressive data collection. The weakness of this study is that the results only present a partial biological story contained within the data. The focus is on acoustic camouflage without addressing other aspects of barn owls' foraging strategy, leaving the reader with many unanswered questions. These include individual differences, direct measurements of owls' fitness, a detailed analysis of the foraging strategy of males and females, and the collective effort per nest box. However, it is possible that these data will be published in a separate paper. 

      We greatly appreciate your recognition of the comprehensive approach and extensive data collection. Our primary objective was to study the role of acoustic camouflage. Nonetheless, the manuscript now includes a detailed analysis of the foraging strategy and hunting success of males and females (lines 164-225).

      The results presented support the authors' conclusion that lower landing force during sit-and-wait hunting increases hunting success, likely due to a decreased probability of detection by their prey, resulting in acoustic camouflage. The authors also argue that hunting success is crucial for survival, and thus, acoustic camouflage has a direct link to fitness. While this statement is reasonable, it should be presented as a hypothesis, as no direct evidence has been provided here.

      Thank you for the comment. We agree and thus have edited the language accordingly.  

      However, since information about nestling survival is typically monitored when studying behavior during the breeding period, the authors' knowledge of the effect of acoustic camouflage on owls' fitness can probably be provided. Furthermore, it will be interesting to further examine the foraging strategies used by different individuals during foraging, the joint foraging success of both males and females within each nest box, and the link between landing force and foraging success if the data are available.

      We are currently writing a manuscript on these topics. We are aware that several scientific questions regarding the foraging ecology of the barn owl still need our attention. Regarding the link between landing force and foraging success, we believe that our revised manuscript addresses this specific topic; please see the specific responses below.

      However, even without this additional analysis on survival, this paper provides an unprecedented dataset and the first measurement of landing force during hunting in the wild. It is likely to inspire many other researchers currently studying animal foraging behavior to explore how animals' movements affect foraging success.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors provide new evidence for motion-induced sound camouflage and can link the hunting approach to hunting success (detailing the adaptation and inferring a fitness consequence). 

      Strengths: 

      Strong evidence by combining high-resolution accelerometer data with a ground-truthed data set on prey provisioning at nest boxes. A good set of co-variates to control for some of the noise in the data provides some additional insights into owl hunting attempts. 

      Weaknesses: 

      There is a disconnect between the hypotheses tested and the results presented, and insufficient detail is provided on the statistical approach. R2 values of the presented models are very small compared to the significance of the effect presented. Without more detail, it is impossible to assess the strength of the evidence.

      In the revised manuscript, we changed the way results are presented and we improved the link between the hypotheses and the results. The R2 values are indeed small. It is however important to keep in mind that we are assessing the outcome of one specific behavior (i.e. landing force during sit-and-wait hunts) on hunting success in a wild environment, where many complex ecological interactions likely influence hunting success. Nonetheless, the coefficients (as reported in the results) show that for every 1 N increase in landing force, there is a 15% reduction in hunting success, which is substantial. In the discussion we also note that 50 Hz is a relatively low sampling frequency for estimating the peak ground reaction force. We have gone back over the presentation of our results and made our discussion more nuanced to acknowledge this aspect. 

      We have also added a detailed description about our model selection process in the methods section and provide a model selection table for each analysis in the supplementary materials.
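
      As a rough illustration of how an effect of this size can be read, the sketch below assumes that the 15% figure is a multiplicative change in the odds of success per 1 N (an odds ratio of 0.85, as a logistic-type model would report). That interpretation, the 30% baseline success probability, and the function name are assumptions made for illustration only; they are not taken from the manuscript.

```python
import math


def success_probability(force_n, baseline_prob,
                        odds_ratio_per_newton=0.85, reference_force_n=0.0):
    """Predicted success probability at a given landing force.

    Assumes a logistic relationship in which each extra newton multiplies
    the odds of success by `odds_ratio_per_newton` (an assumption).
    """
    beta = math.log(odds_ratio_per_newton)  # slope on the log-odds scale
    baseline_logit = math.log(baseline_prob / (1.0 - baseline_prob))
    logit = baseline_logit + beta * (force_n - reference_force_n)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical baseline: 30% success at the reference force.
for extra_force in (0.0, 1.0, 2.0, 4.0):
    p = success_probability(extra_force, baseline_prob=0.30)
    print(f"+{extra_force:.0f} N -> predicted success probability {p:.3f}")
```

      Under these assumptions the predicted probability falls steadily with each additional newton, even though a single predictor of this kind would explain only a small share of the overall variance in a noisy field setting.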

      The authors seem to overcome persisting challenges associated with the validation and calibration of accelerometer data by ground-truthing on-board measures with direct observations in captivity, but here the methods are not described any further and sample sizes (2 owls - how many different loggers were deployed?) might be too small to achieve robust behavioural classifications.

      Thank you for the comment. Details of our methods of behavioural identification are provided in lines 385 – 429. There are two reasons why our results should not be limited by the sample size. First, we used the temporal sequence of changes in acceleration, and rates of change in acceleration data, which make the methods robust to individual differences in acceleration values. Second, our methods for behavioural identification were not based on machine learning. Instead, we used a Boolean-based approach (as described in Wilson et al. 2018. MEE), which is more robust to small differences in absolute values that might occur, e.g., in relation to slight changes in device position. 

      Recommendation for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Comment 1. This study provides new insights into animals' foraging behavior and will probably inspire other researchers to examine foraging behavior in such high resolution.

      We hope so, thank you.

      Comment 2. However, it is necessary to describe better the measured landing force and the hunting strike and perching behavior so the readers can understand these methods when reading the results (and without reading the Methods).

      We have now changed the text in the “Results” to help the reader understand the key methods while reading the results.

      Comment 3. In addition, make sure you use the same terminology for hunting strategies during the entire paper and especially in all figures and corresponding result descriptions.

      We now use consistent terminology throughout the text and figures. We hope that this is now clear in the revised manuscript.

      Comment 4. In addition, although I find your statement about the link between acoustic camouflage and fitness reasonable, it should be described as a hypothesis or examined if you want to keep the direct link statement. I believe showing a direct link can add an additional outstanding aspect to this paper, but I also understand that it can be addressed in a separate paper.

      We agree that the relationship between hunting success and barn owl fitness is an important topic, but it necessitates a consideration of both hunting strategies, including hunting on the wing, which extends beyond the limits of our current study. Indeed, our primary objective was to conduct a detailed examination of the interplay between acoustic camouflage and the success of the sit-and-wait technique.

      However, we have edited the manuscript to explicitly describe the link between acoustic camouflage and fitness as a hypothesis. We believe this adjustment provides a more accurate representation of our approach. We hope this clarifies the specific emphasis of our work and its contribution to the understanding of barn owl hunting behavior.

      Here are my detailed comments about the paper: 

      Comment 5. Title: Consider changing the title to "Acoustic camouflage predicts hunting success in a wild predator." 

      Thank you for this suggestion. However, we opted for a different title, which is now “Landing force reveals new form of motion-induced sound camouflage in a wild predator”.

      Comment 6. Line 91-93: Please provide additional information about the collected dataset, including: 

      Description of the total period of observations, an average and standard deviation of perching and hunting attempt events per individual per night, number of foraging trips per individual per night, details about the geographic location and characteristics of the habitat, season, and reproductive state. 

      The revised manuscript now includes detailed information about the collected dataset (e.g. study area and reproductive state): “We used GPS loggers and accelerometers to record high resolution movement data during two consecutive breeding seasons (May to August in 2019 and 2020) from 163 wild barn owls (79 males and 84 females) breeding in nest boxes across a 1,000 km² intensive agricultural landscape in the western Swiss plateau.” (Results section, lines 79 – 82)

      Details about the number of foraging trips per individual and per night are now presented in the results: “Sexual dimorphism in body mass was marked among our sampled individuals. Males were lighter than females (84 females, average body mass: 322 ± 22.6 g; 79 males, average body mass 281 ± 16.5 g, Fig. S6) and provided almost three times more prey per night than females (males: 8 ± 5 prey per night; females: 3 ± 3 prey per night; Fig. S7). Males also displayed higher nightly hunting effort than females (Males: 46 ± 16 hunting attempts per night, n = 79; Females: 25 ± 11 hunting attempts per night, n = 84; Fig. 3A, Fig. S8). However, females were more likely to use a sit-and-wait strategy than males (females: 24% ± 15%, males: 13% ± 10%, Fig. S9). As a result, the number of perching events per night was similar between males and females (Females: 76 ± 23 perching events per night; Males: 69 ± 20 perching events per night; Fig. S8).” (lines 165 – 174) 

      Comment 7. In addition, state if the information describes breeding pairs of males and females and provides statistics on the number of tracked pairs and the number of nest boxes.

      The revised manuscript now includes a description of the number of tracked breeding pairs and the number of nest boxes. “Of these individuals, 142 belonged to pairs for which data were recovered from both partners (71 pairs in total, 40 in 2019, 31 in 2020). The remaining 21 individuals belonged to pairs with data from one partner (11 females and 1 male in 2019; 4 females and 5 males in 2020).” (lines 82 – 85.)

      Comment 8. Line 93: Briefly define the term "landing force" and explain how it was measured (and let the reader know that there is a detailed description in the Methods).

      We now include a brief definition of the “landing force”, along with a brief explanation of how it was measured, in the results section. “We extracted the peak vectorial sum of the raw acceleration during each landing and converted this to ground reaction force (hereafter “landing force”, in Newtons) using measurements of individual body mass (see methods for detailed description).” (lines 92 – 95).
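
      The conversion described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' processing pipeline: it assumes that the tri-axial acceleration is expressed in units of g, that body mass is given in grams, and that force is obtained as mass times peak acceleration. The function names and the example samples are hypothetical.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2, used to convert acceleration from g to SI units


def peak_vector_sum(samples_g):
    """Largest vectorial sum sqrt(x^2 + y^2 + z^2) over one landing (in g)."""
    return max(math.sqrt(x * x + y * y + z * z) for x, y, z in samples_g)


def landing_force_newtons(samples_g, body_mass_g):
    """Peak force in newtons: mass (kg) x peak acceleration (m/s^2).

    Assumes acceleration in g and mass in grams (both are assumptions).
    """
    mass_kg = body_mass_g / 1000.0
    return mass_kg * peak_vector_sum(samples_g) * STANDARD_GRAVITY


# Hypothetical landing: three tri-axial samples (x, y, z) in g.
landing = [(0.0, 0.0, 1.0), (3.0, 0.0, 4.0), (0.0, 0.0, 1.0)]
force = landing_force_newtons(landing, body_mass_g=300.0)
print(f"{force:.2f} N")
```

      Dividing the result by body mass would give the mass-specific force used when comparing individuals of different size.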

      Comment 9. Line 94: All definitions, including "pre-hunting force," need to be better described in the Results section.

      Thank you for this suggestion. We now provide a better description of these key definitions directly in the results section: 

      Measurement of landing force: “Barn owls employing a sit-and-wait strategy land on multiple perches before initiating an attack, with successive landings reducing the distance to the target prey (Fig. 2C). 

      We used the acceleration data to identify 84,855 landings. These were further categorized into perching events (n = 56,874) and hunting strikes (n = 27,981), depending on whether barn owls were landing on a perch or attempting to strike prey on the ground (Fig. 1A and B, see methods for specific details on behavioral classification).” (lines 88 – 95)

      Pre-hunt perching force predicts hunting success: “Finally, we analyzed whether the landing force in the last perching event before each hunting attempt (i.e. pre-hunt perching force) predicted variation in hunting success” (lines 229 – 230)

      Comment 10. Line 102: Remove "Our analysis of 27,981 hunting strikes showed that" and add "n = 27,981" after the statistics. You have already stated your sample size earlier. There is no need to emphasize it again, although your sample size is impressive.

      We modified the text in the results section as suggested.

      Comment 11. Line 104: The results so far suggest that the difference in landing force between males and females is an outcome of their different body masses. However, it is not clear what is the reason for the difference in the number of hunting strike attempts between males and females (Lines 104-106). Can you compare the difference in landing force between males and females with similar body mass (females from the lower part of the distribution and males from the upper part)? Is there still a difference?

      Thank you. Following your comment, we performed new analyses that clarify the differences between sexes in the landing force involved in perching and hunting strike events. First, however, we would like to explain why the number of hunting attempts differs between males and females. During the breeding season, females typically perform most of the incubation, brooding, and feeding of nestlings in the nest, while the male primarily hunts food for the female and chicks. The female contributes to food provisioning only irregularly, and the extent of her contribution varies from pair to pair (paper in prep.). The difference in the number of hunting attempts between males and females reflects this asymmetry in food provisioning between the sexes during this specific period. We specified this in the revised version of the manuscript (lines 164 – 174). 

      We also provide a new analysis to investigate sex differences in mass-specific landing force (force/body mass). We found that males and females produce similar force per unit of body mass during perching events. This demonstrates that the overall higher perching force in females (see Fig. 4C in the manuscript) is driven by their higher body mass. (lines 194 – 199)

      Comment 12. Line 154: I believe Boonman et al. (2018) is relevant to this part of the discussion. Boonman, Arjan, et al. found that barn owl noise during landing and taking off is worth considering. ["The sounds of silence: barn owl noise in landing and taking off." Behavioral Processes 157 (2018): 484-488.]

      We now cite this paper in the discussion.

      Comment 13. Line 164: Your results do not directly demonstrate a link to fitness, although they potentially serve as a proxy for fitness (add a reference). However, you might have information regarding nestlings' survival - that will provide a direct link for fitness. Change your statement or add the relevant data.

      We appreciate your feedback and have adjusted the language accordingly.

      Comment 14. Line 213: If the poles are closer to the ground - is it possible that the higher trees and buildings serve for resting and gathering environmental information over greater distances? For example, identifying prey at farther distances or navigating to the next pole?

      Yes, this is indeed the most likely explanation for the fact that owls land more on buildings and trees than on poles until the last period (about 6 minutes) before hunting. In these last minutes, barn owls preferentially use poles, as we showed in figure 2B. The revised manuscript now includes this explanation in the discussion (lines 269 – 284).

      Comment 15. Line 250: The product "AXY-Trek loggers" does not appear on the Technosmart website (there are similar names, but not an exact match). Are you sure this is the correct name of the tracking device you used? 

      Thank you for pointing out this detail that we missed. The device we used is now called "AXY-Trek Mini" (https://www.technosmart.eu/axy-trek-mini/). We have corrected this error directly in the revised manuscript.

      Comment 16. Line 256: Please explain how the devices were recovered. Did you recapture the animals? If so, how? Additionally, replace "after approximately 15 days" with the exact average and standard deviation. Furthermore, since you have these data, please state the difference in body mass between the two measurements before and after tagging.

      The birds were recaptured to recover the devices. Adult barn owls were recaptured at their nest sites, again using automatic sliding traps that are activated when birds enter the nest box. The statement "after approximately 15 days" was replaced by the exact mean and standard deviation, which were 10.47 ± 2.27 days. Those numbers exclude five of the 163 individuals included in this study. These five could not be recaptured in the appropriate time window but were re-encountered when they initiated a second clutch later in the season (4 individuals) or a new clutch the year after (1 individual).

      We integrated this previously missing information in the revised manuscript (lines 370 – 372).

      Comment 17. Line 259: What was the resolution of the camera? What were the recording methods and schedule? How did you analyze these data? 

      The resolution was set to 3.1 megapixels. Motion-sensitive camera traps were installed at the entrance to each nest box throughout the period when the barn owls were wearing data loggers, and each movement detected triggered a burst of three photos. The photos were not analyzed as such for this study, but were used to confirm each prey delivery that had previously been detected from the accelerometer data. We added these details in the revised manuscript (lines 377 – 380).

      Comment 18_1. Figure 1: 

      Panel A) Include the sex of the described individual. 

      The sex of the described individual is now included in the figure caption.

      Comment 18_2. It would be interesting to show these data for both males and females from the same nest box (choose another example if you don't have the data for this specific nest box). 

      Although we agree that showing tracks of males and females from the same nest is very interesting, the purpose of this figure was to illustrate our data annotation process and we believe that adding too many details on this figure will make it appear messy. However, the revised manuscript now includes a new figure (Fig. 3A) which shows simultaneous GPS tracks of a male and a female during a complete night, with detailed information about perching and hunting behaviors.

      Comment 18_3. Add the symbol of the nest box to the legend. 

      Done

      Comment 18_4. Provide information about the total time of the foraging trip in the text below. 

      The duration of the illustrated foraging trip has been included in the figure caption.

      Comment 18_5. To enhance the figure’s information on foraging behavior, consider color coding the trajectory based on time and adding a background representing the landscape. Since this paper may be of interest to researchers unfamiliar with barn owl foraging behavior, it could answer some common questions. 

      For reasons similar to those explained in our answer above (Comment 18_2), we would rather keep this figure as clean as possible. However, we followed your recommendations and included these details in the new Figure 3 described above. In this new figure, GPS tracks are color coded according to the foraging trip number and overlaid on a background representing the landscape. To provide even more detail about the landscape, we added another figure in the supplementary materials (Fig. S2), which illustrates barn owl foraging grounds and nest sites and which we think might be of interest to people unfamiliar with barn owls.

      Comment 18_6. Inset panels) provide a detailed description of the acceleration insert panels. 

      Done

      Comment 18_7. Color code the acceleration data with different colors for each axis, add x and y axes with labels, and ensure the time frame on the x-axis is clear. How was the self-feeding behavior verified (should be described in the methods section)? 

      We kept both inset panels as simple as possible since they serve here as examples, but a complete representation of these behaviors (with time frame, different colors and labels) is provided in the supplementary materials (figure S3). We included this statement in the figure caption and added a reference to the full representations from the supplementary materials: 

      In the Figure caption: “Inset panels show an example of the pattern of the tri-axial acceleration corresponding to both nest-box return and self-feeding behaviors (but see Fig. S3 for a detailed representation of the acceleration pattern corresponding to each behavior).” 

      In the Method section: “Self-feeding was evident from multiple and regular acceleration peaks in the surge and heave axes (resulting in peaks in VeDBA values > 0.2 g and < 0.9 g, Fig. S3D), with each peak corresponding to the movement of the head as the prey was swallowed whole.”
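
      A Boolean rule of this kind can be sketched as follows. The 0.2 g and 0.9 g bounds come from the quoted Methods text; everything else is an assumption made for illustration, including the running-mean estimate of static acceleration, the 100-sample smoothing window, the minimum of three peaks, and all function names. The sketch also ignores the regularity of the peaks and the axis-specific criteria mentioned in the text.

```python
import math


def running_mean(values, window):
    """Centered running mean, shortened at the edges of the series."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def vedba(surge, sway, heave, window=100):
    """Vectorial dynamic body acceleration (g).

    Dynamic acceleration is taken as raw minus a running-mean estimate of
    the static component; the window length is an assumption.
    """
    axes = (surge, sway, heave)
    static = [running_mean(axis, window) for axis in axes]
    return [
        math.sqrt(sum((axis[i] - base[i]) ** 2 for axis, base in zip(axes, static)))
        for i in range(len(surge))
    ]


def count_peaks_in_band(signal, low=0.2, high=0.9):
    """Count local maxima whose value lies strictly between low and high."""
    count = 0
    for i in range(1, len(signal) - 1):
        is_peak = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if is_peak and low < signal[i] < high:
            count += 1
    return count


def looks_like_self_feeding(vedba_values, min_peaks=3):
    """Boolean rule: enough in-band VeDBA peaks (min_peaks is an assumption)."""
    return count_peaks_in_band(vedba_values) >= min_peaks


# Hypothetical VeDBA traces (g): repeated moderate peaks vs. one large spike.
swallowing = [0.05, 0.5, 0.05, 0.6, 0.05, 0.4, 0.05]
single_spike = [0.05, 1.2, 0.05, 0.1, 0.05]
print(looks_like_self_feeding(swallowing), looks_like_self_feeding(single_spike))
```

      Because the rule depends on whether peaks fall inside a band rather than on exact acceleration values, small shifts in absolute values (for instance from slight changes in device position) are less likely to change the classification.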

      Comment 18_8. Panel B) Note in the caption that you refer to the acceleration z-axis.

      We believe that keeping the statement “the heave acceleration…” in the figure caption is more informative than referring to the “z-axis” as it describes the real dimension to which we are referring. The use of the x, y and z axes can be misleading as they can be interchanged depending on the type and setting of recorders used.

      Comment 18_9. Present the same time scale for both hunting strategies to facilitate comparison. You can achieve this by showing only part of the flight phase before perching. 

      Done

      Comment 18_10. Panel C) Presenting the data for both hunting strategy and sex would provide more comprehensive information about the results and would be relatively easy to implement. 

      We agree with your comment. We present the differences in landing force for both landing contexts and sexes in the new Figure 3 as well as in the supplementary materials (Figure S10) of this revised manuscript.

      Comment 19. Figure 2: Please provide an explanation of the meaning of the circles in the figure caption.  

      Done

      Comment 20. Figure 3: 

      Panel A) It is unclear how the owl illustration is relevant to this specific figure, unlike the previous figures where it is clear. Also, suggest removing the upper black line from the edge of the figure or add a line on the right side. 

      Done (now in Figure 2).

      Panel B) "Density" should be capitalized. 

      Done

      Panel C) Add a scale in meters, and it would be helpful to include an indication of time before hunting for each data point. 

      Done

      Comment 21. Figure S1: Mark the locations of the nest boxes and ensure that trajectories of different individuals and sexes can be identified. 

      The purpose of this figure was to show the spatial distribution of the data. We think that adding nest locations and coloring the paths according to individuals and/or sex will make the figure less clear. However, the new Figure 3 highlights those details.

      Comment 22. Figure S2: Show the pitch angle similarly to how you showed the acceleration axes, and explain what "VeDBA" stands for. Provide a description of the perching behavior, clearly indicating it on the figure. Add axes (x, y, z) to the illustration of the acceleration explanation. 

      We edited this figure (now figure S3) to show the pitch angle and provide an explanation of what “VeDBA” stands for in the figure caption. The figure caption now also provides a better description of the perching behavior. For the axes (i.e. X, Y, Z), we prefer to refer to the heave, surge, and sway as this is more informative and refers to what is usually reported in studies working with tri-axial accelerometers.

      Comment 23. Table S1: Improve the explanation in the caption and titles of the table. 

      Done

      Reviewer #2 (Recommendations For The Authors): 

      Comment 1. From the public review and my assessment there, the authors can be assured that I thoroughly enjoyed the read and am looking forward to seeing a revised and improved version of this paper. 

      We thank the reviewer for this comment. We revised the manuscript according to their comments.

      Comment 2. In addition to my major points stated above, I would like to add the following recommendations: 

      The manuscript is overall well written, but it uses a very pictorial language (a little as if we were in a David Attenborough documentary) that I find inappropriate for a research paper (especially in the abstract and introduction, "remarkable" (2x), "sophisticated" (are there any unsophisticated adaptations? We are referring to something under selection after all) etc.

      We appreciate that you found the paper well written overall, and we understand the comment about pictorial language. We therefore slightly changed the text so that the adjectives used to describe adaptive strategies are not overemphasized.

      Comment 3. Abstract 

      "While the theoretical benefits of predator camouflage are well established, no study has yet been able to quantify its consequences for hunting success." - This claim is actually not fully true: 

      Nebel Carina, Sumasgutner Petra, Pajot Adrien and Amar Arjun 2019: Response time of an avian prey to a simulated hawk attack is slower in darker conditions, but is independent of hawk colour morph. Soc. open sci.6:190677 

      We edited our claim to specify that the consequences of predator camouflage for hunting success have never been quantified in natural conditions, and we cited the reference in the introduction.

      Comment 4. Line 23. Rephrase to: "We used high-resolution movement data to quantify how barn owls (Tyto alba) conceal their approach when using a sit-and-wait strategy, as well as the power exerted during strikes." 

      We edited this sentence in the abstract, as suggested.

      Comment 5. Results 

      There is a disconnect between the objectives outlined at the end of the introduction and the following results that should be improved. 

      The authors state: "Using high-frequency GPS and accelerometer data from wild barn owls (Tyto alba), we quantify the landing dynamics of this sit-and-wait strategy to (i) examine how birds adjust their landing force with the behavioral and environmental context and (ii) test the extent to which the magnitude of the predator cue affects hunting success." But one of the first results presented are sex differences. 

      This is a fair point. We have now changed our statement at the end of the introduction, as well as the order of the results, to improve the link between the objectives outlined in the introduction and the way the results are presented.

      Comment 6. At this stage, the reader does not even know yet that we are presented with a size-dimorphic species that also has very different parental roles during the breeding season. This should be better streamlined, with an extra paragraph in the introduction. And these sex differences are then not even discussed, so why bring them up in the first place (and not just state "sex has been fitted as additional co-variate to account for the size-dimorphism in the species" without further details). 

      We edited the way the objectives are outlined in the introduction to cover the size dimorphism (lines 70 – 76). We also completely changed the way the sex differences are presented in the results, including a new analysis that we believe provides a more comprehensive understanding of barn owl foraging behavior (lines 164 – 206). Finally, we added a new paragraph in the discussion to consider those results (lines 319 – 339).

      Comment 7. It is not clear to me where and how high-resolution GPS data were used? The results seem to concentrate on ACC – why GPS was used and how it features should be foreshadowed in a few lines in the introduction. I definitively prefer having the methods at the end of a manuscript, but with this structure, it is crucial to give the reader some help to understand the storyline. 

      GPS data were used to validate some behavioral classifications (prey provisioning for example), but most importantly they were used to link each landing event with perch types. We edited the text in the result section to clarify where GPS and/or ACC data were used.

      Comment 8. Discussion 

      Move the orca example further down, where more detail can be provided to understand the evidence. 

      After our extensive edits in the discussion, we felt this example was interrupting the flow. We now cite this study in the introduction. 

      Comment 9. Size dimorphism and evident sex differences are not discussed. 

      The revised manuscript now includes a new paragraph in the discussion in which sex differences are discussed (lines 319 – 339).

      Comment 10. Be more precise in the terminology used (for example, land use seems to be interchangeable with habitat characteristics?). 

      We modified “land use” with “habitat data” in the revised manuscript.

      Comment 11. Methods 

      Please provide a justification for the very high weight limit (5%; line 256). This limit is outdated and does not fulfill the international standard of 3% body weight. I assume the ethics clearance went through because of the short nature of the study (i.e., the birds were not burdened for life with the excess weight? But a line is needed here or under the ethics considerations to clarify this). 

      The 5% weight limit was considered acceptable due to the short deployment period, and we have now edited the ethics statement to emphasize this point. However, it is important to note that there is no real international standard, with both 3% and 5% weight limits being commonly used. Both limits are arbitrary, and the impact of a fixed mass on a bird varies with species and flight style. All owls survived and bred similarly to the non-tagged individuals in the population (lines 373 – 376 & lines 558 – 561).

      EDITORIAL COMMENT: We strongly encourage you to provide further context and clarification on this issue, as suggested by the Reviewer. On a related point, the ethics statement refers to GPS loggers, rather than GPS and ACC devices; we encourage you to clarify wording here.

      Thank you for highlighting this point, which indeed needed clarification.

      Although we used the terminology "GPS recorders", the authorization granted by the Swiss authorities for this study covers the entire tracking system, which combines both GPS and ACC recorders in the same device. We have therefore changed the wording used in the ethics statement to avoid any misunderstanding (lines 373 – 376 & lines 558 – 561).

      Comment 12. Please provide more information on the model selection approach, what does "Non-significant terms were dropped via model simplification by comparing model AIC with and without terms." mean? Did the authors use a stepwise backward elimination procedure (drop1 function)? Or did they apply a complete comparison of several candidate models? I think a model comparison approach rather than stepwise selection would be more informative, as several rather than only one model could be equally probable. This might also improve model weights or might require a model averaging procedure - current reported R2 values are very small and do not seem to support the results well. 

      We apologize for the lack of detail about this important aspect of the statistical analysis. We applied an automated model selection using the dredge function from the R package “MuMIn”, which performs a complete comparison of the candidate models. The final models were chosen as the best models since the number of candidate models within ∆AIC < 2 was relatively low in each analysis, and thus model averaging was not appropriate here. We edited the methods section to ensure clarity, and added model selection tables for each analysis, ranked according to AICc scores, in the supplementary materials (lines 532 – 552).
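
      To make the ranking step concrete, the sketch below computes AICc for a set of candidate models, ranks them, and lists those within ∆AICc < 2 of the best model. It is only an illustration in Python, not the MuMIn workflow itself: the model names, log-likelihoods, parameter counts, and sample size are hypothetical placeholders, not values from the study.

```python
def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k parameters and n observations."""
    aic = -2.0 * log_lik + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)

def rank_models(candidates, n):
    """Return (name, AICc, delta) tuples sorted from best to worst.

    candidates maps a model name to (log-likelihood, number of parameters).
    """
    scored = [(name, aicc(ll, k, n)) for name, (ll, k) in candidates.items()]
    scored.sort(key=lambda item: item[1])
    best = scored[0][1]
    return [(name, score, score - best) for name, score in scored]

# Hypothetical candidate set, for illustration only
candidates = {
    "force + perch_type": (-512.3, 5),
    "force": (-514.9, 4),
    "null": (-520.0, 3),
}
ranked = rank_models(candidates, n=400)

# Models that are effectively tied with the best one under the delta < 2 rule
close = [name for name, _, delta in ranked if delta < 2]
```

      When `close` contains a single model, as in this made-up example, selecting it outright is straightforward; several near-tied models would be the situation in which model averaging is usually considered.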

      In addition, we agree that the reported R-squared values in our analyses are quite low, specifically regarding the influence of pre-hunt perching force on hunting success (cond R2 = 0.04). Nonetheless, landing impact still has a notable effect size (an increase of 1 N reduces hunting success by 15%). The reported values are indicative of the inherent complexity in studying hunting behavior in a wild setting where numerous variables come into play. We specifically investigated the hypothesis that the force involved during pre-hunt landings, and consequently the emitted noise, influences the success of the next hunting attempt in wild barn owls. Factors such as prey behavior and micro-habitat characteristics surrounding prey (such as substrate type and vegetation height) are most likely to be influential but hard, or nearly impossible, to model. We now cover this in a more nuanced way in the discussion (lines 266 – 268).

      Comment 13. Please explain why BirdID was nested in NightID - this is not clear to me.

      There is probably a misunderstanding here: we wrote that we nested NightID in BirdID (and not BirdID in NightID).

      Comment 14. I hope the final graphs and legends will be larger, they are almost impossible to read. 

      We enlarged the graphs and legends as much as possible to improve readability. In the published version, however, the graphs appear clear and readable.

      Comment 15. Figure S1: Does "representation" mean the tracks don't show all of the 163 owls? If so, be precise and tell us how many are illustrated in the figure. 

      Figure S1 represents the tracks for each of the 163 barn owls used in the study. We changed the terminology used in the figure caption to avoid any misunderstanding.

      Comment 16. Figure S4: Please adjust the y-axis to a readable format. 

      Done

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer #1 comments:

      (1) SY1 aggregation enhances (in terms of number of aggregates) when Sphingolipid biosynthesis is blocked.

      a. Line no 132-133: I agree that there is circumstantial evidence that the maturation pathway of SY1 IB is perturbed by knocking down sphingolipid biosynthesis. However, to prove this formally, a time course of IB maturation needs to be reported in the knock-down strains.

      Please see Figure 2-figure supplement 1 for the time course of SY1 IB maturation in the knock-down strains. We have added the result to the manuscript; please see lines 129-131 on page 5 in the revised version.

      b. It will be good to have formal evidence that sphingolipids are indeed downregulated when these genes are downregulated (knocked down).

      This issue has been clearly evidenced in previous reports, and we have added the appropriate references in the main text. For example, Huang et al. (https://doi.org/10.1371/journal.pgen.1002493) showed that down-regulation of LCB1 or SPT in yeast decreased sphingolipid levels. According to the report from Tafesse FG, et al. (https://doi.org/10.1371/journal.ppat.1005188), in mammalian cells in which Sptlc2 was knocked down by CRISPR/Cas9, sphingolipid and glucosylceramide production is almost completely blocked. In addition, the levels of sphingosine, sphingomyelin, and ceramide were significantly lower compared to control cells. Please see lines 143-144 on page 6 and lines 232-233 on page 9 in the revised version.

      (2) In a normal cell (where sphingolipid biosynthesis is not hampered), the aggregate of SY1 (primarily the Class I aggregate) is localized only on the mitochondrial endomembrane system. These results have been published for other aggregation-prone proteins and are partly explained in the literature. However, their role in the context of maturation is relatively unclear. The authors however provide no strong evidence to show if mitochondria are preferentially involved in any of the stages of IB maturation. Specifically:

      a. Line 166-167: It is not clear from Figure 4B that this is indeed the case. Only the large IB seems to colocalize in all three panels (Class I, 2, 3) with Mitotracker. The smaller IBs in 2 and 3 do not show any obvious co-localization. It is also possible that they do co-localize, but it is not clear from the images. I would appreciate it if the authors either provide stronger evidence (better image) or revise this statement. This point is crucial in some claims made later in the manuscript. (pls see comment #5A).

      Based on the reviewer's suggestion, we replaced the images in Figure 4B. In addition, we added the 3D reconstruction results of the interrelationship between Class 3 and Mitotracker in Figure 4-figure supplement 1B, to further show their relationship.

      (3) The localization is due to the association of SY1 (aggregates) with mitochondrial proteins like Tom70, Tim44 etc. There are some critical points (that can strengthen the manuscript) that are not addressed here. Primarily, the important role of mitochondria in the context of toxicity is neglected. Although the authors have mentioned in the discussion that it was not their main focus, I believe that this is the novel part of the manuscript and this part is potentially a beautiful addition to literature. The questions I found unanswered are:

      a. Is the localization completely lost upon deleting these genes? I see only a partial loss in shape/localization. This is not properly explained in the manuscript. The shape of the IB seems to remain intact while the localization is slightly altered. This indicates that even when sphingolipid is present, SY1 localization is dictated by the (lipid-raft embedded) proteins. Interestingly, it shows that even in the absence of mitochondrial localization the shape of the aggregates is not altered in these deletion strains! How do the authors explain this if mitochondrial surface sphingolipids are important for IB maturation? (the primary screen found that sphingolipid biosynthesis promotes the formation of Class I IBs).

      We agree that mutation of a single mitochondrial binding protein causes only a partial loss of shape/localization, and we have replaced “association” with “surrounding” in the manuscript. Please see lines 163-166 on page 6 in the revised version. In deletion mutants of the proteins that interact with SY1, we counted the proportion of Class 3 aggregates formed by SY1 and found that it increased compared to controls. This indicates that a partial loss of the interaction of SY1 with mitochondria affects the shape of the aggregates, as detailed in line 184 on page 7 and Figure 4-figure supplement 1D. We think that SY1 interactions with mitochondrial proteins are important for the localization of SY1 IB in mitochondria, whereas sphingolipids play an important role in facilitating the formation of Class 1 IBs from Class 3 aggregates.

      b. What happens to the toxicity when the aggregates are not localized on mitochondria?

      We thank the reviewer for the comment. Because a single mutant only partially affects the phenotype, investigating this issue may require constructing combinations of mutants of different genes to observe the effect, which we will address in future studies. What we want to show in this work is that SY1 binds to mitochondria by interacting with these mitochondrial proteins.

      c. It is important to note that sphingolipids may affect the whole process indirectly by altering pathways involved in protein quality control or UPR. UPR may regulate the maturation of IBs. It is therefore important to test if any of the effects seen could be of direct consequence.

      We agree with the reviewer's comments, but there was no significant enrichment for protein quality control or UPR-related pathways in our genome-wide screen, so it is unlikely that sphingolipids indirectly cause maturation of IBs by affecting these two pathways. We addressed this issue in our discussion. Please see lines 325-328 on page 12 in the revised version.

      d. In Figure 4D, the authors find SY1 when they pull down Tom70, Tom37 or Tim44. Tim44 is a protein found in the mitochondrial matrix, how do the authors explain that this protein is interacting with a protein outside the mitochondrial outer membrane?

      This interaction could be due to some of the soluble SY1 entering the mitochondrial matrix and interacting with Tim44.

      e. Is it possible that the authors are immunoprecipitating SY1 since IBs have some amount of unimported mitochondrial proteins in aggregates formed during proteotoxic stress (https://doi.org/10.1073/pnas.2300475120) (Liu et al. 2023).

      Our Co-IP experiments were performed on the soluble supernatant, so mitochondrial proteins in aggregates were not detected.

      f. Line 261 (Discussion): Does deletion of Tom70 or one of the anchors increase Class III aggregation and increase toxicity? Without this, it is hard to say if mitochondria are involved in detoxification.

      We thank the reviewer for the comment; please see our response to comment 3b.

      (4) This fuels the loss of mitochondrial function.

      a. Line 218-219: Although the change is significant, the percentage change is very slight. Is this difference enough to be of physiological relevance in mitochondrial function? In our hands, the DCF fluorescence is much more variable.

      We agree with the reviewer that the difference is small (but significant). The extent to which such a difference is physiologically relevant to mitochondrial function needs to be investigated further.

      b. Is SY1-induced loss of mitochondrial function less in knockouts of Tom70 or the other ones found to be important for localizing the SY1 aggregate to mitochondria?

      We examined mitochondrial membrane potential (indicated by Rho 123 fluorescence intensity) in tom70Δ, tom37Δ and control his3Δ strains and found that knocking out Tom70 or Tom37 reduced the mitochondrial toxicity caused by SY1 expression. Please see lines 212-214 on page 8 in the revised version, and Figure 5-figure supplement 2.

      (5) Mitochondrial function is further abrogated when there is a block in sphingolipid biosynthesis.

      a. Myriocin acted like the deletion strains that showed less structured aggregates. There were more aggregates (Class 3) but visually they seemed to be spread apart. The first comment (#2A) on aggregate classes and their interaction with mitochondria may become relevant here.

      According to a recent review article (https://doi.org/10.3389/fcell.2023.1302472), sphingolipids are present in the mitochondrial membrane, bind to many mitochondrial proteins and have emerged as key regulators of mitochondrial morphology, distribution and function. Dysregulation of sphingolipid metabolism in mitochondria disrupts many mitochondrial processes, leading to mitochondrial fragmentation, impaired bioenergetics and impaired cellular function. Myriocin treatment, which affects sphingolipid metabolism, causes mitochondria to become more fragmented, which may explain why the aggregates appear visually spread apart. Regarding the interaction with mitochondria, we counted the proportion of SY1 aggregates surrounded by mitochondria after treatment with myriocin, and the results were not significantly different compared to the control. Please see lines 168-169 on page 6 in the revised version, and Figure 4-figure supplement 1C.

      (6) A similar phenomenon is conserved in mammalian cell lines.

      a. Line 225-226: Did the authors confirm that this was the only alteration in the genome? Or did they complement the phenotype, genetically?

      We performed SPTLC2 gene complementation experiments in knockout cell lines and found that SPTLC2 gene complementation was able to reduce the number of cells forming IBs and the percentage of dispersed irregular IBs compared to controls. Please see lines 240-242 on page 9 in the revised version, and Figure 6-figure supplement 2B.

      b. Line 241-245: One of the significant phenotypes observed by downregulating sphingolipid biosynthesis in yeast and mammalian cells was the increase in the number of aggregates. This is not shown in myriocin treatment in mammalian cells. This needs to be shown to maintain concordance with the original screen and the data presented with the KO mammalian cell line.

      Please see Figure 7-figure supplement 1A for the data on the proportion of cells forming SY1 IBs after myriocin treatment in mammalian cells. The effect of myriocin treatment in mammalian cells was the same as that observed in the KO mammalian cell line.

      Minor Comments:

      Line 273-275: How is this statement connected to the previous statement? Was it observed that aggregate fusion was advantageous to the cells?

      Yes, aggregate/oligomer fusion is advantageous to the cells, and we have modified the previous statement. Please see line 280 on page 10 in the revised version.

      Line 293-294: I am not sure I understand this statement.

      We have modified this statement. Please see lines 302-303 on page 11 in the revised version.

      Line 295-296: But the authors have commented at multiple places that mitochondria detoxify the cell from SY1 aggregates. I find this link fascinating and worth investigating. Most of the current work has some known links in literature (not everything). The mitochondrial connection being the most fascinating one.

      We have removed this sentence. We have added a validation experiment for the role of mitochondrial activity in SY1 IB maturation in the revised version.

      Line 318: Do the authors mean: The open question is...

      We thank the reviewer; we have corrected it.

      Response to Reviewer #2 comments:

      I recommend considering live cell microscopy to analyze whether sphingolipid-dependent formation of SY1 IB takes place at the mitochondrial outer membrane. The IBs could also be produced at other membranes and then transported to the mitochondrial outer membrane for storage.

      As shown in Figure 4A, SY1 IB primarily interacts with mitochondria.

      I recommend analyzing whether mitochondrial activity is needed for sphingolipid-dependent SY1 IB formation. Are these IBs localized to mitochondrial membrane solely as scaffold or are these organelles needed to provide the energy for driving IB formation in concert with sphingolipids? This point could be addressed with rho0 strains lacking mitochondrial DNA.

      We thank the reviewer for this recommendation. We expressed SY1 protein in BY4741 rho0 strain as suggested and found that the maturation and mitochondrial surrounding state of SY1 IB was not affected by mitochondrial activity. Please see lines 185-187 on page 7 in the revised version, and Figure 4-figure supplement 1E and 1F.

      The authors should be more precise in the statistical methods used in their study (method, pre-/post-tests, number of replicates...).

      We thank the reviewer for the comment and we have provided a more precise description of the statistical methods. Please see lines 531-534 on page 19 and figure legends in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an interesting study that utilizes a novel epigenome profiling technology (single molecule imaging) in order to demonstrate its utility as a readout of therapeutic response in multiple DIPG cell lines. Two different drugs were evaluated, singly and in combination. Sulfopin, an inhibitor of a component upstream of the MYC pathway, and Vorinostat, an HDAC inhibitor. Both drugs sensitized DIPG cells, but high (>10 micromolar) concentrations were needed to achieve half-maximal effects. The combination seemed to have some efficacy in vivo, but also produced debilitating side-effects that precluded the measurement of any survival benefit.

      We thank the reviewer for deeply evaluating our work and acknowledging the use of multiple experimental strategies to explore the effect of combination therapy on DMG cells. Of note, all mice in our experiment experienced deterioration (including the control mice and those treated with single agents). Thus, it is not the combination of drugs that led to the debilitating side-effects; the mice deteriorated due to the extremely aggressive tumor cells, forming relatively large tumors prior to the treatment onset, calling for further optimization of the therapeutic regime.

      We modified the text in the results section to clarify this point (lines 238-241): “This rapid deterioration is likely a result of the aggressiveness of the transplanted tumors and does not represent side effects of the treatment, as mice from all groups, including the non-treated mice, showed similar signs of deterioration”.  

      We also elaborate on this in the discussion (lines 272-276): “Notably, despite a significant reduction in tumor size in-vivo, the combined treatment did not increase mice survival. This is perhaps due to the relatively large tumors already formed at the onset of treatment, leading to rapid deterioration of mice in all experimental groups. Thus, further optimization of the modeling system and therapeutic regime is needed.” We truly hope that further studies will allow better assessment of this drug combination in various models.

      Strengths:

      Interesting use of a novel epigenome profiling technology (single molecule imaging).

      Weaknesses:

      The use of this novel imaging technology ultimately makes up only a minor part of the study. The rest of the results, i.e. DIPG sensitivity to HDAC and MYC pathway inhibition, have already been demonstrated by others (Grasso Monje 2015; Pajovic Hawkins 2020, among others). The drugs have some interesting opposing effects at the level of the epigenome, demonstrated through CUT&RUN, but this is not unexpected in any way. The drugs evaluated here also didn't have higher efficacy, or efficacy at especially low concentrations, than inhibitors used in previous reports. The combination therapy attempted here also caused severe side effects in mice (dehydration/deterioration), such that an effect on survival could not be determined. I'm not sure this study advances knowledge of targeted therapy approaches in DIPGs, or if it iterates on previous findings to deliver new, or more efficient, mechanistic or therapeutic/pharmacologic insights. It is a translational report evaluating two drugs singly and in combination, finding that although they sensitise cells in vitro, efficacy in vivo is limited at best, as this particular combination cannot progress to human translation.

      We thank the reviewer for pointing out the strengths and weaknesses of our work. As far as we know, while many studies demonstrated upregulation of the MYC pathway in DIPG, this is the first study that shows inhibition of this pathway (via PIN1) as a therapeutic strategy. While it is clear from the literature that MYC inhibition may pose therapeutic benefit, the development of potent MYC inhibitors is highly challenging due to its structure and cellular localization. Of note, in the 2020 paper, Pajovic and colleagues inhibited MYC by transfecting the cells with a plasmid expressing a specific inhibitory MYC peptide (Omomyc); while this strategy works well for cell cultures, the clinical translation requires different delivery strategies. Sulfopin is a small molecule inhibitor that can be used in-vivo and potentially in clinical studies. Thus, we believe that our study offers a novel strategy, as well as mechanistic insights, regarding the potential use of Sulfopin and Vorinostat to treat DIPG.

      As noted above, the deterioration of the mice was caused by the aggressiveness of the tumors rather than by side effects of the combination therapy. We did not notice specific toxicity in the mice treated with Sulfopin alone or with the combined treatment. Furthermore, Dubiella et al. extensively examined toxicity issues and did not observe adverse effects or weight loss when administering Sulfopin at a dose of 40 mg kg⁻¹.

      Optimization of the model and treatment regime (# of cells injected, treatment starting point, etc.) may have allowed us to reveal survival benefits. Yet, these are highly complicated and expensive experiments; unfortunately, we did not have the resources to perform them within the scope of this revision. Importantly, within the current manuscript, we show the effect of this drug combination in reducing the growth of DMG cells in-vitro and in-vivo, laying the framework for follow-up exploration in future studies. Furthermore, the epigenetic and transcriptomic profiling shed light on the molecular mechanisms that drive these aggressive tumors.

      Reviewer #2 (Public Review):

      Summary:

      The study by Algranati et al. introduces an exciting and promising therapeutic approach for the treatment of H3-K27M pediatric gliomas, a particularly aggressive brain cancer predominantly affecting children. By exploring the dual targeting of histone deacetylases (HDACs) and MYC activation, the research presents a novel strategy that significantly reduces cell viability and tumor growth in patient-derived glioma cells and xenograft mouse models. This approach, supported by transcriptomic and epigenomic profiling, unveils the potential of combining Sulfopin and Vorinostat to downregulate oncogenic pathways, including the mTOR signaling pathway. While the study offers valuable insights, it would benefit from additional clarification on several points, such as the rationale behind the dosing decisions for the compounds tested, the specific contributions of MYC amplification and H3K27me3 alterations to the observed therapeutic effects, and the details of the treatment protocols employed in both in-vitro and in-vivo experiments.

      We thank the reviewer for evaluating our work and recognizing its potential for the DMG research field. We address in detail below the important comments regarding the treatment protocols and dosing decisions.

      Clarification is needed on how doses were selected for the compounds in Figure S2A and throughout the study. Understanding the basis for these choices is crucial for interpreting the results and their potential clinical relevance. IC50s are calculated for specific patient derived lines, but it is not clear how these are used for selecting the dose.

      We thank the reviewer for these important comments. For the epigenetic drugs shown in Figure S2A, we followed published experimental setups; for EPZ6438, GSKJ4, Vorinostat and MM-102 we chose the treatment concentrations according to Mohammad et al. 2017, Grasso et al. 2015 and Furth et al. 2022. For Sulfopin, we conducted a dedicated dose curve analysis (shown in Figure 1E), which indicated only a mild effect on viability and relatively high IC50 values as a single agent. Since we aimed to test the ability of a combined treatment to additively reduce viability, we used a sub-IC50 concentration for Sulfopin in these experiments. We added this information in lines 123 and 131-132.

      Finally, following the results obtained in the experiment shown in Figure S2A, we conducted a full dose-curve analysis of the combined treatment in multiple DMG patient-derived cells (figure 2B and S2C), to identify a combination of concentrations that provides an additive effect (as indicated by the BLISS index in figure 2C and S2E). Of note, for downstream analysis of the molecular mechanisms underlying the treatment response (RNAseq and Cut&Run), we intentionally used concentrations that provide an additive BLISS index but do not completely abolish the culture, to allow for cellular analysis (i.e. 10 µM Sulfopin and 1 µM Vorinostat).
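
      The response does not spell out how the BLISS index was computed. As a minimal sketch, assuming the common Bliss independence formulation (expected combined effect = E_A + E_B - E_A * E_B, with effects expressed as fractional reductions in viability), the comparison could look like the following. The function names and all numbers are hypothetical illustrations, not values or code from the study.

```python
def bliss_expected(effect_a, effect_b):
    """Expected fractional effect (0..1) if two drugs act independently (Bliss)."""
    for effect in (effect_a, effect_b):
        if not 0.0 <= effect <= 1.0:
            raise ValueError("effects must be fractions between 0 and 1")
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a, effect_b, effect_combined):
    """Observed minus expected combined effect.

    Near 0: consistent with additivity; > 0: more than expected;
    < 0: less than expected.
    """
    return effect_combined - bliss_expected(effect_a, effect_b)


# Hypothetical fractional reductions in viability (NOT data from the study).
sulfopin_alone = 0.20
vorinostat_alone = 0.40
combined = 0.55

expected = bliss_expected(sulfopin_alone, vorinostat_alone)
excess = bliss_excess(sulfopin_alone, vorinostat_alone, combined)
print(f"expected={expected:.3f} excess={excess:.3f}")
```

      Other definitions (for example, a ratio of observed to expected effect) are also in use, so the sign convention above should be checked against the figure legends before comparing values.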

      The introduction mentions MYC amplification in high-grade gliomas. It would be beneficial if the authors could delineate whether the models used exhibit varying degrees of MYC amplification and how this factor, alongside differences in H3K27me3, contributes to the observed effects of the treatment.

      The reviewer highlights an important part of our study relating to the MYC-dependent sensitivity of the proposed treatment combination. Since high expression of MYC can be mediated by different molecular mechanisms and not only genomic amplification, we directly quantified mRNA levels of MYC by qPCR (shown in figure S2G) in order to explore its relationship with cellular response to Sulfopin and Vorinostat. Indeed, cultures that express high levels of MYC mRNA were more sensitive to Sulfopin treatment alone (figure S1P) and to the combined treatment (figure 2D-E). We also discuss these findings in lines 103-106 and 142-147 of the results section. Importantly, in cultures that express high levels of MYC (SU-DIPG13 as an example), we see downregulation of MYC targets upon the combined treatment, supporting the notion that this treatment affects viability by attenuation of MYC signaling.

      In Figure 2A, the authors outline an optimal treatment timing for their in vitro models, which appears to be used throughout the figure. It would be helpful to know how this treatment timing was selected and also why Sulfopin is dosed first (and twice) before the vorinostat. Was this optimized?

      As PIN1 regulates the G2/M transition, its inhibition by Sulfopin delays cell cycle progression (Yeh et al. 2007). Thus, in order to observe a strong viability difference in culture, a prolonged treatment period of 8-9 days is required (Dubiella et al., 2021). To maintain an active concentration of the drug during this long time period, we added a Sulfopin pulse (2nd dose) to achieve a stronger effect on cell viability. We and others noticed that, unlike Sulfopin, the effect of Vorinostat on viability is rapid and can be clearly seen after 2-3 days of treatment. Thus, we added this drug only after the 2nd dose of Sulfopin. We now describe the mode of action of Sulfopin in lines 79-81.

      It should be clarified whether the dosing timeline for the combination drug experiments in Figure 3 aligns with that of Figure 2. This information is also important for interpreting the epigenetic and transcriptional profiling and the timing should be discussed if they are administered sequentially (also shown in Figure 2A). I have the same question for the mouse experiments in Figure 4.

      The reviewer is correct that this information is critical for evaluating the results. In order to link the expression changes to the epigenetic changes, we kept the same experimental conditions in both the Cut&Run and RNA-seq experiments (shown in figures 2-3). We added this information to the text in line 184.

      For the in-vivo studies of HDAC inhibition (Figure 4), we followed published protocols (Ehteda et al. 2021). In these experiments both drugs were administered simultaneously every day. We added this information to the text in lines 231-232. Changing the administration regime may improve the efficacy of the drug combination, which remains to be tested in future studies.

      The authors mention that the mice all had severe dehydration and deterioration after 18 days. It would be helpful to know if there were differences in the side effects for different treatment groups? I would expect the combination to be the most severe. This is important in considering the combination treatment.

      As noted in our response to Reviewer #1, all mice in our experiment experienced deterioration, including the control mice and those treated with single agents; we could not observe any differences between the groups. This is due to the extremely aggressive tumor cells, which formed relatively large tumors prior to treatment onset, and it calls for further optimization of the system and therapeutic regime (# of cells injected, treatment starting point, etc.). Unfortunately, this model is very challenging (especially the injection of cells into the pons of the mouse brain, which requires unique expertise and is associated with mortality of some of the mice). Thus, these are highly complicated and expensive experiments; unfortunately, we did not have the resources to repeat and optimize the treatment protocol within the scope of this revision. Of note, Dubiella et al. extensively examined toxicity issues and did not observe adverse effects or weight loss when administering Sulfopin at a dose of 40 mg kg⁻¹. In our model, the side effects were caused by the tumors rather than the drugs.

      Minor Points:

      (1) For Figure 1F, reorganizing the bars to directly compare the K27M and KO cell lines at each dose would improve readability of this figure.

      We have changed figure 1F as the reviewer suggested.

      (2) In Figure 4D, it would be helpful to know how many cells were included (or a minimum included) to calculate the percentages.

      We added the number of H3-K27M positive cells detected per FOV to the figure legend and method section (n=13-198 cells per FOV). Of note, while we analyzed similar-sized FOVs, the number of tumor cells varied between the groups, with the treated group presenting a lower number of H3-K27M cells (due to the effect of the treatment on tumor growth). To account for this difference, we calculated the proportion of mTOR-positive cells out of the tumor cells.

      Reviewer #3 (Public Review):

      Summary:

      The authors use in vitro grown cells and mouse xenografts to show that a combination of drugs, Sulfopin and Vorinostat, can impact the growth of cells derived from Diffuse midline gliomas, in particular the ones carrying the H3 K27M-mutations (clinically classified as DMG, H3 K27M-mutant). The authors use gene expression studies, and chromatin profiling to attempt to better understand how these drugs exert an effect on genome regulation. Their main findings are that the drugs reduce cell growth in vitro and in mouse xenografts of patient tumours, that DMG, H3 K27M-mutant tumours are particularly sensitive, identify potential markers of gene expression underlying this sensitivity, and broadly characterize the correlations between chromatin modification changes and gene expression upon treatment, identifying putative pathways that may be affected and underlie the sensitivity (and thus how the drugs may affect the tumour cell biology).

      Strengths:

      It is a neat, mostly to-the-point work without exploring too many options and possibilities. The authors do a good job not overinterpreting data and speculating too much about the mechanisms, which is a very good thing since the causes and consequences of perturbing such broad epigenetic landscapes of chromatin may be very hard to disentangle. Instead, the authors go straight after testing the performance of the drugs, identifying potential markers and characterizing consequences.

      Weaknesses:

      If anything, the experiments done on Figure 3 could benefit from an additional replicate.

      We thank the reviewer for evaluating our work, and for the positive and insightful comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Perhaps a more substantial drug screen, or CRISPR screen, that utilises single molecule imaging as a readout would identify pharmacologic candidates that are either more effective, or novel.

      While out of scope for the current study, this is a highly interesting suggestion, which will be considered in future studies. Here, we focused on the potential use of the novel MYC inhibitor, Sulfopin. While the dependency of DMG cells on MYC signaling has been documented, to the best of our knowledge, pharmacological inhibition of MYC has not been tested for this disease due to the severe lack of potent MYC inhibitors. We show preliminary evidence for the use of this inhibitor, in combination with HDAC inhibition, to attenuate DMG growth in-vitro and in-vivo.  

      Reviewer #2 (Recommendations For The Authors):

      In Figure 1B, it is hard to tell if there are error bars for HSP90 and E2F2. Is there a potential error here? Seems unlikely to not have an error with a RT-qPCR?

      We thank the reviewer for the careful evaluation of the figures. We included error bars for all genes shown in Figure 1B. We have now increased the line width with the hope of making this information more accessible. As stated in the figure legend, these error bars represent the standard deviation of two technical repeats.

      I noticed that many experiments only had technical replicates. Incorporating biological (independent) replicates, where feasible, would strengthen the study's findings.

      We agree with the reviewer regarding the importance of biological replicates. While some of the panels present error estimates based on technical repeats, the main results were repeated independently with complementary approaches or various biological systems for validation.

      The RNAseq analysis presented in figure 1 was conducted in triplicate and then independently validated by qPCR (Figure 1A-B). Similarly, the transcriptomic analysis presented in figures 2G-I was verified by both western blot (figure 2J) and qPCR (figure S2O). Of note, this latter validation was conducted for two different DMG-patient derived cultures.

      To verify the robust effects on cellular viability, we analyzed the response to each drug and the combination on eight different DMG-patient-derived cultures, each representing a completely independent experiment. We show very similar trends in response to treatment between cultures that share the same H3-K27M variant. Thus, while for each culture technical repeats are shown, we provide multiple, independent repeats by examining the different cultures. Similarly, in figure 1F we examined the dependency of Sulfopin treatment on the expression of the H3-K27M oncohistone in two independent isogenic systems.

      Reviewer #3 (Recommendations For The Authors):

      A few questions and suggestions:

      (1) To avoid confusion, it is important to state whether or not the cells used in each experiment are K27M mutants (e.g. SU-DIPG13 on line 63).

      We thank the reviewer for pointing this out and have now added this information when appropriate across the manuscript.

      (2) Line 72 - confirming epigenetic silencing of these genes upon PIN1 inhibition (Fig. 1C, S1D)

      Considering that the mechanism of down regulation of MYC targets is likely H3K27me3-independent if it is also happening in DMG H3 K27M-mutants (high H3K27me3 here may rather be a consequence of less MYC binding?), I would strike this sentence out and just point out the correlation between lower expression and higher H3K27me3.

      We agree with the reviewer that the exact molecular mechanism mediating the silencing is yet to be characterized. We have modified the text in line 72 accordingly.

      (3) (line 78) Are MYC targets also down regulated in Sulfopin treated DMG, H3 K27M-mutant lines? Any qPCR or previously done RNA-seq data to use?

      In addition to the extensive analysis done on SU-DIPG13 cells (Figure 1 and S1), in light of the reviewer's comment we examined specific MYC targets by qPCR in an additional H3-K27M mutant DMG culture (SU-DIPG6) treated with Sulfopin. We observed a mild reduction in two prominent targets, E2F2 and mTOR (new figure S1D). Unfortunately, within this study, we only conducted full RNA-sequencing analysis on SU-DIPG13 cells treated with Sulfopin, and thus, we could not examine the global effect of Sulfopin on the transcriptome of other DMG cultures. This will, of course, be of high interest for future studies.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript examines the contribution of the dorsal and intermediate hippocampus to goal-directed navigation in a wide virtual environment where visual cues are provided by the scenery on the periphery of a wide arena. Among a choice of 2 reward zones located near the arena periphery, rats learn to navigate from the center of the arena to the reward zone associated with the highest reward. Navigation performance is largely assessed from the rats' body orientation when they leave the arena center and when they reach the periphery, as well as the angular mismatch between the reward zone and the site rats reach the periphery. Muscimol inactivation of the dorsal and intermediate hippocampus alters rat navigation to the reward zone, but the effect was more pronounced for the inactivation of the intermediate hippocampus, with some rat trajectories ending in the zone associated with the lowest reward. Based on these results, the authors suggest that the intermediate hippocampus is critical, especially for navigating to the highest reward zone.

      Strengths:

      - The authors developed an effective approach to study goal-directed navigation in a virtual environment where visual cues are provided by the peripheral scenery.

      - In general, the text is clearly written and the figures are well-designed and relatively straightforward to interpret, even without reading the legends.

      - An intriguing result, which would deserve to be better investigated and/or discussed, was that rats tended to rotate always in the counterclockwise direction. Could this be because of a hardware bias making it easier to turn left, some aspect of the peripheral landscape, or a natural preference of rats to turn left that is observable (or reported) in a real environment?

      Thank you for the insightful question. As the reviewer mentioned, the counterclockwise rotation behavior was intriguing and unexpected. To answer the reviewer’s question properly, we examined whether such stereotypical turning behavior appeared before the rats acquired the task rule and reward zones in the pre-surgical training phase of the task. Data from the last day of shaping and the first day of the pre-surgical main task day showed no significant difference in the number of trials in which the first body-turn was either clockwise or counterclockwise, suggesting that the rats did not have a bias toward a specific side (p=0.46 for Shaping; p=0.76 for the Main task, Wilcoxon signed-rank test). These results excluded the possibility that there was something in the apparatus's hardware that made the rats turn only to the left. Also, since we used the same peripheral landscape for the shaping and main task, we could assume that the peripheral landscape did not cause movement bias.
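
      The turn-direction comparison above can be sketched as a paired test on per-rat trial counts. The response does not say which software was used, so the following is only an illustrative pure-Python implementation of the exact two-sided Wilcoxon signed-rank test for small samples; the function name and the counts are hypothetical and are not the study's data.

```python
from itertools import product


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (w, p), where w = min(W+, W-). Zero differences are dropped
    and tied absolute differences receive average ranks. The null
    distribution is enumerated over all sign assignments, so this is
    only practical for small n.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, total - w_plus)

    # Under the null hypothesis each rank is positive or negative with
    # probability 1/2; count assignments at least as extreme as observed.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, total - s) <= w:
            extreme += 1
    return w, extreme / 2 ** n


# Hypothetical per-rat counts of trials whose first body-turn was
# clockwise vs counterclockwise (NOT data from the study).
clockwise = [10, 12, 9, 11, 13, 8]
counterclockwise = [11, 10, 12, 9, 10, 10]
w, p = wilcoxon_signed_rank_exact(clockwise, counterclockwise)
print(f"W={w}, p={p:.3f}")
```

      A large p-value here would only indicate that no side bias was detected in such a sample; with few animals the test has limited power.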

      Author response image 1.

      Although it remains inconclusive, we have noticed that some prior studies alluded to a similar phenomenon, framed as lateralization or spatial preference and assessed by comparing left and right biases. For example, Whishaw et al. (1992) suggested that there was natural lateralization in rats (“Most of the rats displayed either a strong right limb bias or a strong left limb bias.”) but no dominance of a specific side. Andrade et al. (2001) also claimed that “83% of Wistar rats spontaneously showed a clear preference for left or right arms in the T-maze.” However, to the best of our knowledge, there has been no direct evidence that rats have a dominant natural preference for only one side.

      Therefore, while the left-turning behavior remains an intriguing topic for further investigation, we find it difficult to pinpoint the reason behind the behavior in the current study. However, we would like to emphasize that this behavior did not interrupt testing our hypothesis. Nonetheless, we agree with the reviewer’s point that the counterclockwise rotation needs to be discussed more, so we revised the manuscript as follows:

      “To rule out the potential effect of hardware bias or any particular aspect of peripheral landscape to make rats turn only to one side, we measured the direction of the first body-turn in each trial on the last day of shaping and the first day of the main task (i.e., before rats learned the reward zones). There was no significant difference between the clockwise and counterclockwise turns (p=0.46 for shaping, p=0.76 for main task; Wilcoxon signed-rank test), indicating that the stereotypical pattern of counterclockwise body-turn appeared only after the rats learned the reward locations.” (p.6)

      - Another interesting observation, which would also deserve to be addressed in the discussion, is the fact that dHP/iHP inactivations produced to some extent consistent shifts in departing and peripheral crossing directions. This is visible from the distributions in Figures 6 and 7, which still show a peak under muscimol inactivation, but this peak is shifted to earlier angles than the correct ones. Such change is not straightforward to interpret, unlike the shortening of the mean vector length.

      Maybe rats under muscimol could navigate simply by using the association of the reward zone with some visual cues in the peripheral scene, relying on brain areas other than the hippocampus, and therefore stopped their rotation as soon as they saw the cues, a bit before the correct angle. With their hippocampus intact, by contrast, rats could estimate precisely the spatial relationship between the reward zone and visual cues.
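
      The distinction drawn here between a shifted peak and a shortened mean vector can be made concrete with standard circular statistics. The sketch below, using only hypothetical angles (not data from Figures 6 and 7), computes the mean direction and the mean resultant vector length: a shift changes the direction while leaving the length unchanged, whereas added dispersion shortens the length.

```python
import math


def mean_resultant(angles_deg):
    """Return (mean direction in degrees within [0, 360), mean vector length in [0, 1])."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    length = math.hypot(c, s)
    direction = math.degrees(math.atan2(s, c)) % 360.0
    return direction, length


# Hypothetical departure directions in degrees (illustration only).
control = [80, 90, 100]    # tight cluster around 90
shifted = [60, 70, 80]     # same spread, peak shifted to an earlier angle
dispersed = [30, 90, 150]  # same peak, broader spread

for label, angles in (("control", control), ("shifted", shifted), ("dispersed", dispersed)):
    direction, length = mean_resultant(angles)
    print(f"{label}: direction={direction:.1f} deg, length={length:.3f}")
```

      Note that the mean direction is not meaningful when the mean vector length is close to zero.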

      We agree with the possibility suggested by the reviewer. However, although not described in the original manuscript, we performed several control experiments in a few rats using various visual stimulus manipulations to test how their behavior changed as a result. One of the experiments was the landmark omission test, in which one of the landmarks was omitted. The landmark to be removed was chosen pseudorandomly on a trial-by-trial basis. We observed that the omission of one landmark, regardless of its identity, did not cause a specific behavioral change in finding the reward zones, suggesting that the rats were not relying on a single visual landmark when finding the reward zone.

      Author response image 2.

      Therefore, it is unlikely that rats used the spatial relationship between the reward zone and a specific visual cue to solve the task in our study. However, the result was based on an insufficient sample size (n=3), not permitting any meaningful statistical testing. Thus, we have now updated this information in the manuscript as an anecdotal result as follows:

      “Additionally, to investigate whether the rats used a certain landmark as a beacon to find the reward zones, we conducted the landmark omission test as a part of control experiments. Here, one of the landmarks was omitted, and the landmark to be removed was chosen pseudorandomly on a trial-by-trial basis. The omission of one landmark, regardless of its identity, did not cause a specific behavioral change in finding the reward zones, suggesting that the rats were not relying on a single visual landmark when finding the reward zones. The result can only be reported anecdotally because of an insufficient sample size (n=3), not permitting any meaningful statistical testing.” (p.9)

      Weaknesses:

      - I am not sure that the differential role of dHP and iHP for navigation to high/low reward locations is supported by the data. The current results could be compatible with iHP inactivation producing a stronger impairment on spatial orientation than dHP inactivation, generating more erratic trajectories that crossed by chance the second reward zone.

      To make the point that iHP inactivation affects the disambiguation of high and low reward locations, the authors should show that the fraction of trajectories aiming at the low reward zone is higher than expected by chance. Somehow we would expect to see a significant peak pointing toward the low reward zone in the distribution of Figures 6-7.
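
      One way to frame the requested chance-level comparison is a one-sided binomial test. The sketch below is only an illustration: the chance probability (here 1/8, assuming eight equally likely direction sectors), the trial counts, and the function name are all hypothetical assumptions and are not taken from the manuscript.

```python
import math


def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): a one-sided p-value against chance level p0."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a probability")
    return sum(
        math.comb(n, i) * p0 ** i * (1.0 - p0) ** (n - i)
        for i in range(k, n + 1)
    )


# Hypothetical numbers (NOT data from the study): 12 of 40 trajectories
# aimed at the low-reward zone, with an assumed chance level of 1/8.
n_trials = 40
n_low_zone = 12
chance = 1.0 / 8.0
p_value = binomial_upper_tail(n_low_zone, n_trials, chance)
print(f"P(X >= {n_low_zone}) = {p_value:.4g}")
```

      Trials pooled across rats and sessions are not independent, so a per-animal analysis or a mixed model would be more appropriate than this simple pooled test; the appropriate chance level also depends on how directions are binned.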

      We thank the reviewer for the valuable comments. We agree that it is difficult to rigorously distinguish the loss of value representation from spatial disorientation in our experiment. Since the trial ended once the rat touched either reward zone, it was difficult to specify whether they intended to arrive at the location or just moved randomly and arrived there by chance. Moreover, it is possible that the drug infusion did not completely inactivate the iHP but only partially did so.

      To investigate this issue further, we checked whether the distribution of the departure direction (DD) differed between the trials in which rats initially headed north (NW, N, NE) and south (SE, S, SW) at the start. In the manuscript, we demonstrated that DD aligned with the high-value zone, indicating that the rat remembered the scenes associated with the high-value zone (p.8). Given the rats’ characteristic counterclockwise rotation, the first reward zone a rat would face when starting while heading north would be the high-value zone. On the other hand, the rat would first face the low-value zone when starting while heading south. In this case, normal rats would refrain from leaving the start zone and rotate further until they face the high-value zone before finally departing the start location. If the iHP inactivation caused a more severe impairment in spatial orientation but not in value representation, the iHP-inactivated rats in both north- and south-starting trials would likely behave similarly to the dHP-inactivated rats, but with a larger deviation from the high-value zone. However, if the iHP inactivation affected the disambiguation of high and low reward locations, north- and south-starting trials would show different DD distributions.

      The circular plots shown below are the DD distributions of dMUS and iMUS. We could see that when they started facing north, iHP-inactivated rats still aligned themselves towards the high-value zone and thus remained spatially oriented, similar to the dHP inactivation session. However, in the south-starting trials, the DD distribution was completely different from the north-starting trials; the rats failed in body alignment towards the high-value zone. Instead, they departed the start point while heading south in most trials. This pattern was not seen in dMUS sessions, even in their south-starting trials, illustrating the distinct deficit caused by iHP inactivation. Additionally, most of the rats with iHP inactivation visited the low-value zone more in south-headed starting trials than in the north-headed trials, except for one rat.

      Author response image 3.

      Furthermore, we would like to clarify that we do not limit the effect of iHP inactivation to the impairment in distinguishing the high and low reward zones. It is possible that iHP inactivation resulted in the loss of a global value-representing map, leading to the impairment in distinguishing both reward zones from other non-rewarded areas in the environment. Figures 6 and 7 hint at this possibility by showing that the peaks are not restricted to the reward zones. Unfortunately, we cannot rigorously address this in the current study because of the limitations of our experimental design mentioned above.

      Nonetheless, we agree with the reviewer that this limitation needs to be addressed, so we have now noted in the Limitations section (p.21) that further investigation is needed to clarify what causes the behavioral change after iHP inactivation.

      Reviewer #2 (Public Review):

      Summary:

      The aim of this paper was to elucidate the role of the dorsal HP and intermediate HP (dHP and iHP) in value-based spatial navigation through behavioral and pharmacological experiments using a newly developed VR apparatus. The authors inactivated dHP and iHP by muscimol injection and analyzed the differences in behavior. The results showed that dHP was important for spatial navigation, while iHP was critical for both value judgments and spatial navigation. The present study developed a new sophisticated behavioral experimental apparatus and proposed a behavioral paradigm that is useful for studying value-dependent spatial navigation. In addition, the present study provides important results that support previous findings of differential function along the dorsoventral axis of the hippocampus.

      Strengths:

      The authors developed a VR-based value-based spatial navigation task that allowed separate evaluation of "high-value target selection" and "spatial navigation to the target." They were also able to quantify behavioral parameters, allowing detailed analysis of the rats' behavioral patterns before and after learning or pharmacological inactivation.

      Weaknesses:

      Although differences in function along the dorsoventral axis of the hippocampus is an important topic that has received considerable attention, differences in value coding have been shown in previous studies, including the work of the authors; the present paper is an important study that supports previous studies, but the novelty of the findings is not that high, as the results are from pharmacological and behavioral experiments only.

      We appreciate the reviewer's insightful comments. In response, we would like to emphasize that very few studies have investigated the function of the intermediate hippocampus, especially in spatial memory tasks. We tested the differential functions of the dorsal and intermediate hippocampus using a within-animal design and used a reversible inactivation technique (i.e., muscimol injection) to avoid the compensation by other brain regions that can occur with irreversible manipulations (e.g., lesions). Also, very few studies have analyzed the navigation trajectories of animals as closely as the current study. We emphasize the novelty of our study by comparing it with prior studies, as shown below in Table 1.

      Author response table 1.

      Comparison of our study with prior studies

      Moreover, to the best of our knowledge, the current manuscript is the first to investigate the hippocampal subregions along the long axis in a VR environment using a hippocampal-dependent spatial memory task. Nonetheless, we agree that the current study is limited as a behavior-only experiment. We have now added a comment to the Limitations section (p.21) on how other techniques, such as electrophysiology, could extend our findings.

      Reviewer #3 (Public Review):

      Summary:

      The authors established a new virtual reality place preference task. On the task, rats, which were body-restrained on top of a moveable Styrofoam ball and could move through a circular virtual environment by moving the Styrofoam ball, learned to navigate reliably to a high-reward location over a low-reward location, using allocentric visual cues arranged around the virtual environment.

      The authors also showed that functional inhibition by bilateral microinfusion of the GABA-A receptor agonist muscimol, which targeted the dorsal or intermediate hippocampus, disrupted task performance. The impact of functional inhibition targeting the intermediate hippocampus was more pronounced than that of functional inhibition targeting the dorsal hippocampus.

      Moreover, the authors demonstrated that the same manipulations did not significantly disrupt rats' performance on a virtual reality task that required them to navigate to a spherical landmark to obtain reward, although there were numerical impairments in the main performance measure and the absence of statistically significant impairments may partly reflect a small sample size (see comments below).

      Overall, the study established a new virtual-reality place preference task for rats and established that performance on this task requires the dorsal to intermediate hippocampus. They also established that task performance is more sensitive to the same muscimol infusion (presumably - doses and volumes used were not clearly defined in the manuscript, see comments below) when the infusion was applied to the intermediate hippocampus, compared to the dorsal hippocampus, although this does not offer strong support for the authors' claim that dorsal hippocampus is responsible for accurate spatial navigation and intermediate hippocampus for place-value associations (see comments below).

      Strengths:

      (1) The authors established a new place preference task for body-restrained rats in a virtual environment and, using temporary pharmacological inhibition by intra-cerebral microinfusion of the GABA-A receptor agonist muscimol, showed that task performance requires dorsal to intermediate hippocampus.

      (2) These findings extend our knowledge about place learning tasks that require dorsal to intermediate hippocampus and add to previous evidence that, for some place memory tasks, the intermediate hippocampus may be more important than other parts of the hippocampus, including the dorsal hippocampus, for goal-directed navigation based on allocentric place memory.

      (3) The hippocampus-dependent task may be useful for future recording studies examining how hippocampal neurons support behavioral performance based on place information.

      Weaknesses:

      (1) The new findings do not strongly support the authors' suggestion that the dorsal hippocampus is responsible for accurate spatial navigation and the intermediate hippocampus for place-value associations.

      The authors base this claim on the differential effects of the dorsal and intermediate hippocampal muscimol infusions on different performance measures. More specifically, dorsal hippocampal muscimol infusion significantly increased perimeter crossings and perimeter crossing deviations, whereas dorsal infusion did not significantly change other measures of task performance, including departure direction and visits to the high-value location. However, these statistical outcomes offer only limited evidence that dorsal hippocampal infusion specifically affected the perimeter crossing, without affecting the other measures. Numerically the pattern of infusion effects is quite similar across these various measures: intermediate hippocampal infusions markedly impaired these performance measures compared to vehicle infusions, and the values of these measures after dorsal hippocampal muscimol infusion were between the values in the intermediate hippocampal muscimol and the vehicle condition (Figures 5-7). Moreover, I am not so sure that the perimeter crossing measures really reflect distinct aspects of navigational performance compared to departure direction and hit rate, and, even if they did, which aspects this would be. For example, in line 316, the authors suggest that 'departure direction and PCD [perimeter crossing deviation] [are] indices of the effectiveness and accuracy of navigation, respectively'. However, what do the authors mean by 'effectiveness' and 'accuracy'? Accuracy typically refers to whether or not the navigation is 'correct', i.e. how much it deviates from the goal location, which would be indexed by all performance measures.

      So, overall, I would recommend toning down the claim that the findings suggest that the dorsal hippocampus is responsible for accurate spatial navigation and the intermediate hippocampus for place-value associations.

      The reviewer mentioned that the statistical outcomes offer limited evidence as the dHP inactivation results were always positioned between the results of the iHP inactivation and controls. However, we would like to emphasize that, because they project to each other, the two subregions are not completely segregated anatomically. This is highly likely to be true functionally as well, so there should be some overlap in their roles. Given this relationship between the dHP and iHP, an intermediate effect after inactivating the dHP is to be expected, which is why we focused on the “magnitude” of behavioral changes after inactivation rather than on a complete dissociation between the two subregions in our manuscript. Unfortunately, because of the nature of drug infusion studies, further dissociation would be difficult and would require investigation with different experimental techniques, such as physiological examinations of the neural firing patterns in the two regions. We mentioned this caveat of the current study in the Limitations as follows:

      “However, our study includes only behavioral results and further mechanistic explanations as to the processes underlying the behavioral deficits require physiological investigations at the cellular level. Neurophysiological recordings during VR task performance could answer, for example, the questions such as whether the value-associated map in the iHP is built upon the map inherited from the dHP or it is independently developed in the iHP.” (p.21)

      Regarding the reviewer’s comment on the meaning of measuring the perimeter crossing directions, we would like to draw the reviewer’s attention to the individual trajectories during the iMUS sessions described in Figure 5. Particularly when they were not confident with the location of the higher reward, rats changed their heading directions during the navigation, which resulted in a less efficient route to the goal location. Rats showing this type of behavior tended to hit the perimeter of the arena first before correcting their routes toward the goal zone. In contrast, rats showing effective navigation hardly bumped into the wall or perimeter before hitting the goal zone. Thus, their PCDs matched DDs almost always. When considered together with DD, our PCD measure could tell whether rats not hitting the goal zone directly after departure were impaired in either maintaining the correct heading direction to the goal zone at the start location or orienting themselves to the target zone accurately from the start. Our results suggest that the latter is the case. We included the relevant explanation in the Discussion section as follows:

      “Particularly, rats changed their heading directions during the navigation when they were not confident with the location of the higher reward, resulting in a less efficient route to the goal location. Rats showing this type of behavior tended to hit the perimeter of the arena first before correcting their routes. Therefore, when considered together with DD, our PCD measure could tell that the rats not hitting the goal zone directly after departure were impaired in orienting themselves to the target zone accurately from the start, not in maintaining the correct heading direction to the goal zone at the start location.” (p.19)

      Nonetheless, we agree with the reviewer that the term ‘accuracy’ might be confusing with performance accuracy, so we replaced the term with ‘precision’ throughout the manuscript, referring to the precise targeting of the reward zones.

      (2) The claim that the different effects of intermediate and dorsal hippocampal muscimol infusions reflect different functions of intermediate and dorsal hippocampus rests on the assumption that both manipulations inhibit similar volumes of hippocampal tissue to a similar extent, but at different levels along the dorso-ventral axis of the hippocampus. However, this is not a foregone conclusion (e.g., drug spread may differ depending on the infusion site or drug effects may differ due to differential expression of GABA-A receptors in the dorsal and intermediate hippocampus), and the authors do not provide direct evidence for this assumption. Therefore, a possible alternative account of the weaker effects of dorsal compared to intermediate hippocampal muscimol infusions on place-preference performance is that the dorsal infusions affect less hippocampal volume or less markedly inhibit neurons within the affected volume than the intermediate infusions. I would recommend that the authors briefly consider this issue in the discussion. Moreover, from the Methods, it is not clear which infusion volume and muscimol concentration were used for the different infusions (see below, 4.a.), and this must be clarified.

      We appreciate these insightful comments from the reviewer and agree that we do not provide direct evidence for the point raised by the reviewer. To the best of our knowledge, most of the behavioral studies on the long axis of the hippocampus did not particularly address the differential expression of GABA-A receptors along the axis. We could not find any literature that specifically introduced and compared the levels of expression of GABA-A receptors or the diffusion range of muscimol in the intermediate hippocampus to the other subregions. However, we found that Sotiriou et al. (2005) made such comparisons with respect to the expression of different GABA-A receptors. They concluded that the dorsal and ventral hippocampi have different levels of the GABA-A receptor subtypes. The α1/β2/γ2 subtype was dominant in the dorsal hippocampus, while the α2/β1/γ2 subtype was prevalent in the ventral hippocampus. Sotiriou and colleagues also mentioned the lower affinity of GABA-A receptor binding in the ventral hippocampus, and this result is consistent with the Papatheodoropoulos et al. (2002) study that showed a weaker synaptic inhibition in the ventral hippocampus compared to the dorsal hippocampus. Papatheodoropoulos et al. speculated that differences in GABA receptors are one of the potential causes underlying the differential synaptic inhibition between the dorsal and ventral hippocampal regions. Based on these findings, the same volume of muscimol is more likely to cause a more severe effect on the ventral hippocampus than the dorsal hippocampus. Therefore, we do not believe that the less significant changes after the dorsal hippocampal inactivation were induced by the expression level of GABA-A receptors. Additionally, we have demonstrated in our previous studies that muscimol injections in the dorsal hippocampus impair performance to the chance level in scene-based behavioral tasks (Lee et al., 2014; Kim et al., 2012).

      Nonetheless, we acknowledge the possibility of differential GABA-A receptor expression between the two target regions. Following the suggestion of the reviewer, we have now included this information in the Discussion as follows:

      “Although there is still a possibility that the levels of expression of GABA-A receptors might be different along the longitudinal axis of the hippocampus, …” (p.20)

      Regarding the drug infusion volume and concentration, we included these details in the Methods. Please see our detailed response to 4.a. below.

      (3) It is good that the authors included a comparison/control study using a spherical beacon-guided navigation task, to examine the specific psychological mechanisms disrupted by the hippocampal manipulations. However, as outlined below (4.b.), the sample size for the comparison study was lower than for the main study, and the data in Figure 8 suggest that the comparison task may be affected by the hippocampal manipulations similarly to the place-preference task, albeit less markedly. This would raise the question as to which mechanisms that are common to the two tasks may be affected by hippocampal functional inhibition, which should be considered in the discussion.

      The sample size for the object-guided navigation task was smaller because we initially did not plan the experiment, but later in the study decided to conduct the control test. Therefore, the object-guided navigation task was added to the study design after finishing the first three rats, resulting in a smaller sample size than the place preference task. We included this detail in the manuscript, as follows:

      “Note the smaller sample size in the object-guided navigation task. This was because the task was later added to the study design.” (p.24)

      Regarding the mechanism behind the two different tasks, we did not perform the same heading direction analysis here as in the place preference task because the two tasks have different characteristics such as task complexity. The object-guided navigation task is somewhat similar to the visually guided (or cued) version of the water maze task, which is widely known to be hippocampus-independent (Morris et al., 1986; Packard et al., 1989; also see our descriptions on p.15). Therefore, we would argue that the two tasks (i.e., place preference task and object-guided navigation task) used in the current manuscript do not share common neural mechanisms. Additionally, we confirmed that several behavioral measurements related to motor capacity, such as travel distance and latency, along with the direct hit proportion provided in Figure 8, did not show any statistically significant changes across drug conditions.

      (4) Several important methodological details require clarification:

      a. Drug infusions (from line 673):

      - '0.3 to 0.5 μl of either phosphate-buffered saline (PBS) or muscimol (MUS) was infused into each hemisphere'; the authors need to clarify when which infusion volume was used and why different infusion volumes were used.

      We thank the reviewer for carefully reading our manuscript. We were cautious about side effects, such as suppressed locomotion or overly aggressive behavior, since the iHP injection site was close to the ventricle. From our previous experiments, we were keenly aware that the intermediate to ventral hippocampal regions are sensitive to the drug dosage. Thus, we observed each rat’s behavior in a clean cage for 20 minutes after drug injection. We started from 0.5 μl, based on our previous study, but if the injected rat showed any sign of side effects in the cage, we stopped the experiment for the day and tried a lower dosage (i.e., 0.4 μl first, then 0.3 μl, etc.) until we found a dosage at which the rat did not show any side effects. This procedure is necessary because cannula tip positions differ slightly from rat to rat. Following this procedure, five out of eight rats received 0.4 μl, two received 0.3 μl, and one received 0.5 μl. There was no significant difference in performance, including the high-value visit percentage and the departure and perimeter crossing directions, across dosages. This information is now added in the Methods section as follows:

      “If the rat showed any side effect, particularly sluggishness or aggression, we reduced the drug injection amount in the rat by 0.1 μl until we found the dosage with which there was no visible side effect. As a result, five of the rats received 0.4 μl, two received 0.3 μl, and one received 0.5 μl.” (p.25)

      - I could not find the concentration of the muscimol solution that was used. The authors must clarify this and also should include a justification of the doses used, e.g. based on previous studies.

      Thank you for the suggestion. We used a drug concentration of 1 mg/ml, which was adapted from our previous muscimol studies (Lee et al., 2014; Kim et al., 2012). The manuscript is now updated, as follows:

      “…or muscimol (MUS; 1 mg/ml, dissolved in saline) was infused into each hemisphere via a 33-gauge injection cannula at an injection speed of 0.167 μl/min, based on our previous study (Lee et al., 2014; Kim et al., 2012).” (p.25)

      -  Please also clarify if the injectors and dummies were flush with the guides or by which distance they protruded from the guides.

      The injection and dummy cannula both protruded from the guide cannula by 1 mm, and this information is now added to the Methods section, as follows:

      “The injection cannula and dummy cannula extended 1 mm below the tip of the guide cannula.” (p.25)

      b. Sample sizes: The authors should include sample size justifications, e.g. based on considerations of statistical power, previous studies, practical considerations, or a combination of these factors. Importantly, the smaller sample size in the control study using the spherical beacon-guided navigation task (n=5 rats) limits comparability with the main study using the place-preference task (n=8). Numerically, the findings on the control task (Figure 8) look quite similar to the findings on the place-preference task, with intermediate hippocampal muscimol infusions causing the most pronounced impairment and dorsal hippocampal muscimol infusions causing a weaker impairment. These effects may have reached statistical significance if the same sample size had been used as in the place-preference study.

      We set the current sample size for several reasons. First, based on our previous studies, we assumed that eight animals (or at least more than six) would be enough to achieve statistical power in a “within-animal design” study. Second, in view of our ethical commitments, we tried to keep the number of animals used in the study to a minimum. Last, our paradigm required very long training periods (3 months on average per animal), so we could not increase the sample size for practical reasons. Regarding the reasons for the smaller sample size for the object-guided navigation task, please see our response to (3) above. The manuscript is now revised as follows:

      “Based on our prior studies (Park et al., 2017; Yoo and Lee, 2017; Lee et al., 2014), the sample size of our study was set to the least number to achieve the necessary statistical power in the current within-subject study design for ethical commitments and practical considerations (i.e., relatively long training periods).” (p.22)

      c. Statistical analyses: Why were the data of the intermediate and dorsal hippocampal PBS infusion conditions averaged for some of the analyses (Figure 5; Figure 6B and C; Figure 7B and C; Figure 8B) but not for others (Figure 6A and Figure 7A)?

      The reviewer is correct that we only illustrated the separate dPBS and iPBS data for Figures 6A and 7A. Since the directional analysis is the main focus of the current manuscript, we tried to provide better visualization and more detailed examples of how the drug infusion changed the behavioral patterns between the PBS and MUS conditions in each region. Except for the visualization of DD and PCD, we averaged the PBS sessions to increase statistical power, as described on p.9. We added a detailed description of the reasons for illustrating dPBS and iPBS data separately in the manuscript, as follows:

      “Note that dPBS and iPBS sessions were separately illustrated here for better visualization of changes in the behavioral pattern for each subregion.” (p.12)

      Reviewing Editor (Recommendations For The Authors):

      The strength of evidence rating in the assessment is currently noted as "incomplete." This can be improved following revisions if you amend your conclusions in the paper, including in the title and abstract, such that the paper's major conclusions more closely match what is shown in the Results.

      Following the suggestions of the reviewing editor, we have mentioned the caveats of our study in the Limitations section of our revised manuscript (p.21). In addition, the manuscript has been revised so that the conclusions in the paper match the experimental results more closely, as can be seen in some of the relevant sentences in the abstract and main text as follows:

      “Inactivation of both dHP and iHP with muscimol altered efficiency and precision of wayfinding behavior, but iHP inactivation induced more severe damage, including impaired place preference. Our findings suggest that the iHP is more critical for value-dependent navigation toward higher-value goal locations.” (Abstract; p.2)

      “Whereas inactivation of the dHP mainly affected the precision of wayfinding, iHP inactivation impaired value-dependent navigation more severely by affecting place preference.” (p.5)

      “The iHP causes more damage to value-dependent spatial navigation than the dHP, which is important for navigational precision” (p.12)

      However, we have not changed the title of the manuscript, as it accurately conveys what we would like to deliver in this study.

      Reviewer #1 (Recommendations For The Authors):

      - What were the dimensions of the environment? What distance did rats typically run to reach the reward zone? A scale bar would be helpful in Figure 1.

      We used the same circular arena from the shaping session, which was 1.6 meters in diameter (p.23), and the shortest path between the start location and either reward zone was 0.62 meters. We revised the manuscript for clarification as follows:

      “For the pre-training session, rats were required to find hidden reward zones…, on the same circular arena from the shaping session.” (p.23)

      “Therefore, the shortest path length between the start position and the reward zone was 0.62 meters.” (p.23)

      We also added a scale bar in Figure 1C for a better understanding.

      - Line 169: "The scene rotation plot covers the period from the start of the trial to when the rat leaves the starting point at the center and the departure circle (Figure 2B)."

      The sentence is unclear. Maybe it should be "... from the start of the trial to when the rat leaves the departure circle”.

      The sentence has been revised following the reviewer's suggestion. (p.7)

      - Line 147: "First, they learned to rotate the spherical treadmill counterclockwise to move around in the virtual environment (presumably to perform energy-efficient navigation)."

      It is not clear from this sentence if rats naturally preferred the counterclockwise direction or if the counterclockwise direction was a task requirement.

      We have now clarified in our revised manuscript that turning counterclockwise was not a task requirement, as follows:

      “First, although it was not required in the task, they learned to rotate the spherical treadmill counterclockwise…” (p.6)

      - Line 149: "Second, once a trial started, but before leaving the starting point at the center, the animal rotated the treadmill to turn the virtual environment immediately to align its starting direction with the visual scene associated with the high-value reward zone."

      The sentence is unclear. Maybe "Second, once a trial started, the animal rotated the treadmill immediately to align its starting direction with the visual scene associated with the high-value reward zone.”

      We have updated the description following the suggestion. (p.6)

      Reviewer #2 (Recommendations For The Authors):

      - There are some misleading descriptions of the conclusion of the results in this paper. In this study, the functions of (a) selection of high-value target and (b) spatial navigation to the target were assessed in the behavioral experiments. The results of the pharmacological experiments showed that dHP inactivation impaired (b) and iHP inactivation impaired both (a) and (b) (Figures 5 B & D). However, the last sentence of the abstract states that dHP is important for the functions of (a) and iHP for (b). There are several other similar statements in the main text. Since the separation of (a) and (b) is an important and original aspect of this study, the description should clearly show the conclusion that dHP is important for (a) and iHP is important for both (a) and (b).

      Related to the above, the paragraph title in the Discussion "The iHP may contain a value-associated cognitive map with reasonable spatial resolution for goal-directed navigation (536-537)" is also somewhat misleading: "with reasonable resolution for goal-directed behavior" seems to reflect the results of an object-guided navigation task (Figure 8). However, the term "goal-directed behavior" is also used for value-dependent spatial navigation (i.e., the main task), which causes confusion. I would like to suggest clarifying the wording on this point.

      First, we need to correct the reviewer’s statement regarding our descriptions of the results. As the reviewer mentioned, our results indicated that the dHP inactivation impaired (b) but not (a), while the iHP inactivation impaired both (a) and (b). Regarding the iHP inactivation result, we focused on the impairment of (a) since our aim was to investigate spatial-value association in the hippocampus. Also, it was more likely that (a) affected (b), but not the other way around, because (a) remained intact when (b) was impaired after dHP inactivation. We emphasized this difference between dHP and iHP inactivation, which was (a). Therefore, we mentioned in the last sentence of the abstract that the dHP is important for (b), which is the precision of spatial navigation to the target location, and the iHP is critical for (a).

      Moreover, we would like to clarify that we were not referring to the object-guided navigation task in Figure 8 in the phrase ‘with a reasonable spatial resolution for goal-directed navigation.’ Please note that the object-guided navigation task did not require fine spatial resolution to find the reward. The phrase instead referred to the dHP inactivation result (Figures 5 and 6), where the rats could find the high-value zone even with dHP inactivation, although the navigational precision decreased. Nonetheless, we agree with the reviewer that the title might cause confusion, so we have now updated the title as follows:

      “The iHP may contain a value-associated cognitive map with reasonable spatial resolution for value-based navigation” (p.19)

      - As an earlier study focusing on the physiology of iHP, Maurer et al, Hippocampus 15:841 (2005) is also a pioneering and important study, and I suggest citing it.

      Thank you for the suggestion. We included the Maurer et al. (2005) study in the Introduction section as follows:

      “…Specifically, there is physiological evidence that the size of a place field becomes larger as recordings of place cells move from the dHP to the vHP (Jung et al., 1994; Maurer et al., 2005; Kjelstrup et al., 2008; Royer et al., 2010).” (p.4)

      - One of the strengths of this paper is that the authors have developed a new control system for the VR navigation task device, but I cannot find a detailed description of this system in the Methods section. Also, no information about the system control has been uploaded to GitHub. I would suggest adding a description of the manufacturer, model number, and size of components, such as a rotary encoder and ball, and information about the software of the control system, with enough detail to allow the reader to reconstruct the system.

      We have now added detailed descriptions of the VR system in the Methods section (see “2D VR system”). (p.22)

      Reviewer #3 (Recommendations For The Authors):

      (1) Some comments on specific passages of text:

      Lines 87 to 89: 'Surprisingly, beyond the recognition of anatomical divisions, little is known about the functional differentiation of subregions along the dorsoventral axis of the hippocampus. Moreover, the available literature on the subject is somewhat inconsistent.'

      I would recommend rephrasing these statements. Regarding the first statement, there is substantial evidence for functional differentiation along the dorso-ventral axis of the hippocampus (e.g., see reviews by Moser and Moser, 1998, Hippocampus; Bannerman et al., 2004, Neurosci Biobehav Rev; Bast, 2007, Rev Neurosci; Bast, 2011, Curr Opin Neurobiol; Fanselow and Dong, 2010, Neuron; Strange et al., 2014, Nature Rev Neurosci). Regarding the second statement, the authors may consider being more specific, as the inconsistencies demonstrated seem to relate mainly to the hippocampal representation of value information, instead of functional differentiation along the dorso-ventral hippocampal axis in general.

      We agree with the reviewer that the abovementioned statements need further clarification. The manuscript is now revised as follows:

      “Surprisingly, beyond the recognition of anatomical divisions, the available literature on the functional differentiation of subregions along the dorsoventral axis of the hippocampus, particularly in the context of value representation, is somewhat inconsistent.” (p.4)

      Lines 92 to 93: 'Thus, it has been thought that the dHP is more specialized for precise spatial representation than the iHP and vHP.'

      I think 'fine-grained' may be the more appropriate term here. Also, check throughout the manuscript when referring to the differences of spatial representations along the hippocampal dorso-ventral axis.

      Thank you for the insightful suggestion. We changed the term to ‘fine-grained’ throughout the manuscript, as follows:

      “Thus, it has been thought that the dHP is more specialized for fine-grained spatial representation than the iHP and vHP.” (p.4)

      “Consequently, the fine-grained spatial map present in the dHP…” (p.20)

      Line 217: well-'trained' rats?

      We initially used the term ‘well-learned’ to focus on the effect of learning, not training. Please note that the rats were already adapted to moving freely in the VR environment during the Shaping sessions, but the immediate counterclockwise body alignment only appeared after they acquired the reward locations for the main task. Nonetheless, we agree that the term might cause confusion, so we revised the manuscript as the reviewer suggested, as follows:

      “This implies that well-trained rats aligned their bodies more efficiently…” (p.8)

      Lines 309 to 311: 'Taken together, these results indicate that iHP inactivation severely damages normal goal-directed navigational patterns in our place preference task.'

      Consider mentioning that dHP inactivation also causes impairments, albeit weaker ones.

      We thank the reviewer for the suggestion. We revised the manuscript by mentioning dHP inactivation as follows:

      “Taken together, these results indicate that iHP inactivation more severely damages normal goal-directed navigational patterns than dHP inactivation in our place-preference task.” (p.11-12)

      Lines 550 to 552: 'The involvement of the iHP in spatial value association has been reported in several studies. For example, Bast and colleagues reported that rapid place learning is disrupted by removing the iHP and vHP, even when the dHP remains undamaged (Bast et al., 2009).'

      Bast et al. (2009) did not directly show the role of iHP in 'spatial value associations'. They suggested that the importance of iHP for behavioral performance based on rapid, one-trial, place learning may reflect neuroanatomical features of the intermediate region, especially the combination of afferents that could convey the required fine-grained visuo-spatial information with relevant afferent and efferent connections that may be important to translate hippocampal place memory into appropriate behavioral performance (this may include afferents conveying value information). More recent theoretical and empirical research suggests that projections to the (ventral) striatum may be relevant (see Tessereau et al., 2021, BNA and Bauer et al., 2021, BNA).

      We thank the reviewer for this insightful comment. We agree with the reviewer that Bast et al. (2009) did not directly mention spatial value association; however, learning a new platform location requires an update of value information in the spatial environment. Therefore, we thought the study, though indirectly, suggested how the iHP contributes to spatial value associations. Nonetheless, to avoid confusion, we revised the manuscript, as follows:

      “The involvement of the iHP in spatial value association has been reported or implicated in several studies” (p.20)

      (2) Figures and legends:

      Figure 2B: What do the numbers after novice and expert indicate?

      The numbers indicate the rat ID, followed by the session number. We added the details to the Figure legend, as follows:

      “The numbers after ‘Novice’ and ‘Expert’ indicate the rat and session number of the example.” (p.34)

      Figure 2C: Please indicate units of the travel distance and latency measurements.

      The units are now described in the Figure legends, as follows:

      “Mean travel distance in meters and latency in seconds are shown below the VR arena trajectory.” (p.34)

      Figure 3Aii: Here and in other figures - do the vector lengths have a unit (degree?)?

      No, the mean vector length is an averaged value of the resultant vectors, thus having no specific unit.
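
To illustrate why the mean vector length carries no unit, the sketch below computes it under the standard circular-statistics definition: each direction is converted to a unit vector, the unit vectors are averaged, and the length of that average is reported. This is an assumption about the computation, not the authors' analysis code; the function name `mean_vector_length` and the example angles are hypothetical.

```python
import cmath
import math


def mean_vector_length(angles_deg):
    """Length of the average of unit vectors pointing along each direction.

    Assumes the standard mean resultant length from circular statistics;
    the manuscript's exact computation is not given in this response.
    """
    if not angles_deg:
        raise ValueError("need at least one direction")
    # Represent each direction as a unit vector in the complex plane.
    unit_vectors = [cmath.exp(1j * math.radians(a)) for a in angles_deg]
    resultant = sum(unit_vectors) / len(unit_vectors)
    # The length of an average of unit-length vectors is a pure ratio.
    return abs(resultant)


# Hypothetical departure directions (degrees), for illustration only.
clustered = [80, 90, 100]        # headings tightly grouped around 90 degrees
scattered = [0, 90, 180, 270]    # headings spread evenly around the circle

print(mean_vector_length(clustered))   # close to 1
print(mean_vector_length(scattered))   # close to 0
```

The value always lies between 0 (directions spread evenly around the circle) and 1 (all directions identical), whatever unit the input angles were measured in.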

      Figure 5A: Please explain what the numbers on top of the individual sample trajectories indicate.

      The numbers are IDs for rats, sessions, and trials of specific examples. We added the explanation to the Figure legends, as follows:

      “Numbers above each trajectory indicate the identification numbers for rat, session, and trial.” (p.35)

      (3) Additional comments on some methodological details:

      a. Why was the non-parametric Wilcoxon signed-rank test used for the planned comparison between intermediate and dorsal hippocampal PBS infusions, whereas parametric ANOVA and post-hoc comparisons were used for other analyses? This probably doesn't make a big difference for the interpretation of the present data (as a parametric pairwise comparison would also not have revealed any significant difference between intermediate and dorsal hippocampal PBS infusions), but it would nevertheless be good to clarify the rationale for this.

      We used non-parametric statistics because our sample size (n=8) was rather small for parametric statistics, although we used the parametric ANOVA for some of the results because it is the most commonly known and widely used statistical test for such comparisons. However, we also checked the statistics with the alternatives (i.e., replacing the non-parametric Wilcoxon signed-rank test with the parametric paired t-test, and the parametric one-way RM ANOVA with Bonferroni post hoc test with the non-parametric Friedman’s test with Dunn’s post hoc test), and the statistical significance did not change with any of the tests. We have now added the explanation to the manuscript, as follows:

      “Although most of our statistics were based on the non-parametric tests for the relatively small sample size (n=8), we used the parametric RM ANOVA for comparing three groups (i.e., PBS, dMUS, and iMUS) because it is the most commonly known and widely used statistical test in such comparison. However, we also performed statistical tests with the alternatives for reference, and the statistical significances were not changed with any of the results.” (p.26)
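
      To make the small-sample point concrete, the following is a minimal sketch of an exact two-sided Wilcoxon signed-rank test, written by enumerating all sign patterns. It is not the authors' analysis code; the function name and the paired scores are hypothetical, and the sketch assumes no zero differences and no tied absolute differences.

```python
from itertools import product

def wilcoxon_exact(x, y):
    """Return (W+, exact two-sided p-value) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    # Rank the absolute differences from smallest (1) to largest (n).
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    total = sum(ranks)
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis every pattern of signs is equally likely.
    count = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            count += 1
    return w_plus, count / 2 ** len(ranks)

# Hypothetical paired scores for n = 8 animals (not data from the study).
cond_a = [12, 15, 11, 18, 14, 16, 10, 19]
cond_b = [11, 13, 8, 14, 9, 10, 3, 11]
w_plus, p_value = wilcoxon_exact(cond_a, cond_b)
```

      With n = 8 there are only 2^8 = 256 sign patterns, so even when every difference has the same sign, as in this example, the two-sided p-value cannot fall below 2/256 (about 0.008).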

      b. Single housing of rats:

      Why was this chosen? Based on my experience, this is not necessary for studies involving cannula implants and food restriction. Group housing is generally considered to improve the welfare of rats.

      We chose single housing of rats because our training paradigm required precise restrictions on the food consumption of individual rats, which could be difficult in group housing.

      c. Anesthesia:

      Why was pentobarbital used, alongside isoflurane, to anesthetize rats for surgery (line 663)? The use of gaseous anesthesia alone offers very good control of anesthesia and reduces the risk of death from anesthesia compared to the use of pentobarbital.

      Why was anesthesia used for the drug infusions (line 674)? If rats are well-habituated to handling by the experimenter, manual restraint is sufficient for intra-cerebral infusions. Therefore, anesthesia could be omitted, reducing the risk of adverse effects on the experimental rats.

      I do not think that points b. and c. are relevant for the interpretation of the present findings, but the authors may consider these points for future studies to improve further the welfare of the experimental rats.

      We appreciate the reviewer’s careful suggestions. We used pentobarbital during surgery and anesthesia for the drug infusion to avoid any risk of rats being awake and becoming anxious and to ensure safety during the procedures. These measures might not be necessary, but they gave the experimenters sufficient time to maintain precision. Nonetheless, we agree with the reviewer’s concern, which is why we monitored the rats’ behavior for 20 minutes in the cage after drug infusion to minimize any potential influence on the task performance. We updated the relevant details in the Methods section, as follows:

      “The rat was kept in a clean cage to recover from anesthesia completely and monitored for side effects for 20 minutes, then was moved to the VR apparatus for behavioral testing.” (p.25)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript aims at a quantitative model of how visual stimuli, given as time-dependent light intensity signals, are transduced into electrical currents in photoreceptors of macaque and mouse retina. Based on prior knowledge of the fundamental biophysical steps of the transduction cascade and a relatively small number of free parameters, the resulting model is found to fairly accurately capture measured photoreceptor currents under a range of diverse visual stimuli and with parameters that are (mostly) identical for photoreceptors of the same type.

      Furthermore, as the model is invertible, the authors show that it can be used to derive visual stimuli that result in a desired, predetermined photoreceptor response. As demonstrated with several examples, this can be used to probe how the dynamics of phototransduction affect downstream signals in retinal ganglion cells, for example, by manipulating the visual stimuli in such a way that photoreceptor signals are linear or have reduced or altered adaptation. This innovative approach had already previously been used by the same lab to probe the contribution of photoreceptor adaptation to differences between On and Off parasol cells (Yu et al, eLife 2022), but the present paper extends this by describing and testing the photoreceptor model more generally and in both macaque and mouse as well as for both rods and cones.

      Strengths:

      The presentation of the model is thorough and convincing, and the ability to capture responses to stimuli as different as white noise with varying mean intensity and flashes with a common set of model parameters across cells is impressive. Also, the suggested approach of applying the model to modify visual stimuli that effectively alter photoreceptor signal processing is thought-provoking and should be a powerful tool for future investigations of retinal circuit function. The examples of how this approach can be applied are convincing and corroborate, for example, previous findings that adaptation to ambient light in the primate retina, as measured by responses to light flashes, mostly originates in photoreceptors.

      Weaknesses:

      In the current form of the presentation, it doesn't become fully clear how easily the approach is applicable at different mean light levels and where exactly the limits for the model inversion are at high frequency. Also, accessibility and applicability by others could be strengthened by including more details about how parameters are fixed and what consensus values are selected.

      Thank you - indeed a central goal of writing this paper was to provide a tool that could be easily used by other laboratories. We have clarified and expanded four points in this regard: (1) we have stated more clearly that mean light levels are naturally part of the inversion process, and hence the approach can be applied across a broad range of light levels (lines 292-297); (2) we have expanded our analysis of the high-frequency limits to the inversion and added that expanded figure to the main text (new Fig 5); (3) we have included additional detail about our calibration procedures, including our calibration code, to facilitate transfer to other labs; and (4) we have detailed the procedure for identification of consensus parameters (lines 172-182, 191-199 and Methods section starting on line 831).

      Reviewer #2 (Public Review):

      Summary:

      This manuscript proposes a modeling approach to capture nonlinear processes of photocurrents in mammalian (mouse, primate) rod and cone photoreceptors. The ultimate goal is to separate these nonlinearities at the level of photocurrent from subsequent nonlinear processing that occurs in retinal circuitry. The authors devised a strategy to generate stimuli that cancel the major nonlinearities in photocurrents. For example, modified stimuli would generate genuine sinusoidal modulation of the photocurrent, whereas a sinusoidal stimulus would not (i.e., because of asymmetries in the photocurrent to light vs. dark changes); and modified stimuli that could cancel the effects of light adaptation at the photocurrent level. Using these modified stimuli, one could record downstream neurons, knowing that any nonlinearities that emerge must happen post-photocurrent. This could be a useful method for separating nonlinear mechanisms across different stages of retinal processing, although there are some apparent limitations to the overall strategy.

      Strengths:

      (1) This is a very quantitative and thoughtful approach and addresses a long-standing problem in the field: determining the location of nonlinearities within a complex circuit, including asymmetric responses to different polarities of contrast, adaptation, etc.

      (2) The study presents data for two primary models of mammalian retina, mouse, and primate, and shows that the basic strategy works in each case.

      (3) Ideally, the present results would generalize to the work in other labs and possibly other sensory systems. How easy would this be? Would one lab have to be able to record both receptor and post-receptor neurons? Would in vitro recordings be useful for interpreting in vivo studies? It would be useful to comment on how well the current strategy could be generalized.

      We agree that generalization to work in other laboratories is important, and indeed that was a motivation for writing this as a methods paper. The key issue in such generalization is calibration. We have expanded our discussion of our calibration procedures and included that code as part of the GitHub repository associated with the paper. Figure 10 (previously Figure 9) was added to illustrate generalization. We believe that the approach we introduce here should generalize to in vivo conditions. We have expanded the text on these issues in the Discussion (sections starting on lines 689 and 757).

      Weaknesses:

      (1) The model is limited to describing photoreceptor responses at the level of photocurrents, as opposed to the output of the cell, which takes into account voltage-dependent mechanisms, horizontal cell feedback, etc., as the authors acknowledge. How would one distinguish nonlinearities that emerge at the level of post-photocurrent processing within the photoreceptor as opposed to downstream mechanisms? It would seem as if one is back to the earlier approach, recording at multiple levels of the circuit (e.g., Dunn et al., 2006, 2007).

      Indeed the current model is limited to a description of rod and cone photocurrents. Nonetheless, the transformation of light inputs to photocurrents can be strongly nonlinear, and such nonlinearities can be difficult to untangle from those occurring late in visual processing. Hence, we feel that the ability to capture and manipulate nonlinearities in the photocurrents is an important step. We have expanded Figure 10 to show an additional example of how manipulation of nonlinearities in phototransduction can give insight into downstream responses. We have also noted in text that an important next step would be to include inner segment mechanisms (section starting on line 661); doing so will require not only characterization of the current-to-voltage transformation, but also horizontal cell feedback and properties of the cone output synapse.

      (2) It would have been nice to see additional confirmations of the approach beyond what is presented in Figure 9. This is limited by the sample (n = 1 horizontal cell) and the number of conditions (1). It would have been interesting to at least see the same test at a dimmer light level, where the major adaptation mechanisms are supposed to occur beyond the photoreceptors (Dunn et al., 2007).

      We have added an additional experiment to this figure (now Figure 10) which we feel nicely exemplifies the approach. The approach that we introduce here really only makes sense at light levels where the photoreceptors are adapting; at lower light levels the photoreceptors respond near-linearly, so our “modified” and “original” stimuli as in Figure 10 (previously Figure 9) would be very similar (and post-phototransduction nonlinearities are naturally isolated at these light levels).

      Reviewer #3 (Public Review):

      Summary:

      The authors propose to invert a mechanistic model of phototransduction in mouse and primate photoreceptors to derive stimuli that compensate for nonlinearities in these cells. They fit the model to a large set of photoreceptor recordings and show in additional data that the compensation works. This can allow the exclusion of photoreceptors as a source of nonlinear computation in the retina, as desired to pinpoint nonlinearities in retinal computation. Overall, the recordings made by the authors are impressive and I appreciate the simplicity and elegance of the idea. The data support the authors' conclusions but the presentation can be improved.

      Strengths:

      -  The authors collected an impressive set of recordings from mouse and primate photoreceptors, which is very challenging to obtain.

      -  The authors propose to exploit mechanistic mathematical models of well-understood phototransduction to design light stimuli that compensate for nonlinearities.

      -  The authors demonstrate through additional experiments that their proposed approach works.

      Weaknesses:

      -  The authors use numerical optimization for fitting the parameters of the photoreceptor model to the data. Recently, the field of simulation-based inference has developed methods to do so, including quantification of the uncertainty of the resulting estimates. Since the authors state that two different procedures were used due to the different amounts of data collected from different cells, it may be worthwhile to rather test these methods, as implemented e.g. in the SBI toolbox (https://joss.theoj.org/papers/10.21105/joss.02505). This would also allow them to directly identify dependencies between parameters, and obtain associated uncertainty estimates. This would also make the discussion of how well constrained the parameters are by the data or how much they vary more principled because the SBI uncertainty estimates could be used.

      Thank you - we have improved how we describe and report parameter values in several ways. First, the previous text erroneously stated that we used different fitting procedures for different cell types - but the real difference was in the amount of data and range of stimuli we had available between rods and cones. The fitting procedure itself was the same for all cell types. We have clarified this along with other details of the model fitting both in the main text (lines 121-130) and in the Methods (section starting on line 832). We also collected parameter values and estimates of allowed ranges in two tables. Finally, we used sloppy modeling to identify parameters that could covary with relatively small impact on model performance; we added a description of this analysis to the Methods (section starting on line 903).

      -  In several places, the authors refer the reader to look up specific values e.g. of parameters in the associated MATLAB code. I don't think this is appropriate, important values/findings/facts should be in the paper (lines 142, 114, 168). I would even find the precise values that the authors measure interesting, so I think the authors should show them in a figure/table. In general, I would like to see also the average variance explained by different models summarized in a table and precise mean/median values for all important quantities (like the response amplitude ratios in Figures 6/9).

      We have added two tables with these parameter values and estimates of allowable ranges. We also added points to show the mean (and SD) across cells to the population figures and added those numerical values to the figure legends throughout.

      -  If the proposed model is supposed to model photoreceptor adaptation on a longer time scale, I fail to see why this can be an invertible model. Could the authors explain this better? I suspect that the model is mainly about nonlinearities as the authors also discuss in lines 360ff.

      For the stimuli that we use we see little or no contribution of slow adaptation in phototransduction. We have expanded the description of this point in the text and referred to Angueyra et al (2022) which looks at this issue in more detail for primate cones (paragraph starting on line 280).

      -  The important Figures 6-8 are very hard to read, as it is not easy to see what the stimulus is, the modified stimulus, the response with and without modification, what the desired output looks like, and what is measured for part B. Reworking these figures would be highly recommended.

      We have reworked all of the figures to make the traces clearer.

      -  If I understand Figure 6 correctly, part B quantifies the size of the response to the second small flash relative to the first. While the response amplitude to the second flash is clearly only 50% of that to the first flash in primate rods and cones in the original condition, the modified stimulus seems to overcompensate and results in a 130% response to the second flash. How do the authors explain this? A similar effect occurs in Figure 9, which the authors should also discuss.

      Indeed, in those instances the modified stimulus does appear to overcompensate. We suspect this is due to differences in sensitivity between the specific cells probed for these experiments and those used in the model construction. We now describe this limitation in more detail (lines 524-526). A similar point comes up for those experiments in which we speed the photoreceptor responses (new Figure 9B), and we similarly note that the cells used to test those manipulations differed systematically from those used to fit the model (lines 558-560).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I only have a few minor questions and suggestions for clarification.

      It hasn't become fully clear to me how general the model is when different mean light levels (on long-time scales) are considered. Are there slow adaptation processes not captured in the model that affect model performance? And how should one go about setting the mean light level when, for example, probing ganglion cells with a stimulus obtained through model inversion? Should it work to add an appropriate DC component to the current that is provided as input to the inverted model? (Presumably, deriving a stimulus and then just adding background illumination should not work, or could this be a good approximation, given a steady state that is adapted to the background?)

      We have clarified in the main text that slow adaptation does not contribute substantially to responses to the range of stimuli we explored (lines 281-289). We have also clarified that the stimulus in the model inversion is specified in isomerizations per second - so the mean value of the stimulus is automatically included in the model inversion (lines 293-298).

      Furthermore, a caveat for the model inversion seems to be the potential amplification of high-frequency noise. The suggested application of a cutoff temporal frequency seems appropriate, but data are shown only for a few example cells. Is this consistent across cells? (Given that performance between, e.g., mouse cones can vary considerably according to Fig. 4B?) I would also like to suggest moving the corresponding Supplemental Figure (4.1) into the main part of the manuscript, as it seems quite important.

      We have added population analysis to the new Figure 5 (which was Figure 4 - Figure Supplement 1). We have also clarified that the amplification of high frequency noise is an issue only when we try to apply model inversion to measured stimuli. When we use model inversion to identify stimuli that elicit desired responses, the target responses are computed from a linear model that has no noise, so this is not a concern in applications like those in Figures 6-10.
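
      The distinction between inverting noise-free target responses and inverting measured responses can be sketched with a toy example. The code below is not the phototransduction model: it assumes a first-order low-pass filter as a stand-in forward model, and all names, the filter coefficient, and the noise level are hypothetical.

```python
import math
import random

def lowpass(x, a):
    """Toy forward model: y[t] = a * y[t-1] + (1 - a) * x[t]."""
    y, prev = [], 0.0
    for v in x:
        prev = a * prev + (1 - a) * v
        y.append(prev)
    return y

def invert(y, a):
    """Exact inverse of lowpass: x[t] = (y[t] - a * y[t-1]) / (1 - a)."""
    x, prev = [], 0.0
    for v in y:
        x.append((v - a * prev) / (1 - a))
        prev = v
    return x

def rms(u, v):
    """Root-mean-square difference between two equal-length traces."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)) / len(u))

random.seed(0)
a = 0.9  # hypothetical filter coefficient
stimulus = [math.sin(2 * math.pi * t / 100) for t in range(1000)]
clean = lowpass(stimulus, a)
noisy = [v + random.gauss(0, 0.01) for v in clean]  # simulated measurement noise

err_clean = rms(invert(clean, a), stimulus)  # noise-free response: exact recovery
err_noisy = rms(invert(noisy, a), stimulus)  # noisy response: error exceeds the noise
```

      In this toy case the inverse acts as a high-pass operation, so white noise of standard deviation 0.01 is amplified by roughly sqrt(1 + a^2) / (1 - a), about 13 for a = 0.9, whereas the noise-free response is inverted essentially exactly.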

      Also, could the authors explain more clearly what the effect of the normalization of the estimated stimulus by the power of the true stimulus is? Does this simply reduce power at high frequency or also affect frequencies below the suggested cutoff (where the stimulus reconstruction should presumably be accurate even without normalization)?

      Indeed this normalization reduces high frequency power and has little impact on low frequencies where the inversion is accurate; this is now noted in the text (line 363). As for amplification of high frequency noise (previous comment), the normalization by the stimulus power is only needed when inverting measured responses (i.e. responses with noise) and is omitted when we are identifying stimuli that elicit desired responses (e.g. in Figures 6-10).

      While the overall performance of the model to predict photoreceptor currents is impressive, it seems that particular misses occur for flashes right after a step in background illumination and for the white-noise responses at low background illumination (e.g. Figure 1B). Is that systematic, and if so what might be missing in the model?

      Indeed the model (at least with fixed parameters across stimuli) appears to systematically miss a few aspects of the photoreceptor responses. These include the latency of the response to a bright flash and the early flashes in the step + flash protocol in Figure 1B. Model errors for the variable mean noise stimulus (Figure 2) showed little dependence on time even when responses were sorted by mean light level and by previous mean level. Model errors did not show a clear systematic dependence on light level; this likely reflects, at least in part, the use of mean-square-error to identify model parameters. We have expanded our discussion of these systematic errors in the text (lines 164-166).

      I was also wondering whether this is related to the fact that in Figure 9B, the gain in the modified condition is actually systematically higher when there is more background light. Do the authors think that this could be a real effect or rather an overcompensation from the model? (By the way, is it specified what "Delta-gain" really is, i.e., ratio or normalized difference?)

      We suspect this is an issue with the sensitivity of the specific cells for which we did these experiments (i.e. variability in the gamma parameter between cells). This sensitivity varies between cells, and such variations are likely to place the strongest limitation on our ability to use this approach to manipulate responses in different retinas. We now note those issues in the Results (lines 523-526, 557-559 and 591-593) with reference to Figures 9 (previously Figure 8) and 10 (previously Figure 9), and describe this limitation more generally in the Discussion (section starting on line 649). We have also changed delta-gain to response ratio, which seemed more intuitive.

      Maybe I missed this, but it seems that the parameter gamma is fitted in a cell-type-specific fashion (e.g. line 163), but then needs to be fixed for held-out cells. How was this done? Is there much variability of gamma between cells?

      There is variability in gamma between cells, and this likely explains some of the systematic differences between data and model (see above and Methods, lines 902-903). For the consensus models in Figure 2B, gamma was allowed to vary for each cell while the remaining consensus model parameters were fixed. Gamma was set equal to the mean value across cells for model inversion (i.e. for all of the analyses in Figures 4-10). We have described the fitting procedure in considerably more detail in the revised Methods (starting on line 832).

      For completeness, it would be nice to have the applied consensus model parameters in the manuscript rather than just in the Matlab code (especially since the code has not been part of the submission). Also, some notes on how the numerical integration of the differential equations was done would be nice (time step size?).

      We have added tables with consensus parameters and estimates of the sensitivity of model predictions to each parameter. We have also added additional details about the numerical approaches (including the time step) to Methods.

      Similarly, it would be nice to explicitly see the relationships that are used to fix certain model parameters (lines 705ff). And can the constants k and n (lines 709-710) be assumed identical for different species and receptor types?

      We have added more details on the model fitting to the Methods, including the use of steady-state conditions to hold certain parameters fixed (lines 862 and 866). We are not aware of any direct comparisons of k and n across species and receptor types. We have noted that model performance was not improved by modest changes in these parameters (due to compensation by other model parameters). More generally, we have explained how some parameters trade for others and hence the logic of fixing some even when exact values were not available.

      For the previous measurements of m and beta (lines 712-713), is there a reference or source?

      We have added references for these values.

      Did the authors check for differences in the model parameters between cone types (e.g., S vs. M)?

      We did not include S cones here. They are harder to record from and collecting a fairly large data set across a range of stimuli would be challenging. Our previous work shows that S cones have slower responses than L and M cones, and this would certainly be reflected in differences in model parameters. We have noted this in the text (Methods, lines 808-810).

      For the stated flash responses time-to-peak (lines 183-184), is this for a particular light intensity with no background illumination?

      Those are flashes from darkness - now noted in the text.

      Figure 2 - Supplement 1 doesn't have panel labels A and B, unlike the legend.

      Fixed - thank you.

      Reviewer #2 (Recommendations For The Authors):

      (1) Fig. 2B - for some cells, the consensus model seems to fit better than the individual model. How is this possible?

      This was mostly an error on our part (we inadvertently included responses to more stimuli in fitting the individual models, which slightly hampered their performance). Even with this correction, however, a few cells remain for which the consensus model outperforms an individual model. We believe this is because there is more data to constrain model parameters for the consensus models (since they are fit to all cells at the same time), and that can compensate for improvements associated with customizing parameters to specific cells.

      (2) Fig. 2 Supplement 1, it would be useful to see a blow-up of the data in an inset, as in Fig. 2B.

      Thanks - added.

      (3) Line 400 - this paragraph could include additional quantification and statistics to back up claims re 'substantially reduced', 'considerably lower'.

      We quantify that in the next sentence by computing the mean-square-error between responses and sinusoidal fits (also in Figure 7B, which now includes statistics as well). We have made that connection more direct in the text.
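
      One way such a measure could be computed is sketched below: fit an offset plus a sinusoid at the known stimulus period by least squares and report the mean-square error of the residual. This is an assumption about the general technique, not the authors' code; the function name, the period, and the example traces are hypothetical.

```python
import math

def sinusoid_fit_mse(response, period):
    """Least-squares fit of offset + sine + cosine at a known period; returns the MSE.

    Assumes the trace spans a whole number of periods (period >= 3 samples),
    so the regressors are orthogonal and the fit reduces to projections.
    """
    n = len(response)
    s = [math.sin(2 * math.pi * t / period) for t in range(n)]
    c = [math.cos(2 * math.pi * t / period) for t in range(n)]
    offset = sum(response) / n
    b_sin = 2 * sum(r * v for r, v in zip(response, s)) / n
    b_cos = 2 * sum(r * v for r, v in zip(response, c)) / n
    fit = [offset + b_sin * u + b_cos * v for u, v in zip(s, c)]
    return sum((r - f) ** 2 for r, f in zip(response, fit)) / n

# Hypothetical traces: a pure sinusoid and one with a compressed negative phase.
pure = [math.sin(2 * math.pi * t / 50) for t in range(200)]
distorted = [v if v > 0 else 0.5 * v for v in pure]

mse_pure = sinusoid_fit_mse(pure, 50)
mse_distorted = sinusoid_fit_mse(distorted, 50)
```

      A response that is truly sinusoidal leaves almost no residual, while an asymmetric response leaves a residual at higher harmonics that the single-frequency fit cannot absorb.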

      (4) Maybe a supplement to Fig. 8 could show the changes to the stimulus required to alter the kinetics in both directions - to give more insight into part B., especially.

      Good suggestion - we have added the stimuli to all of the panels of the figure (now Figure 9).

      (5) Fig. 8B - in 'Speed response up' condition - there seems to be error in the model for the decay time of the response - especially for the 'original' condition, which is not quantified in 8C. Was it generally difficult to predict responses to flashes?

      That seems largely to reflect that the cells used for those experiments had faster initial kinetics than the average cells (responses to the control traces are also faster than model predictions in these cells - black traces in Figure 9B). We have added this to the text.

      (6) Line 678, possibly note that 405 nm equally activates S and M photopigments in mice, since most of the cones co-express the two photopigments (Rohlich et al., 1994; Applebury et al., 2000; Wang et al., 2011).

      Thanks - we have added this (lines 827-829).

      (7) The discussion could include a broader description of the various approaches to identifying nonlinearities within retinal circuitry, which include (incomplete list): recording at multiple levels of the circuit (e.g., Kim and Rieke 2001; Rieke, 2001; Baccus and Meister, 2002; Dunn et al., 2006; 2007; Beaudoin et al., 2007; Baccus et al., 2008); recording currents vs. spiking responses in a ganglion cell (e.g., Kim and Rieke, 2001; Zaghloul et al., 2005; Cui et al., 2016); neural network modeling approaches (e.g., Maheswaranathan et al., 2023); optogenetic approaches to studying filtering/nonlinear behavior at synapses (e.g., Pottackal et al., 2020; 2021).

      Good suggestion - we have added this to the final paragraph of the Discussion.

      Reviewer #3 (Recommendations For The Authors):

      -  I am personally not a fan of the style: "... as Figure 4A shows..." or comparable and much prefer a direct "We observe that X is the case (Figure 4A)". If the authors agree, they may want to revise their paper in this way.

      We have revised the text to avoid the “... as Figure xx shows” construction. We have retained multiple instances which follow a “Figure xx shows that …” construction (which is both active rather than passive and does not use a personal pronoun).

      -  I am not a fan of the title. Light-adaption clamp caters only to a very specialized audience.

      We have changed the title to “Predictably manipulating photoreceptor light responses to reveal their role in downstream visual responses.”

      -  The parameter fitting procedure should not only be described in Matlab code, but in the paper.

      Thanks - we have expanded this in the Methods considerably (section starting on line 832).

      -  The authors should elaborate on why different fitting procedures were used.

      We did not describe that issue clearly. The fitting procedures used across cells were identical, but we had different data available for different cell types due to experimental limitations. We have substantially revised that part of the main text to clarify this issue (paragraph starting on line 121).

      -  The authors state in line 126 that the input stimulus is supposed to mimic eye movements: of mouse, monkey, or human? Please clarify.

      Thanks - we have changed this sentence to “abrupt and frequent changes in intensity that characterize natural vision.”

      -  Please improve the figure style. For example, labels should be in consistent capitalization and ideally use complete words (e.g. Figure 2B, 4B, and others).

      We have made numerous small changes in the figures to make them more consistent.

      -  Is the fraction of variance calculated on held-out-data? Linear models should be added to Figure 2B.

      The fraction of variance explained was not calculated on held out data because of limitations in the duration of our recordings. Given the small number of free parameters, and the ability of the model to capture held out cells, we believe that the model generalizes well. We have added a supplemental figure with linear model performance (Figure 2 - Figure Supplement 2).

      -  Fig. 9A is lacking bipolar cell and amacrine cell labels. Currently, it looks like HC is next to the BC in the schematic.

      Thanks - we have updated that figure (now Figure 10A).

      -  Maybe I am misunderstanding something, but it seems like the linear model prediction shown in Figure 2A for the rod could be easily improved by scaling it appropriately. Is this impression correct or why not?

      We have clarified how the linear model is constructed (by fitting the linear model to low contrast responses of the full model at the mean stimulus intensity). We also added a supplemental figure, following the suggestion above, showing the linear model performance when a free scaling factor is included for each cell.
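
      The effect of a free scaling factor can be illustrated with a short sketch. It assumes the factor is chosen by ordinary least squares, which the response does not specify; the function name and the example values are hypothetical.

```python
def best_scale(measured, predicted):
    """Least-squares scale a that minimizes sum((measured - a * predicted)^2).

    Setting the derivative with respect to a to zero gives
    a = sum(measured * predicted) / sum(predicted^2).
    """
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    numerator = sum(m * p for m, p in zip(measured, predicted))
    denominator = sum(p * p for p in predicted)
    if denominator == 0:
        raise ValueError("prediction is identically zero")
    return numerator / denominator


# Hypothetical example: the prediction has the right shape but half the amplitude.
measured = [2.0, 4.0, 6.0]
predicted = [1.0, 2.0, 3.0]
scale = best_scale(measured, predicted)
rescaled = [scale * p for p in predicted]
```

      A scale factor of this kind corrects only the amplitude of the prediction; any mismatch in response shape or kinetics remains after rescaling.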

      -  The verification experiment in Fig. 5 is only anecdotal and is elaborated only in Figure 6. If I am not mistaken, this does not necessitate its own figure/section but could rather be merged.

      We have kept this figure separate (now Figure 6) as we felt that it was important to highlight the approach in general in a figure before getting into quantification of how well it works.

      -  Figure 5 right is lacking labels. What is red and grey?

      Thanks for catching that - labels are added now.

      -  The end of the Discussion is slightly unusual. Did some text go missing?

      Thanks - we have rearranged the Discussion so as not to end on Limitations.

      -  There is a bonus figure at the end which seems also not to belong in the manuscript.

      Thanks - the bonus figure is removed now.

      -  The methods should also describe briefly what kind of routines were used in the Matlab code, e.g. gradient descent with what optimizer?

      We’ve added that information as well.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) Peptides were synthesized with fluorescein isothiocyanate (FITC) and Tat tag, and then PEGylated with methoxy PEG Succinimidyl Succinate.

      I have two concerns about the peptide design. First, FITC was intended "for monitoring" (line 129), but was never used in the manuscript. Second, PEGylation targets the two lysine sidechains on the Tat, which would alter its penetration property.

      We conducted an analysis of the cellular trafficking of FITC-tagged peptides following their permeabilization into cells.

      Author response image 1.

      However, we did not include it in the main text because it is a basic result.

      (2) As can be seen in the figure above, the cells showed FITC fluorescence after PEGylation and permeabilization. PEGylation therefore does not appear to affect the ability of the peptide to penetrate into the cells.

      (2) "Superdex 200 increase 10/300 GL column" (line 437) was used to isolate mono/di PEGylated PDZ and separate them from the residual PEG and PDZ peptide. "m-PEG-succinimidyl succinate with an average molecular weight of 5000 Da" (lines 133 and 134).

      To my knowledge, the Superdex 200 increase 10/300 GL column is not suitable and is unlikely to produce traces shown in Figure 1B.

      As Superdex 200 Increase 10/300 GL features a fractionation range of 10,000 to 600,000 Da, we used it to fractionate PEGylated products, including DiPEGylated PDZ (approx. 15 kDa) and MonoPEGylated PDZ (approx. 10 kDa), from residuals (PDZ and PEG), demonstrating successful isolation of PEGylated products (Figure 1C). Considering that the molecular weights of PDZ and PEG are approximately 4.1 kDa and 5.0 kDa, respectively, the late-eluting peaks from SEC were likely to represent a mixed absorbance of PDZ and PEG at 215 nm.

      However, as the reviewer pointed out, it could be unreasonable to annotate peaks representing PDZ and PEG, respectively, from mixed absorbance detected in a region (11-12 min) beyond the fractionation range.

      In our revised manuscript, therefore, the multiple peaks in the late-eluting region (11-12 min) are labeled together as 'Residuals'. As a reference, the revised Figure 1B includes a chromatogram of pure PDZ-WT under the same analytical conditions.

      We therefore replaced Fig. 1B with the new results, as follows:

      (3) "the in vivo survival effect of LPS and PDZ co-administration was examined in mice. The pretreatment with WT PDZ peptide significantly increased survival and rescued compared to LPS only; these effects were not observed with the mut PDZ peptide (Figure 2a)." (lines 159-160).

      Fig 2a is the weight curve only. The data is missing in the manuscript.

      We added the survival curve to Fig. 2A as follows:

      (4) Table 1, peptide treatment on ALT and AST appears minor.

      In mice treated with LPS, blood levels of ALT and AST are elevated, and these levels decrease upon treatment with WT PDZ, whereas mut PDZ does not produce significant changes. Figure 3A shows inflammatory cells within the central vein, yet no substantial hepatotoxicity is observed during the 5-day treatment with LPS. According to UCLA Diagnostic Labs, the normal ranges of ALT and AST in C57BL/6 mice are 16-200 U/L and 46-221 U/L, respectively, so the values in all experiments fall within these normal ranges. In summary, a 5-day treatment with LPS induces inflammation in the liver but is too short to induce hepatotoxicity, which explains the low values.

      (5) MitoTraker Green FM shouldn't produce red images in Figure 6.

      We replaced Figs. 6A and B with new results (shown in green), as follows:

      (6) Figure 5. Comparison of mRNA expression in PDZ-treated BEAS-2B cells. Needs a clearer and more detailed description both in the main text and figure legend. The current version is very hard to read.

      We replaced Fig. 5A with a new version that is easier to understand, and added more detailed results and a more detailed figure legend, as follows:

      Results Section in Figure 5:

      “…we performed RNA sequencing analysis. The results of RNA-seq analysis showed the expression pattern of 24,424 genes according to each comparison combination, of which the results showed the similarity of 51 genes overlapping in 4 gene categories and the similarity between each comparison combination (Figure 5a). As a result, compared to the control group, it was confirmed that LPS alone, WT PDZ+LPS, and mut PDZ+LPS were all upregulated above the average value in each gene, and when LPS treatment alone was compared with WT PDZ+LPS, it was confirmed that they were averaged or downregulated. When comparing LPS treatment alone and mut PDZ+LPS, it was confirmed that about half of the genes were upregulated. Regarding the similarity between comparison combinations, the comparison combination with LPS…”

      Figure 5 Legend Section:

      “Figure 5. Comparison of mRNA expression in PDZ-treated BEAS-2B cells.

      BEAS-2B cells were treated with wild-type PDZ or mutant PDZ peptide for 24 h and then incubated with LPS for 2 h, after which RNA sequencing analysis was performed. (a) The heat map shows the general regulation pattern of about 51 inflammation-related genes that are differentially expressed when WT PDZ and mut PDZ are treated with LPS, an inflammatory substance. All samples are RED = upregulated and BLUE = downregulated relative to the gene average. Each row represents a gene, and the columns represent the values of the control group treated only with LPS and the WT PDZ and mut PDZ groups with LPS. This was used by converting each log value into a fold change value. All genes were adjusted to have the same mean and standard deviation, the unit of change is the standard deviation from the mean, and the color value range of each row is the same. (b) Significant genes were selected using Gene category chat (Fold change value of 2.00 and normalized data (log2) value of 4.00). The above pie chart shows the distribution of four gene categories when comparing LPS versus control, WT PDZ+LPS/LPS, and mut PDZ+LPS/LPS. The bar graph below shows RED=upregulated, GREEN=downregulated for each gene category, and shows the number of upregulated and downregulated genes in each gene category. (c) The protein-protein interaction network constructed by the STRING database differentially displays commonly occurring genes by comparing WT PDZ+LPS/LPS, mut PDZ+LPS/LPS, and LPS. These nodes represent proteins associated with inflammation, and these connecting lines denote interactions between two proteins. Different line thicknesses indicate types of evidence used in predicting the associations.”
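
      The per-gene adjustment described in the legend (each gene given the same mean and standard deviation, with changes expressed in standard deviations from the mean) corresponds to a row-wise z-score. The following is a minimal sketch under that reading; the function name and the expression values are hypothetical, the use of the population standard deviation is an assumption, and this is not the authors' analysis pipeline.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Scale each row (gene) to mean 0 and standard deviation 1.

    Rows with zero variance are returned as all zeros so that they
    plot as "no change" instead of causing a division by zero.
    """
    scaled = []
    for row in matrix:
        mu = mean(row)
        sigma = pstdev(row)  # population SD; a sample SD would differ slightly
        if sigma == 0:
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(x - mu) / sigma for x in row])
    return scaled


# Hypothetical log2 expression values: rows are genes, columns are
# LPS, WT PDZ+LPS, and mut PDZ+LPS.
expression = [
    [8.0, 5.0, 7.5],
    [3.0, 3.0, 3.0],
]
heatmap_values = zscore_rows(expression)
```

      After this scaling every row shares the same color range, which is what allows genes with very different absolute expression levels to be compared in a single heat map.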

      Reviewer 2:

      (1) In this paper, the authors demonstrated the anti-inflammatory effect of PDZ peptide by inhibition of NF-kB signaling. Are there any results on the PDZ peptide-binding proteins (directly or indirectly) that can regulate LPS-induced inflammatory signaling pathway? Elucidation of the PDZ peptide-its binding partner protein and regulatory mechanisms will strengthen the author's hypothesis about the anti-inflammatory effects of PDZ peptide

      As mentioned in the Discussion section, we believe it is crucial to identify proteins that directly interact with PDZ and regulate it. This direct interaction can modulate intracellular signaling pathways, so we plan to express GST-PDZ, allow it to bind proteins in cellular lysates, and then characterize the bound proteins by LC-MS/MS. We intend to pursue these experiments further and submit the findings for publication.

      (2) The authors presented interesting insights into the therapeutic role of the PDZ motif peptide of ZO-1. PDZ domains are protein-protein interaction modules found in a variety of species. It has been thought that many cellular and biological functions, especially those involving signal transduction complexes, are affected by PDZ-mediated interactions. What is the rationale for selecting the core sequence that regulates inflammation among the PDZ motifs of ZO-1 shown in Figure 1A?

      The rationale for selecting the core sequence that regulates inflammation among the PDZ motifs of ZO-1, as shown in Figure 1A, is grounded in the specific roles these motifs play in signal transduction pathways that are crucial for inflammatory processes. PDZ domains are recognized for their ability to function as scaffolding proteins that organize signal transduction complexes, crucial for modulating cellular and biological functions. The chosen core sequence is particularly important because it is conserved across ZO-1, ZO-2, and ZO-3, indicating a fundamental role in maintaining cellular integrity and signaling pathways. This conservation suggests that the sequence’s involvement in inflammatory regulation is not only significant in ZO-1 but also reflects a broader biological function across the ZO family.

      (3) In Figure 3, the authors showed the representative images of IHC, please add the quantification analysis of Iba1 expression and PAS-positive cells using Image J or other software. To help understand the figure, an indication is needed to distinguish specifically stained cells (for example, a dotted line or an arrow).

      We added the semi-quantitative results to Figs. 4d, e, and f as follows:

      Result section: “The specific physiological mechanism by which WT PDZ peptide decreases LPS-induced systemic inflammation in mice and the signal molecules involved remain unclear. These were confirmed by a semi-quantitative analysis of Iba-1 immunoreactivity and PAS staining in liver, kidney, and lung,respectively (Figures 4d, e, and f). To examine whether WT PDZ peptide can alter LPS-induced tissue damage in the kidney, cell toxicity assay was performed (Figure 3g). LPS induced cell damage in the kidney, however, WT PDZ peptide could significantly alleviate the toxicity, but mut PDZ peptide could not. Because cytotoxicity caused by LPS is frequently due to ROS production in the kidney (Su et al., 2023; Qiongyue et al., 2022), ROS production in the mitochondria was investigated in renal mitochondria cells harvested from kidney tissue (Figure 3h)....”

      Figure legend section: “Indicated scale bars were 20 μm. (d,e,f) Semi-quantitative analysis of each are positive for Iba-1 in liver and kidney, and positive cells of PAS in lung, respectively. (g) After the kidneys were harvested, tissue lysates were used for MTT assay. (h) After...”

      (4) In Figure 6G, H, the authors confirmed the change in expression of the M2 markers by PDZ peptide using the mouse monocyte cell line Raw264.7. It would be good to add an experiment on changes in M1 and M2 markers caused by PDZ peptides in human monocyte cells (for example, THP-1).

      We thank you for your comments. To determine whether PDZ peptide regulates M1/M2 polarization in human monocytes, we examined changes in M1 and M2 gene expression in THP-1 cells. Wild-type PDZ significantly suppressed the expression of M1 marker genes (hIL-1β, hIL-6, hIL-8, hTNF-α), while increasing the expression of M2 marker genes (hIL-4, hIL-10, hMRC-1). However, mutant PDZ did not affect M1/M2 polarization. These results suggest that PDZ peptide can suppress inflammation by regulating M1/M2 polarization of human monocyte cells. These results are for the reviewer's reference only and will not be included in the main text.

      Author response image 2.

      Author response image 3.

      Minor point:

      The use of language is appropriate, with good writing skills. Nevertheless, a thorough proofread would eliminate small mistakes such as:

      - line 254, " mut PDZ+LPS/LPS (45.75%) " → " mut PDZ+LPS/LPS (47.75%) "

      - line 296, " Figure 6f " → " Figure 6h "

      We have corrected these points in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the manuscript titled "Vangl2 suppresses NF-κB signaling and ameliorates sepsis by targeting p65 for NDP52-mediated autophagic degradation" by Lu et al, the authors show that Vangl2, a planar cell polarity component, plays a direct role in autophagic degradation of NFkB-p65 by facilitating its ubiquitination via PDLIM2 and subsequent recognition and autophagic targeting via the autophagy adaptor protein NDP52. Conceptually it is a wonderful study with excellent execution of experiments and controls. The concerns with the manuscript are mainly on two counts - First issue is the kinetics of p65 regulation reported here, which does not fit into the kinetics of the mechanism proposed here, i.e., Vangl2-mediated ubiquitination followed by autophagic degradation of p65. The second issue is more technical- an absolute lack of quantitative analyses. The authors rely mostly on visual qualitative interpretation to assess an increase or decrease in associations between partner molecules throughout the study. While the overall mechanism is interesting, the authors should address these concerns as highlighted below:

      Major points:

      (1) Kinetics of p65 regulation by Vangl2: As mentioned above, authors report that LPS stimulation leads to higher IKK and p65 activation in the absence of Vangl2. The mechanism of action authors subsequently work out is that- Vangl2 helps recruit E3 ligase PDLIM to p65, which causes K63 ubiquitination, which is recognised by NDP52 for autophagic targeting. Curiously, peak p65 activation is achieved within 30 minutes of LPS stimulation. The time scale of all other assays is way longer. It is not clear that in WT cells, p65 could be targeted to autophagic degradation in Vangl2 dependent manner within 30 minutes. The HA-Myc-Flag-based overexpression and Co-IP studies do confirm the interactions as proposed. However, they do not prove that this mechanism was responsible for the Vangl2-mediated modulation of p65 activation upon LPS stimulation. Moreover, the Vangl2 KO line also shows increased IKK activation. The authors do not show the cause behind increased IKK activation, which in itself can trigger increased p65 phosphorylation.

      We thank the reviewer for this valuable suggestion.

      We agree with the reviewer that peak p65 activation is achieved within 30 minutes of LPS stimulation in vitro, and that p65 could not be targeted for autophagic degradation in a Vangl2-dependent manner within 30 minutes. Given that the protein and mRNA levels of Vangl2 were elevated at 3-6 h of LPS stimulation (Fig. S1 C-E), we extended the stimulation time scale in the revised manuscript. The data (Fig. 2A-D in the revised manuscript) demonstrated that IKK phosphorylation was enhanced in Vangl2 KO myeloid cells during the early phase (within 3 h) of LPS stimulation, but not during prolonged LPS stimulation. The underlying mechanism may be complex. Only p65 phosphorylation was continuously enhanced after long-term LPS stimulation in Vangl2 KO cells, compared to WT cells. Furthermore, overexpression of Vangl2 in A549 cells also reduced the levels of phosphorylated and total endogenous p65 (Fig. 2 I, J in the revised manuscript). These findings were corroborated by overexpression and Co-IP experiments, which collectively indicated that Vangl2 regulates the stability of p65 by promoting its interaction with NDP52 and its autophagic degradation (Page 7; Lines 183-185).

      (2) The other major concern is regarding the lack of quantitative assessments. For Co-IP experiments, I can understand it is qualitative observation. However, when the authors infer that there is an increase or decrease in the association through co-IP immunoblots, it should also be quantified, especially since the differences are quite marginal and could be easily misinterpreted.

      We are grateful to the reviewer for this suggestion. The quantitative analysis has been updated in the revised version.

      (3) Figure 4E and F: It is evident that inhibiting Autolysosome (CQ or BafA1) or autophagy (3MA) led to the recovery of p65 levels and inducing autophagy by Rapamycin led to faster decay in p65 levels. Did the authors also note/explore the possibility that Vangl2 itself may be degraded via the autophagy pathway? IB of WCL upon CQ/BAF/3MA or upon Rapa treatment does indicate the same. If true, how would that impact the dynamics of p65 activation?

      We thank the reviewer for this question. Previous studies have shown that Vangl2 is primarily degraded by the proteasome pathway, rather than by the autolysosomal pathway (doi: 10.1126/sciadv.abg2099; doi: 10.1038/s41598-019-39642-z). In our experiments, Vangl2 recruits E3 ligase PDLIM2 to enhance K63-linked ubiquitination on p65, which serves as a recognition signal for cargo receptor NDP52-mediated selective autophagic degradation. Vangl2 facilitated the interaction between p65 and NDP52, yet itself did not undergo significant autophagic degradation.

      (4) Autophagic targeting of p65 should also be shown through alternate evidence, like microscopy etc., in the LPS-stimulated WT cells.

      We thank the reviewer for this suggestion. We have added the data (co-localization of p65 and LC3 was detected by immunofluorescence) in the revised version (Fig. S4 H in the revised manuscript). (Page 9, lines 267-268)

      Reviewer #2 (Public Review):

      Vangl2, a core planar cell polarity protein involved in Wnt/PCP signaling, mediates cell proliferation, differentiation, homeostasis, and cell migration. Vangl2 malfunctioning has been linked to various human ailments, including autoimmune and neoplastic disorders. Interestingly, Vangl2 was shown to interact with the autophagy regulator p62, and indeed, autophagic degradation limits the activity of inflammatory mediators such as p65/NF-κB. However, if Vangl2, per se, contributes to restraining aberrant p65/NF-kB activity remains unclear.

      In this manuscript, Lu et al. describe that Vangl2 expression is upregulated in human sepsis-associated PBMCs and that Vangl2 mitigates experimental sepsis in mice by negatively regulating p65/NF-κB signaling in myeloid cells. Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to promote K63-linked poly-ubiquitination of p65. Vangl2 also facilitates the recognition of ubiquitinated p65 by the cargo receptor NDP52. These molecular processes cause selective autophagic degradation of p65. Indeed, abrogation of PDLIM2 or NDP52 functions rescued p65 from autophagic degradation, leading to extended p65/NF-κB activity.

      As such, the manuscript presents a substantial body of interesting work and a novel mechanism of NF-κB control. If found true, the proposed mechanism may expand therapeutic opportunities for inflammatory diseases. However, the current draft has significant weaknesses that need to be addressed.

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested.

      Specific comments

      (1) Vangl2 deficiency did not cause a discernible increase in the cellular level of total endogenous p65 (Fig 2A and Fig 2B) but accumulated also phosphorylated IKK.

      Even Fig 4D reveals that Vangl2 exerts a rather modest effect on the total p65 level and the figure does not provide any standard error for the quantified data. Therefore, these results do not fully support the proposed model (Figure 7) - this is a significant drawback. Instead, these data provoke an alternate hypothesis that Vangl2 could be specifically mediating autophagic removal of phosphorylated IKK and phosphorylated IKK, leading to exacerbated inflammatory NF-κB response in Vangl2-deficient cells. One may need to use phosphorylation-defective mutants of p65, at least in the over-expression experiments, to dissect between these possibilities.

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested.

      (1) We agree with the reviewer that Vangl2 deficiency did not cause a discernible increase in the cellular level of total p65 after a short period of LPS stimulation in vitro, and that p65 could not be targeted for autophagic degradation in a Vangl2-dependent manner within 30 minutes. Given that the protein and mRNA levels of Vangl2 were elevated at 3-6 h of LPS stimulation (Fig. S1 C-E), we extended the stimulation time scale in the revised manuscript. The data (Fig. 2A-D in the revised manuscript) demonstrated that IKK phosphorylation was enhanced in Vangl2 KO myeloid cells during the early phase (within 3 h) of LPS stimulation, but not during prolonged LPS stimulation. The underlying mechanism may be complex. Only the levels of phosphorylated p65 and total endogenous p65 were continuously enhanced after long-term LPS stimulation in Vangl2 KO cells, compared to WT cells. Furthermore, overexpression of Vangl2 in A549 cells also reduced the levels of phosphorylated and total endogenous p65 (Fig. 2 I, J in the revised manuscript). These findings were corroborated by overexpression and Co-IP experiments, which collectively indicated that Vangl2 regulates the stability of p65 by promoting its interaction with NDP52 and its autophagic degradation (Page 7; Lines 183-185).

      (2) Similarly, the stimulation time scale in Fig 4D was extended, and it was demonstrated that p65 was more stable in Vangl2-deficient cells.

      (3) Moreover, we constructed a phosphorylation-defective mutant of p65 (S536A), and found that Vangl2 could also promote the degradation of this mutant (Fig. S4 A, B in the revised manuscript). Thus, Vangl2 promotes the degradation of basal/unphosphorylated p65. (Page 8, lines 237-240)

      (2) Fig 1A: The data indicates the presence of two subgroups within the sepsis cohort - one with high Vangl2 expressions and the other with relatively normal Vangl2 expression. Was there any difference with respect to NF-κB target inflammatory gene expressions between these subgroups?

      As suggested, we conducted an analysis of NF-kB target inflammatory gene expressions between the high and relatively low Vangl2 expression groups in sepsis patients. The results showed that the serum of the high Vangl2 expression group exhibited lower levels of IL-6, WBC, and CRP than the low Vangl2 expression group, which suggested an inverse correlation between Vangl2 and the inflammatory response (Fig. S1 A in the revised manuscript) (Page 5, lines 126-128).

      (3) The effect of Vangl2 deficiency was rather modest in the neutrophil. Could it be that Vangl2 mediates its effect mostly in macrophages?

      As shown in Fig. S1C-E, the induction of Vangl2 by LPS stimulation is more rapid in macrophages than in neutrophils. This may contribute to its dominant effect in macrophages. Consequently, we primarily focused our investigation on the role of Vangl2 in macrophages.

      (4) Fig 1D and Figure 1E: Data for unstimulated Vangl2 cells should be provided. Also, the source of the IL-1β primary antibody has not been mentioned.

      Thank you for the suggestion. We have updated the data for unstimulated cells in the revised manuscript (Fig. 1 D, E in the revised manuscript). Also, IL-1β primary antibody was purchased from Cell Signaling Technology and the information has been included in the Materials and Methods section (Table S1).

      (5) The relevance and the requirement of RNA-seq analysis are not clear in the present draft. Figure 1E already reveals upregulation of the signature NF-κB target inflammatory genes upon Vangl2 deficiency.

      We agree with the reviewer that the data presented in Figure 1E demonstrate the upregulation of the signature NF-κB target inflammatory genes upon Vangl2 deficiency in a murine model of LPS-induced sepsis. We then investigated the mechanism by which Vangl2 regulates NF-κB target inflammatory genes at the cellular level in Figure 2. To this end, we performed RNA-seq analysis to screen for signaling pathways involved in LPS-induced septic shock by comparing LPS-stimulated BMDMs from Vangl2ΔM and WT mice, and found that the TNF signaling pathway and cytokine-cytokine receptor interaction were significantly enriched in Vangl2ΔM BMDMs upon LPS stimulation. This analysis provides further evidence that Vangl2 plays a role in regulating NF-κB signaling pathways and the release of related inflammatory cytokines.

      (6) Fig 2A reveals an increased accumulation of phosphorylated p65 and IKK in Vangl2-deficient macrophages upon LPS stimulation within 30 minutes. However, Vangl2 accumulates at around 60 minutes post-stimulation in WT cells. Similar results were obtained for neutrophils (Fig 2B). There appears to be a temporal disconnect between Vangl2 and phosphorylated p65 accumulation - this must be clarified.

      This concern has been addressed above (see the response to question 1 from Reviewer #2).

      (7) Figure 2E and 2F do not have untreated controls. Presentations in Fig 2E may be improved to more clearly depict IL6 and TNF data, preferably with separate Y-axes.

      Thank you for the suggestion. We have added untreated controls and separated Y-axes for IL-6 and TNF data in the revised manuscript (Fig. 2 E, F in the revised manuscript).

      (8) Line 219: "strongly with IKKα, p65 and MyD88, and weak" - should be revised.

      We have revised this sentence as suggested (Page 7; Line 203).

      (9) It is not clear why IKKβ was excluded from interaction studies in Fig S3G.

      We added a Co-IP experiment showing that HA-tagged Vangl2 interacted with Flag-tagged p65, but not with Flag-tagged IKKβ, in 293T cells (Fig. S3H). Furthermore, endogenous co-IP immunoblot analyses showed that Vangl2 did not associate with IKKβ (Fig. S3I).

      (10) Fig 3F- In the text, authors mentioned that Vangl2 strongly associates with p65 upon LPS stimulation in BMDM. However, no controls, including input or another p65-interacting protein, were used.

      As the reviewer suggested, we have added the input and a positive control (IκBα) to this experiment (Fig. 3F in the revised manuscript). The results demonstrated that the interaction between p65 and IκBα was attenuated, although total IκBα did not undergo significant degradation over the long-term course of LPS stimulation.

      (11) Figure 4D - Authors claim that Vangl2-deficient BMDMs stabilized the expression of endogenous p65 after LPS treatment. However, p65 levels were particularly constitutively elevated in knockout cells, and LPS signaling did not cause any further upregulation. This again indicates the role of Vangl2 in the basal state. The authors need to explain this and revise the text accordingly.

      We thank the reviewer for these comments. We repeated the experiment to ascertain whether Vangl2 affects the stability of endogenous p65 before and after LPS treatment. Because Vangl2 expression is extremely low in unstimulated WT cells, there was no observable difference in the basal level of p65 between WT and Vangl2ΔM cells. However, upon prolonged LPS stimulation, Vangl2 expression was induced, resulting in p65 degradation in WT cells. In contrast, p65 protein was more stable in Vangl2-deficient cells after LPS stimulation (Fig. 4D in the revised manuscript).

      Reviewer #3 (Public Review):

      Lu et al. describe Vangl2 as a negative regulator of inflammation in myeloid cells. The primary mechanism appears to be through binding p65 and promoting its degradation, albeit in an unusual autolysosome/autophagy dependent manner. Overall, the findings are novel and the crosstalk of PCP pathway protein Vangl2 with NF-kappaB is of interest. … Regardless, Vangl2 as a negative regulator of NF-kappaB is an important finding. There are, however, some concerns about methodology and statistics that need to be addressed.

      Thank you for your comments on our manuscript, and we have further improved the manuscript as suggested.

      (1) Whether PCP is in any way relevant or if this is a PCP-independent function of Vangl2 is not directly explored (the latter appears more likely from the manuscript/discussion). PCP pathways intersect often with developmentally important pathways such as WNT, HH/GLI, Fat-Dachsous and even mechanical tension. It might be of importance to investigate whether Vangl2-dependent NF-kappaB is influenced by developmental pathways.

      We thank the reviewer for these insightful comments. Our study revealed that Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to facilitate K63-linked ubiquitination of p65, which is subsequently recognized by the autophagy receptor NDP52, promoting the autophagic degradation of p65. Our findings using autophagy inhibitors and autophagy-deficient cells indicate that Vangl2 regulates NF-κB signaling through a selective autophagic pathway, rather than by affecting the PCP pathway, WNT, HH/GLI, Fat-Dachsous or mechanical tension. Moreover, a discussion of this point has been added to the revised version. (Page 12, lines 377-393)

      (2) Are Vangl2 phosphorylations (S5, S82 and S84) in any way necessary for the observed effects on NF-kappaB or would a phospho-mutant (alanine substitution mutant) Vangl2 phenocopy WT Vangl2 for regulation of NF-kappaB?

      As suggested, we generated a phospho-mutant of Vangl2 (S82/84A) and observed that Vangl2 (S82/84A) could still facilitate the degradation of p65 (Fig. S4 B in the revised manuscript), suggesting that Vangl2 regulates the NF-κB pathway independently of its phosphorylation.

      (3) Another area to strengthen might be with regards to specificity of cell types where this phenomenon may be observed. LPS treatment in mice resulted in Vangl2 upregulation in spleen and lymph nodes, but not in lung and liver. What explains the specificity of organ/cell-type Vangl2 upregulation and its consequences observed here? Why is NF-kappaB signaling not more broadly or even ubiquitously affected in all cell types in a Vangl2-dependent manner, rather than being restricted to macrophages, neutrophils and peritoneal macrophages, or, for that matter, in spleen and LN and not liver and lung? After all, one may think that the PCP proteins, as well as NF-kappaB, are ubiquitous.

      Thank you for the reviewer's comments.

      (1) LPS is an important mediator that triggers sepsis with excessive immune activation. The spleen and lymph nodes are important peripheral immune organs, where immune cells (e.g., macrophages) are abundant and respond sensitively to LPS stimulation. In contrast, immune cells represent only a minor fraction of the cells in the lungs and liver. Consequently, Vangl2, as a pivotal regulator of immune function, shows a more pronounced increase in immune organs and cells.

      (2) Induction of Vangl2 expression by LPS stimulation is cell specific. Given that different cells exhibit varying protein abundances, the molecular events involved may also differ. Moreover, we observed high Vangl2 expression in the liver at the basal state (Author response image 1), but it was not further induced after 12 h of LPS stimulation. Therefore, Vangl2 has a significant functional phenotype in macrophages and neutrophils, and in the spleen and LN, rather than in liver or lung cells.

      Author response image 1.

      Vangl2 showed no significant changes in the liver after LPS treatment.

      Mice (n≥3) were treated with LPS (30 mg/kg, i.p.). Livers were collected at 12 h after LPS treatment. Immunoblot analysis of Vangl2.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      General points:

      Figure 4G: panels appear mislabeled. Please correct.

      We have corrected this mislabeling as you suggested.

      The dynamics of Vangl2 interaction with p65 and autophagy adaptors is not clear/apparent. For example, Vangl2 expression destabilises p65 levels (as in Fig. 4), but in Fig. 5, it seems there is no decline in the p65 protein level, and a large fraction of it coprecipitates with NDP52.

      We appreciate the reviewer’s comments. In the co-IP assay, we used the lysosomal inhibitor CQ to inhibit p65 degradation to observe the interaction between p65 and NDP52 or Vangl2.

      Fig 5E- I would expect p65 levels to be lower in WT cells than Vangl2 KO cells. But as such, there is no difference between the two.

      We appreciate the reviewer’s comments. We repeated the experiments and updated the data. First, Vangl2 was not induced in WT cells in the absence of LPS stimulation, so there was no difference in p65 expression between the two groups at the basal level. Second, we used CQ/Baf-A1 to inhibit the degradation of Vangl2 in the co-IP assay in order to observe the interaction between p65 and other molecules.

      Reviewer #2 (Recommendations For The Authors):

      A few points that can be looked at and revised.

      (1) Quantification of the presented data is needed for Fig 4D and Fig 4E.

      We added the quantification analysis as suggested.  

      (2) The labeling of Fig 4G should be scrutinized.

      We have corrected this mislabeling as you suggested.

      (3) Fig 6B and Fig 6C should be explained in the result section more elaborately.

      We thank the reviewer for the suggestion, and we have rephrased this sentence to better describe the results. (Page 10, lines 306-313)

      (4) Line 85: "Vangl2 mediated downstream of Toll-like or interleukin (IL)-1" - unclear.

      We appreciate the reviewer’s comment, and we have revised this sentence in the revised manuscript. (Page 3, line 68)

      (5) Line 181: "mice. Differentially expression analysis" - this should be revised.

      We appreciate the reviewer’s comment, and we have revised this sentence in the revised manuscript. (Page 11, line 323)

      (6) Line 261-264- CHX-chase assay showed the degradation rate of p65 in Vangl2-deficient BMDM was slower compared with WT cells. However, Vangl2 is not induced in WT BMDMs upon CHX treatment (Fig. S4B).

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested in the revised manuscript (Fig. S4D).

      (7) Finally, some editing to provide data only critical for the conclusions could improve the ease of reading.

      We have further improved the manuscript as suggested in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      Comments (general, please address at least in Discussion. Some experimental data, for example the role, if any, of Vangl2 phosphorylations will be very useful):

      (1) It might be interesting to explore whether there are any potential effects of developmental pathways on the observed effect mediated by Vangl2 or if the effects are entirely a PCP-independent function of Vangl2. Please see above public review.

      We thank the reviewer for the insightful comments. Our study revealed that Vangl2 recruits the E3 ubiquitin ligase PDLIM2 to facilitate K63-linked ubiquitination of p65; ubiquitinated p65 is subsequently recognized by the autophagy receptor NDP52, which promotes its autophagic degradation. Our findings using autophagy inhibitors and autophagy-deficient cells indicate that Vangl2 regulates NF-kB signaling through a selective autophagic pathway, rather than by affecting the PCP pathway, WNT, HH/GLI, Fat-Dachsous, or mechanical tension. Furthermore, we generated phospho-mutants of Vangl2 (S82/84A) and observed that Vangl2 (S82/84A) could still facilitate the degradation of p65 (Fig. S4 B), suggesting that Vangl2 regulates the NF-kB pathway independently of its phosphorylation. In addition, a discussion of this point has been added to the revised version (Page 12, lines 377-393).

      (2) What explains the specificity of organ/cell-type Vangl2 upregulation and its consequences observed here? Why is NF-kappaB signaling not more broadly or even ubiquitously affected in all cell types in a Vangl2-dependent manner, rather than being restricted to macrophages, neutrophils and peritoneal macrophages, or, for that matter, in spleen and LN and not liver and lung? After all, one may think that the PCP proteins, as well as NF-kappaB, are ubiquitous.

      Thank you for the reviewer's comments. A similar question has been addressed above (refer to the response to question 3 of reviewer 3).

      (3) Another specificity-related question that comes to mind is whether the Vangl2 function in autolysomal/autophagic degradation is restricted to p65 as the exclusive substrate? The cytosolic targeting of p65 as opposed to the more well-known nuclear-targeting is interesting.

      Our previous finding demonstrated that Vangl2 inhibits antiviral IFN-I signaling by targeting TBK1 for autophagic degradation (doi: 10.1126/sciadv.adg2339), thereby indicating that p65 is not the sole substrate for Vangl2. However, in the NF-kB pathway, p65 is a specific substrate for Vangl2. Moreover, our findings indicate that the interaction between Vangl2 and p65 occurs predominantly in the cytoplasm, rather than in the nucleus (Fig. S4 C).

      (4) A pharmacological approach is used to tease apart the autolysosome versus the proteasome pathway. What is the physiological importance of autophagic degradation? It is interesting to note that Vangl2 was already previously implicated in degrading LAMP-2A and increasing chaperone-mediated autophagy (CMA)-lysosome numbers (PMID: 34214490).

      Previous literature has demonstrated that Vangl2 can inhibit CMA degradation (PMID: 34214490). In our study, however, we found that Vangl2 can promote the selective autophagic degradation of p65. It is important to note that CMA degradation and selective autophagic degradation are two distinct degradation modes, so these findings are not contradictory.

      (5) Are these phenotypes discernable in heterozygotes or only when ablated in homozygosity? Any phenotypes recapitulated in the looptail heterozygote mice?

      We found that these phenotypes are discernible only in homozygotes.

      (6) What is the conservation of the Vangl2 p65-interaction site between Vangl2 and Vangl1? PDLIM2 recruitment between Vangl2 and Vangl1?

      We appreciate the reviewer’s comments on our manuscript. Previous studies have shown that human Vangl1 and Vangl2 share only 72% identity and have distinct functional properties (doi: 10.1530/ERC-14-0141). Thus, the interaction with p65 and the recruitment of PDLIM2 observed for Vangl2 may not necessarily occur with Vangl1.

      Comments (specific to experiments and data analyses. Please address the following):

      (7) The patient population used in Fig 1 is not described in the Methods. This is a critical omission. Were age, sex etc. controlled for between healthy and disease? How was the diagnosis made? What times during sepsis were the samples collected? As presented, this data is impossible to evaluate and interpret.

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested in the revised supplement materials. (Supplementary information, Page 12, lines 146-147)

      (8) In general, the statistical method should be described for each experiment presented in the figures. Comparisons should not be made only at the time point with maximal difference (such as in Fig 1F or Fig 2C), but at all time points using appropriate statistical methods. The sample size should also be included to allow determination of the appropriateness of parametric or non-parametric tests.

      We appreciate the reviewer’s comments on our manuscript, and we have further improved the manuscript as suggested in the revised manuscript (Figures 1F and 2C).

      (9) PCP pathways can activate p62/SQSTM1 or JNK via RhoA. JNK activation should be tested experimentally.

      According to the reviewer's comments, we further examined the effect of Vangl2 on the JNK pathway. The results showed that Vangl2 did not affect the JNK pathway (Author response image 2). This suggests that Vangl2 functions independently of the PCP pathway.

      Author response image 2.

      Vangl2 did not affect the JNK pathway. WT and Vangl2-deficient (n≥3) BMDMs were stimulated with LPS (100 ng/ml) for the indicated times. Immunoblot analysis of total and phosphorylated JNK.

      (10) Why are different cells such as A549, HEK293, CHO, 293T, THP-1 used during the studies for different experiments? Consistency would improve rigor. At least, a logical explanation driving the cell type of choice for each experiment should be included in the manuscript. Nonetheless, one aspect of using a panel of cell lines indicates that the effect of Vangl2 on NF-kappa B is pleiotropic.

      We are grateful to the reviewer for the comments on our manuscript. A549, HEK293, CHO, and 293T cells are commonly utilized in protein-protein interaction studies. The selection of cell lines for overexpression (exogenous) experiments depends on their transfection efficiency and their ability to express TLR4 (the receptor for LPS). Additionally, we conducted endogenous experiments using THP-1 cells and BMDMs, which are a human macrophage cell line and murine primary macrophages, respectively. Moreover, we generated Vangl2f/f lyz-cre mice by specifically knocking out Vangl2 in myeloid cells, and investigated the effect of Vangl2 on NF-kB signaling in vivo.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):  

      In this study, Hunt et al investigated the role of the ubiquitin-conjugating enzyme UBE2D/effete (eff) in maintaining proteostasis during aging. Utilizing Drosophila as a model, the researchers observed diverse roles of E2 ubiquitin-conjugating enzymes in handling the aggregation-prone protein huntingtin-polyQ in the retina. While some E2s facilitated aggregate assembly, UBE2D/eff and other E2s were crucial for degradation of hL-polyQ. The study also highlights the significance of UBE2D/eff in skeletal muscle, showing that declining levels of eff during aging correlate with proteostasis disruptions. Knockdown of eff in muscle led to accelerated accumulation of poly-ubiquitinated proteins, shortened lifespan, and mirrored proteomic changes observed in aged muscles. The introduction of human UBE2D2, analogous to eff, partially rescued the deficits in lifespan and proteostasis caused by eff-RNAi expression in muscles. 

      The conclusions of this paper are mostly well supported by data, although a more precise mechanistic explanation of phenotypes associated with UBE2D/eff deficiency would have strengthened the study. Additionally, some aspects of image quantification and data analysis need to be clarified and/or extended.  

      We thank reviewer #1 for the thoughtful assessment of our work. We have amended the discussion to better explain the phenotypes associated with UBE2D/eff deficiency. We have also improved the methods describing the procedures for image quantification and data analysis.

      Reviewer #2 (Public Review):  

      Important findings: 

      - Knockdown of UBE2D increases HTT aggregation. 

      - Knockdown of UBE2D leads to an accumulation of ubiquitinated proteins and reduces the lifespan of Drosophila, which is rescued by an ectopic expression of the human homolog. 

      - UBE2D protein levels decline with aging. 

      - UBE2D knockdown is associated with an up- and downregulation of several different cellular pathways, including proteostasis components. 

      Thank you for reviewing our manuscript.

      Caveats: 

      - The readout of HTT aggregation (with methods that are not suitable) as a proxy for the role of UBE2D in proteostasis is not convincing. It would probably improve the manuscript to start with the proteomic analysis of UBE2D to demonstrate that its protein levels decrease with aging. The authors could then induce UBE2D in aged animals to assess the role of UBE2D in the proteome with aging.  

      While presenting the data in a different order would be possible, we prefer to keep the current order: starting from a general screen with a proteostasis readout (HTT aggregates; see the answer below for a discussion of the methods), we identify a candidate (UBE2D), which is then studied in more detail with focused analyses in the retina and skeletal muscle during aging. Concerning the induction of UBE2D in aged animals, our analyses in Figure 4E demonstrate that muscle-specific induction of UBE2D2 throughout life does not by itself increase lifespan. This could be because UBE2D2 only partially recapitulates the function and substrate diversity of Drosophila eff/UBE2D, given the divergence from a single Drosophila UBE2D enzyme (eff) to multiple UBE2D enzymes in humans (UBE2D1/2/3/4).

      - UBE2D knockdown increases the number of HTT foci (Figure 1A), but the quantification is less convincing as depicted in Figure 1B, and other E2 enzymes show a stronger effect (e.g. Ubc6 that is only studied in Figures 1 and 2 without an explanation and Ubc84D). The graph is hard to interpret. What is the sample size and which genetic conditions show a significant change? P values and statistical analyses are missing.  

      The full data underlying this genetic screen are reported in Supplementary Table 1. The role of UBC6/UBE2A/B is thoroughly examined in Hunt et al 2021 (PMID: 33658508). We agree that Ubc84D has an important effect and that it should be considered for future studies. We have amended the legend of Figure 1 to indicate that each data point in the graph represents a single RNAi line targeting the corresponding gene. The mean of 5 biological replicates is shown for each RNAi line, with each biological replicate representing a single eye imaged from a distinct fly. Therefore, the data points that do not show large magnitude changes may indicate RNAi lines that were not effective at knocking down the target protein (or that did not affect HTT aggregates). The E2s worth pursuing were identified because multiple RNAi lines scored consistently: this is the case for UBC6 (studied previously in PMID: 33658508) and eff/UBE2D (pursued in this study). This screen was therefore utilized to identify and select candidate genes (i.e. eff/UBE2D) for more in-depth studies on proteostasis.
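      The scoring logic described in this response (one data point per RNAi line, each being the mean of 5 single-eye replicates, with a gene pursued when multiple lines score consistently) can be illustrated with a minimal sketch. The gene and line names, the fold-change values, the `cutoff` of 1.5, and the `min_lines` requirement of 2 are all hypothetical and are not taken from the manuscript or from Supplementary Table 1.

```python
from statistics import mean


def summarize_screen(replicates, cutoff=1.5, min_lines=2):
    """Collapse replicate eyes to one value per RNAi line, then flag genes.

    replicates: {gene: {rnai_line: [aggregate readout per eye, relative to control]}}
    cutoff and min_lines are illustrative assumptions, not the authors' criteria.
    """
    # One data point per RNAi line: the mean across its biological replicates.
    line_means = {
        gene: {line: mean(values) for line, values in lines.items()}
        for gene, lines in replicates.items()
    }
    # A gene is a candidate when at least `min_lines` independent lines score.
    candidates = sorted(
        gene
        for gene, lines in line_means.items()
        if sum(m >= cutoff for m in lines.values()) >= min_lines
    )
    return line_means, candidates


# Hypothetical readouts (5 eyes per RNAi line).
example = {
    "geneA": {
        "line1": [2.0, 2.2, 1.8, 2.1, 1.9],
        "line2": [1.6, 1.7, 1.5, 1.8, 1.9],
        "line3": [1.0, 1.1, 0.9, 1.0, 1.0],  # possibly an ineffective line
    },
    "geneB": {
        "line1": [1.9, 2.1, 2.0, 2.0, 2.0],
        "line2": [1.0, 1.0, 1.1, 0.9, 1.0],
    },
}

line_means, candidates = summarize_screen(example)
print(candidates)
```

      In this toy data set, only the gene with two scoring lines is flagged, while a gene with a single scoring line is not; a line with no effect does not count against the gene, since it may simply be ineffective at knockdown.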

      - The quantification of the HTT fluorescence cannot be used as a proxy for HTT aggregation. The authors should assess HTT aggregation by e.g. SDD-AGE, FRAP, filter retardation, etc. The quantification of the higher MW species of HTT in the SDS-PAGE is not ideal either as this simply reflects material that is stuck in the wells that could not enter the gel. Aggregation and hence high MW size could be one reason, but it can also be HTT trapped in cell debris, etc.  

      We agree that the use of multiple methods is a good way to assess the impact of E2 enzymes on HTT protein aggregation. In this regard, we estimated HTT aggregates by fluorescence microscopy and by western blot. Microscopy-based analyses demonstrate both the accumulation of the HTT-GFP pathogenic protein into aggregates (HTT polyQ polypeptides aggregating into one spatial region; Fig. 1 and Fig. 2B) as well as their potential cytotoxicity, resulting in the disruption of the ommatidial ultrastructure and cellular degeneration (Fig. 2A). As an approach similar to native gels and filter retardation, we have utilized SDS-PAGE and western blotting of cellular samples isolated with strong chaotropic and denaturing reagents (8M urea plus detergents and reducing reagents used in the lysis). These experimental conditions maintain the higher-order organization of HTT into high-molecular-weight aggregates that are not broken down into individual polypeptides and that therefore do not readily travel through a gel or filter. Therefore, the biochemical methods we have used are equivalent to those proposed by the reviewer. In addition to combining microscopy-based and biochemical approaches to examine the impact of eff/UBE2D on the HTT aggregates, we have analyzed eff/UBE2D during skeletal muscle aging and found phenotypes consistent with those observed in the HTT model: RNAi for eff/UBE2D leads to the accumulation of detergent-insoluble ubiquitinated proteins that associate with protein aggregates.

      - Does UBE2D ubiquitinate HTT? And thus, is HTT accumulation a suitable readout for the functional assessment of the E2 enzyme UBE2D? 

      We propose that the accumulation of HTT in response to eff/UBE2D RNAi may be due to a generalized loss of protein quality control rather than to a direct decline in the ubiquitination of HTT by eff/UBE2D. In a previous study that examined the UBE2D interactome (Hunt et al. 2023; PMID: 37963875), we did not find an interaction between UBE2D and HTT, suggesting that HTT may not be directly modulated by eff/UBE2D via ubiquitination.

      - The proteomic analyses could help to identify potential substrates for UBE2D.

      The proteomic analyses in Figure 5 identify several proteins that are modulated by RNAi for eff and by its human homolog, UBE2D2. Such eff/UBE2D2-modulated proteins may indeed be potential substrates for UBE2D-mediated ubiquitination. For example, this is the case for Pex11 and Pex13, which were found to be upregulated upon UBE2D RNAi also in human cells, where they are ubiquitinated in a UBE2D-dependent manner (Hunt et al. 2023; PMID: 37963875).

      - Are there mutants available for UBE2D or conditional mutants? One caveat of RNAi is: first not complete knockdown and second, variable knockdown efficiencies that increase variability.

      There are potential hypomorphic alleles of eff/UBE2D that may be available, but they would present the same caveats of incomplete loss of eff/UBE2D function as RNAi. Given the strong phenotype that we find with partial eff knockdown, a caveat of full eff/UBE2D knockout is that this could be lethal.

      - The analysis of the E3 enzymes does not add anything to this manuscript. 

      The analysis of E3 enzymes relates to our recent publication (Hunt et al. 2023; PMID: 37963875) that reports the physical interactions between E2 and E3 enzymes. Analysis of these E2-E3 pairs in the genetic screen in Fig.1 therefore follows this IP-MS study to provide insight into the functional interaction between these E2-E3 pairs in proteostasis.

      - Figure 2B: the fluorescence intensities in images 2 and 4 are rather similar, yet the quantification shows significant differences. 

      Please note that some of the GFP fluorescence in image 4 is not punctate, but rather diffuse fluorescence that is not related to HTT-GFP aggregates. Our image quantitation methods utilized thresholding to identify GFP-positive puncta while eliminating background fluorescence not corresponding to HTT-GFP puncta.
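      As a rough illustration of why thresholding separates punctate from diffuse signal, the sketch below counts connected groups of pixels above an intensity threshold. It is not the authors' image-analysis pipeline: the function name, the threshold of 100, the `min_pixels` size filter, and the toy pixel values are all assumptions made for the example.

```python
from collections import deque


def count_puncta(image, threshold, min_pixels=2):
    """Count 4-connected regions of pixels brighter than `threshold`.

    image: 2D list of intensities. Regions smaller than `min_pixels`
    are ignored as noise. All parameters are illustrative assumptions.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    puncta = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill over one bright region.
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (
                        0 <= ny < rows
                        and 0 <= nx < cols
                        and not seen[ny][nx]
                        and image[ny][nx] > threshold
                    ):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:
                puncta += 1
    return puncta


# Toy images: uniform diffuse signal, and the same signal plus two bright spots.
diffuse = [[40] * 6 for _ in range(6)]
punctate = [row[:] for row in diffuse]
for y, x in [(1, 1), (1, 2), (3, 4), (4, 4), (4, 5)]:
    punctate[y][x] = 200

print(count_puncta(diffuse, threshold=100))
print(count_puncta(punctate, threshold=100))
```

      Two images can have similar overall brightness yet differ in puncta count, because sub-threshold diffuse fluorescence contributes to total intensity but not to the thresholded regions.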

      - The proteomic analyses could provide insights into the functional spectrum of UBE2D or even the identification of substrates. Yet apart from a DAVID analysis, none of the hits were followed up. In addition, only a few hits were labelled in the volcano plots (Figure 5). On what basis did the authors select those?

      Please see the previous answer above regarding the identification of eff/UBE2D protein substrates from our proteomic analysis in Fig. 5. Only some of the top-regulated hits could be labeled in Fig.5 to avoid overcrowding.

      - The manuscript remains at this stage rather descriptive. 

      Our study has demonstrated a key role for the eff/UBE2D ubiquitin-conjugating enzyme in regulating protein quality control during aging in the Drosophila retina and skeletal muscle. Our study has identified key proteins that are modulated by eff/UBE2D RNAi in Drosophila muscle, that are rescued by expression of human UBE2D2, and that may underlie the accelerated decline in proteostasis that occurs upon eff/UBE2D RNAi. While more remains to be learned about the regulation of these eff/UBE2D-modulated proteins in Drosophila, we have previously demonstrated that some of the proteins that are upregulated by UBE2D RNAi in human cells (e.g. some peroxins) are indeed direct ubiquitination targets of UBE2D via associated E3 ubiquitin ligases (Hunt et al. 2023; PMID: 37963875).

      Reviewer #3 (Public Review):  

      This is a potentially quite interesting paper that defines E2 and E3 genes in Drosophila that can impact the accumulation of the Q72-GFP protein in the fly eye. The authors then focus on the eff gene, showing which human homolog can rescue fly knockdown. They extend to skeletal muscle, from the hL protein, to show that eff by TMT mass spec decreases with age normally in the fly muscle and that there is a significant overlap of proteins that are disrupted with eff knockdown in young animals in muscle vs aged animals normally in muscle. 

      Overall these data suggest eff decrease with age may contribute to the increase in ubiquitinated proteins in muscle with age, and that upregulation of eff activity might be of interest to extending lifespan. Because eff function can be performed by a human homologue, the findings may also apply to human situations of aging. 

      These data are overall interesting and are of relevance for those interested in neurodegenerative disease and aging, although a number of points from the figures seem confusing and need more explanation or clarity. 

      Thank you for reviewing our manuscript, we have improved the explanations and clarity of the manuscript.

      Recommendations for the authors:

      We would like to keep the manuscript title as it is currently to report the partial overlap in the proteomic changes induced by aging and effRNAi (Fig. 6).

      Reviewer #1 (Recommendations For The Authors): 

      (1) A significant concern arises from the unexpected outcome observed in the UBE2D/eff loss-of-function experiments. Despite its role as a ubiquitin-conjugating enzyme (E2), the reduction in UBE2D/eff levels paradoxically increased polyubiquitinated proteins and p62 accumulation, presenting a more intricate and seemingly unrelated phenotype to its anticipated function. 

      eff/UBE2D represents one out of 21 different Drosophila E2 ubiquitin-conjugating enzymes and therefore eff RNAi alone is unlikely to reduce the total pool of ubiquitinated proteins. The generalized increase in insoluble polyubiquitinated proteins results from an overall derangement of protein quality control caused by effRNAi. In agreement with this scenario, the protein categories that were found to be modulated by effRNAi (Fig. 5) include proteins associated with protein quality control such as proteasome components and chaperones. Therefore, derangement in the levels of a wide range of regulators of proteostasis may lead to a generalized loss of protein quality control upon effRNAi.

      I believe elucidating the mechanisms underlying the impact of UBE2D/eff deficiency on the observed phenotypes would contribute to a more comprehensive understanding of the study's implications. For instance, investigating whether the loss of UBE2D/eff influences muscle proteostasis by impeding proteasome assembly or function, modulating autophagy, etc. 

      We have previously utilized luciferase assays to measure the proteolytic activity of the proteasome in human cells treated with siRNAs targeting UBE2D1/2/3/4 but found no effect of UBE2D knockdown compared to control non-targeting siRNAs (Hunt et al. 2023; PMID: 37963875). In Drosophila muscles, we have examined the levels of GFP-CL1 (a GFP fused with a proteasomal degron) and found that effRNAi does not impact GFP-CL1 levels (data shown in Author response image 1). Overall, these results suggest that effRNAi reduces protein quality control without affecting proteasome activity.

      Author response image 1.

      (2) Related to Figures 1B-C: It is not clear to this reviewer the quantification methodology used in the experiment. Does each point represent the Average +/- SD for each replicate? If so, it appears that not all cases align with the n=5 as indicated in the figure legend. Additionally, how many animals per replicate were quantified? 

      We have amended the legend of Figure 1 to indicate that each data point in the graph represents a single RNAi line targeting the corresponding gene. The mean of 5 biological replicates is shown for each RNAi line, with each biological replicate representing a single eye imaged from a distinct fly. Therefore, the data points that do not show large magnitude changes may indicate RNAi lines that were not effective at knocking down the target protein (or that had no effect on HTT aggregates).  

      (3) Related to the previous point: The analysis of pathogenic Huntingtin aggregation in the Materials and Methods section lacks information regarding the number of individuals, replicates, etc. 

      Please see the response above.

      (4) Related to Figure 1 B: In the case of eff/UBE2D, it appears that 3 out of 9 replicates demonstrate a significant increase in HL-polyQ aggregates. Considering the strength of this result, it raises questions about whether it justifies using eff for future analyses. 

      Please see the response to point (2) above. These results indicate that 3 distinct UAS-RNAi lines targeting eff/UBE2D produced the same effect whereas 6 other effRNAi lines did not, possibly because they are less efficacious in knocking down eff/UBE2D. We have now amended the legend of Fig. 1B to better explain these results.

      (5) Related to Figure 1 D-E: Could the authors provide clarification regarding the tissue type and animal age utilized in these experiments? 

      Whole flies were utilized at 1 week of age.

      (6) Related to Figure 3: Incorporating the normal accumulation of poly-ubiquitinated proteins during aging could provide context to better interpret the effect of eff/UBE2D KD at 3 weeks of age. 

      Several papers from us and others have previously demonstrated a progressive increase in the insoluble levels of poly-ubiquitinated proteins during aging in Drosophila skeletal muscle (PMID: 36640359; PMID: 31249065; PMID: 33773104; PMID: 33658508; PMID: 24092876; PMID: 21111239; PMID: 24244197; PMID: 25199830; PMID: 28878259; PMID: 36213625). Our analyses now indicate that such age-related loss of protein quality control is accelerated by eff/UBE2D knockdown.

      (7) Related to Figure 3: Would it be possible for the authors to include a list or table detailing the specific E2, deubiquitinating enzymes, and E3s identified in the comparative analysis of the old vs young proteome? This would provide a clear reference for the identified regulatory proteins involved in the age-related proteomic changes. 

      We have added a tab to Supplementary Table 2 to report the list of age-regulated deubiquitinating enzymes (DUBs) and E1, E2, and E3 enzymes.

      (8) Related to Figures 3 and 4: Given that the comparative analysis of the old versus young proteome identified 10 out of 21 E2 ubiquitin-conjugating enzymes, exploring the impact of eff/UBE2D overexpression becomes pivotal to understanding its role in age-related changes in proteostasis and lifespan. Conducting an experiment involving eff overexpression could provide valuable insights into whether restoring eff levels mitigates aging-related phenotypes. 

      Although we have not done this experiment with eff overexpression, Fig. 4E reports that the overexpression of human UBE2D2 in skeletal muscle does not appear to influence lifespan by itself (green line in Fig. 4E), although it can partially rescue the short lifespan of flies with muscle-specific effRNAi (purple line in Fig. 4E).

      (9) Providing a more detailed description of the Supplementary Tables would significantly enhance the reader's comprehension of their content. 

      A description has been added at the end of the methods.

      Reviewer #2 (Recommendations For The Authors): 

      In addition, to the points listed above: 

      - The title does not reflect the content of the manuscript and should be changed. There is no evidence that UBE2D maintains a "youthful" (needs to be changed as well) proteome. Rather, its expression declines with aging and its depletion leads to an increase of ubiquitinated proteins. This is true for essentially the entire proteostasis network. 

      While proteostasis generally declines with aging, it is incompletely understood what specific components of the proteostasis network are dysregulated with aging. Our study now identifies the E2 ubiquitin-conjugating enzyme eff/UBE2D as a key regulator of proteostasis that is transcriptionally downregulated with aging. Comparison of the proteomic changes induced by aging versus those induced by effRNAi in young age indicates a partial overlap (Fig. 6), indicating that eff/UBE2D is, at least in part, necessary to maintain the proteome composition that is found in young age (“youthful”). On this basis, we would like to keep the current title but have amended the manuscript to indicate that such regulation of the proteome composition is only in part dependent on eff/UBE2D.

      - Molecular weight markers are missing for the gels/western blot depicted in Fig 1E, 2C, 3E, and 4A. 

      Thank you for pointing this out, these have been added.

      - Fig. 4A, the Ponceau staining for the detergent insoluble samples shows almost no signal for lane 7 and the data should hence not be analyzed. 

      The western blot membrane in Fig. 4A shows a reliable signal in all lanes (including lane 7) when probed with antibodies for ubiquitin, Ref(2)P, and tubulin. Therefore, there is no reason for excluding lane 7 from the analysis. Ponceau S staining is provided as an additional loading control but was not used to normalize the data.

      Reviewer #3 (Recommendations For The Authors): 

      There are a number of confusing or not sufficiently explained points in the figures that require clarity. 

      In Figure 1, panels B and C, one assumes the gray broad line across means no difference from control. For the genes, many have points that are scattered both above and below that control line. What do the dots and range represent for each gene, and why are the data so scattered. How do the authors explain data ranging from no effect, to a negative effect to a positive effect, all for the same gene? Akt1 and Hsp83 are controls but are not quantitated to appreciate how variable the assay is. Can they explain the figure better, and also why the data for any one gene are so variable?

      We have amended the legend of Figure 1 to indicate that each data point in the graph represents a single RNAi line targeting the corresponding gene. The mean of 5 biological replicates is shown for each RNAi line, with each biological replicate representing a single eye imaged from a distinct fly. Data points that do not show large-magnitude changes may therefore indicate RNAi lines that were not effective at knocking down the target protein (or that did not affect HTT aggregates). The variability in the analysis of a single gene thus arises because different RNAi lines targeting that gene may differ in efficacy. RNAi lines for Akt1 and Hsp83 are used merely as controls (these have been quantified in Jiao et al. 2023; PMID: 36640359).

      In Figure 2A, it is not clear which animals have the hL-Q72-GFP (which eyes are "rough eyes"?). Also, do ubc6-RNAi and eff-RNAi have an impact on the normal eye? That is, can they explain the images and genotypes more clearly. 

      UBC6 and eff RNAi produce these rough eye phenotypes in the absence of HTT-polyQ and these are rescued by the expression of their human homologs. The panel images indicated in bold here below are those that have “rough eye” phenotypes: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 (a green R has been added to these panels in Fig. 2A).

      In Figure 2B, panel 3 looks very different from 1 and 4 and yet is not different from them by quantitation. Can they replace it with a more representative panel or is 3 lower (but not significantly so)? 

      Please note that some of the GFP fluorescence in image 4 is not punctate, but rather diffuse fluorescence that is not related to HTT-GFP aggregates. Our image quantitation methods utilized thresholding to identify GFP-positive puncta while eliminating background fluorescence not corresponding to HTT-GFP puncta.
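
      As a minimal sketch of the kind of threshold-based quantitation described above (not the authors' actual pipeline), the following counts connected bright regions in a toy intensity grid. The function name `count_puncta`, the `min_pixels` filter, the 4-connectivity rule, and all pixel values are hypothetical choices made for illustration.

```python
from collections import deque


def count_puncta(image, threshold, min_pixels=2):
    """Count 4-connected regions of pixels brighter than `threshold`.

    `image` is a list of rows of intensities (hypothetical values).
    Regions smaller than `min_pixels` are ignored as noise.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    puncta = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Flood-fill one connected bright region and measure its area.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:
                puncta += 1
    return puncta


# Toy image: two bright puncta (values near 90) plus a dimmer diffuse patch (40).
toy_image = [
    [10, 10, 10, 10, 10, 10],
    [10, 90, 95, 10, 40, 40],
    [10, 92, 10, 10, 40, 40],
    [10, 10, 10, 10, 40, 40],
    [10, 10, 85, 88, 10, 10],
]
```

      In this toy grid, a cut-off placed above the diffuse signal counts only the two bright regions, whereas a cut-off placed below it would also count the diffuse patch as a region. This is why diffuse fluorescence can be visible in an image without contributing to a puncta count.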

      In Figures 3E and F, it would be helpful in F to put the detergent soluble bar graphs all on the left so that those data are on the left in both E and F, and then detergent-insoluble in E and F to the right. This would make the figure and quantitation easier to follow. 

      Done.

      The same point as above for Figures 4 A and B. 

      Done.

      In Figure 3A, CG7656 is nearly as reduced with age as eff. One wonders if that gene would give a different or similarly overlapping proteome with age as eff. Was CG7656 not focused on because not conserved? 

      As indicated in Figure 1B, CG7656 is orthologous to UBE2R1 (also called CDC34) and UBE2R2 in humans. In this screen, however, RNAi targeting CG7656 did not appear to influence HTT aggregates and therefore was not selected for further analyses. However, it may play a role in skeletal muscle proteostasis during aging.

      In Figure 6, the R2 value correlating age with eff-RNAi is weak. Although they discuss this in the text, it might also be helpful to include Venn diagrams for gene overlaps and the significance to make the argument more clear that there is a significant correlation in proteins up and down to indicate that eff largely recapitulates the changes of aging. Correlating this with proteins that are restored with UBE2D in muscle in a more clear manner may also be helpful for readers interested in aging. 

      We have amended the text to indicate that this relatively low correlation (R2 = ~0.2, but corresponding to a consistent regulation of 70% of proteins by aging and effRNAi) could indicate that eff/UBE2D is only in part responsible for maintaining a youthful composition of the muscle proteome during aging. Other changes that occur with aging likely account for non-correlated alterations in protein levels. We have also added Venn diagrams (Fig. 6E) to further display the overlap in protein regulation by aging vs. effRNAi.
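
      To illustrate how a low R2 can coexist with a largely consistent direction of regulation, here is a minimal sketch using made-up fold-change values. These are not data from the study. The function names and the ten toy values are assumptions chosen so that 7 of 10 pairs change in the same direction.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def sign_concordance(x, y):
    """Fraction of pairs regulated in the same direction (zeros are skipped)."""
    pairs = [(a, b) for a, b in zip(x, y) if a != 0 and b != 0]
    same = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    return same / len(pairs)


# Hypothetical log2 fold changes for ten proteins (illustrative only).
aging_lfc = [1.0, 0.8, -0.6, -1.2, 0.5, -0.4, 0.9, -0.7, 0.3, -0.5]
eff_rnai_lfc = [0.2, 1.5, -0.1, -0.3, 1.1, -1.4, -0.8, 0.9, -0.6, -0.2]

print(f"R2 = {r_squared(aging_lfc, eff_rnai_lfc):.3f}")
print(f"same-direction fraction = {sign_concordance(aging_lfc, eff_rnai_lfc):.2f}")
```

      The two statistics answer different questions: R2 is sensitive to how well the magnitudes line up, while the same-direction fraction only asks whether a protein goes up or down in both conditions.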

      In Figure 7, they might indicate that the accumulated insoluble protein is ubiquitinated. That is left out of the figure, although indicated in the legend. 

      Done.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Our revised version of the manuscript addresses all the comments and suggestions raised, as clarified in our point-by-point answer to the reviewers. We have performed additional experiments on the proliferation and differentiation of additional cell types in the muscle, such as myogenic and mesenchymal progenitors, as well as on chondrogenesis in parental hMSCs that did not express exogenous ACVR1. Moreover, as suggested by reviewer #2, we performed all the chondrogenic experiments with addition of TGFβ in the differentiation media and analyzed chondrogenesis by both Alcian blue staining and qPCR analysis of gene markers (Sox9, Acan, Col2a1 and Mmp3). We also extended our RNA-seq analysis and included new data using hMSCs expressing either wild type or R206H ACVR1 receptor, with or without different ACVR1 ligands (BMP6 and Activin A) and treated or not with the inhibitor BYL719. The new data suggest that BYL719 is able to inhibit the expression of genes involved in ossification and osteoblast differentiation irrespective of the presence of the mutation. We also discuss the effect of BYL719 on mTOR signaling and have addressed all the minor comments suggested by both reviewers.

      We addressed the specific comments of the reviewers as follows:

      Reviewer # 1:

      Specific points:

      Point #1 and #2. The authors showed that BYL719 inhibited HO in FOP model mice. Did they have HO not only in the muscle but also in the bone marrow? The progenitor cells of chondrocytes and osteoblasts may differ between the muscle and bone marrow. The authors should examine the effects of BYL719 on some other types of cells in the muscle, such as myoblasts and fibro-adipogenic cells, in addition to the bone marrow-derived MSCs. Furthermore, it was unclear whether they were human or murine MSCs in the text.

      The inhibitory effect of BYL719 on HO in FOP mice was clear, but the molecular mechanisms or target cells were still unclear because BYL719 affected multiple types of cells and molecules. The authors are encouraged to show clearer mechanisms and target cells' critical inhibition of HO. Again, this reviewer believes that in vivo and in vitro experiments using muscle and bone marrow and cells prepared from them should provide additional critical information.

      As detailed in the introduction, heterotopic ossification is known to develop in the skeletal muscle and connective tissues. Consistent with the current knowledge of the field, none of the mice showed HO in the bone marrow. Additionally, since activation of the mutant allele is achieved by injection of CRE-expressing adenovirus and cardiotoxin in the hindlimb muscle, it is unlikely that mesenchymal progenitors in the bone marrow would be strongly affected. Interestingly, single-cell RNA sequencing from multiple mouse tissues identified a very strong transcriptional similarity between FAPs and non-muscle mesenchymal progenitors (PMID: 37599828). As suggested, we examined the effects of BYL719 on proliferation and differentiation in additional cell types such as muscle progenitors. In this new version of the manuscript, we show that BYL719 reduces the proliferation of muscle and mesenchymal progenitors while it blocks myoblast differentiation in vitro (Figure 7, Figure Supplement 1). The MSCs used in the experiments shown in Figure 3 were murine, whereas those used in the assays shown in Figures 5 and 6 were of human origin. We have further clarified this in the respective Figure legends.

      All the data generated strongly suggest that no single mechanism accounts for all the effects of BYL719 on HO. Instead, BYL719 affects multiple cell types involved in efficient HO (e.g. reduction in proliferation and osteochondrogenic specification of mesenchymal precursors (MPs), reduction in proliferation, migration, and inflammatory gene expression of monocytes, etc.). Interestingly, our data suggest that BYL719 is able to exert these effects on MPs and monocytes irrespective of the presence of the ACVR1-R206H mutation (Figures 5, 6 and 7). Additionally, several signaling mechanisms are affected. BYL719 reduces SMAD1/5, p38, AKT and mTOR signaling in parental MPs and in MPs with mutations in ACVR1 (Figure 3 and our previous publication PMID: 31373426), and all of these pathways are required for efficient osteochondrogenic specification of MPs. We consider that the different detailed mechanisms by which BYL719 inhibits osteochondrogenic specification enhance the robustness of the findings in this study.

      Point #3. In FOP model mice, ACVR1 was mutated as Q207D. However, R206H was used in in vitro experiments. Do they have the same characteristics? This reviewer would like to recommend examining the effect of BYL719 on wild-type ACVR1, R206H, and Q207D simultaneously in each experiment.

      We already performed these experiments, assaying ACVR1-WT, ACVR1-Q207D and ACVR1-R206H in parallel for the transcriptional responses of MPs, in our previous work (PMID: 31373426). Both mutations produced similar responses, with ACVR1-Q207D being stronger than ACVR1-R206H, as has been shown in vivo in mouse models of HO (PMID: 34633114). In any case, BYL719 inhibits the transcriptional responses induced by both mutant alleles.

      Point #4. Figure 5: What was the effect of BYL719 on the differentiation of parental cells that did not express exogenous ACVR1?

      We performed new assays of chondrogenic differentiation of hMSCs that are shown in the new Figure 5. BYL719 inhibits chondrogenic differentiation of parental hMSCs and also inhibits chondrogenic specification irrespective of the expression of either wild type or mutant ACVR1.

      Point #5. Figure 6: In this experiment, gene expression was examined in pretreated MSCs-ALK2 (ACVR1?) R206H with and without BYL719. It was unclear whether suppression of gene expression by BYL719 was specifically caused in cells expressing R206H. What were the effects of BYL719 on parental cells that did not express exogenous ACVR1?

      To be consistent, we relabeled ALK2 as ACVR1 in the figure. We expanded the conditions analyzed in the RNA sequencing to include conditions where we activate ACVR1 (either WT or R206H) with its known physiological ligand BMP6. In both human MSCs expressing ACVR1-R206H and human MSCs expressing wild type ACVR1, we observed a downregulation of differentially expressed genes upon addition of BYL719, irrespective of ligand (BMP6 or Activin A) or receptor (RH or WT) (new Figure 6B and C).

      Point #6. Figure 7: BYL719 suppressed cell proliferation of all cells examined partially at 2 uM and almost completely at 10 uM, respectively. There is a possibility that BYL719 inhibits HO by inhibiting osteochondroprogenitor proliferation. The authors are encouraged to show data on the effect of BYL719 on the proliferation of other types of cells, such as myoblasts, fibro-adipogenic cells, or bone marrow cells.

      We examined the effects of BYL719 on proliferation in additional cell types such as muscle and mesenchymal progenitors. BYL719 slightly reduced the proliferation of myoblasts and mesenchymal cells in vitro (Figure 7, Figure Supplement 1). However, the reduction in proliferation of myoblasts or MPs did not reach the extent observed in monocytes or macrophages (Figure 7).

      Point #7. Figure 8: How was the effect of BYL719 on muscle regeneration in wild-type? It was reported that mTOR signaling is important in HO in FOP. The authors are encouraged to show the effect of BYL719 on mTOR signaling.

      Muscle regeneration in wild-type mice was shown in our previous work (PMID: 31373426). In addition, we included images of muscle regeneration after 23 days of BYL719 treatment in ACVR1Q207D mice, with or without PI3Kα deletion, after induction of HO in the new Figure 2, Figure Supplement 2. These mice showed full muscle regeneration or, at most, small calcifications surrounded by muscle. The effects of the PI3Kα inhibitors BYL719 and A66 on mTOR signaling were previously shown by our group (PMID: 31373426). Both inhibitors strongly reduced mTOR signaling, as visualized by activation of p70 S6-kinase, a surrogate marker of mTOR activity.

      Minor points:

      (9) SMAD 1/5 should be SMAD1/5.

      (10) The source of human MSCs should be indicated in the text.

      (11) ALK2 should be ACVR1 in Figure 6A.

      (12) The protein levels of each receptor should be examined in Fig. 4.

      We introduced the suggested changes in the manuscript and Figure 6 and indicated the source of human MSCs in Materials and Methods. We also examined the levels of each receptor that are shown in the new Figure 4, Figure Supplement 1.

      Reviewer # 2:

      Specific points:

      Point #1. Because the involvement of PI3K in HO of FOP, was already reported by authors' group and also others (Hino et al, Clin Invest, 2017), the main purpose of this study was to disclose the mechanism of how PI3K was activated in FOP cells. In the published study (Hino et al, Clin Invest, 2017), PI3K was activated by the ENPP2-LPA-LPR cascade. Unfortunately, there were no new data for this important issue.

      The main purpose of this study is to demonstrate that the pharmacological and genetic inhibition of PI3Kα in HO progenitors at injury sites reduces HO in vivo, to extend the insights into the molecular and cellular mechanisms responsible for the therapeutic effect of PI3K inhibition, and to optimize the timing of the administration of BYL719. Class I PI3Ks are heterodimers of a p110 catalytic subunit in complex with a regulatory subunit. They engage in signaling downstream of tyrosine kinases, G protein-coupled receptors and monomeric small GTPases. Therefore, a plethora of growth factors, cytokines, inflammatory agents, hormones and additional external and internal stimuli are able to activate PI3Kα (PMID: 31110302). In fact, TGF-β family members, including activin A, are able to activate PI3K and mediate some of their non-canonical responses (PMID: 19114990). Multiple factors with known increased expression in the ossifying niche in HO and FOP (e.g. activin A, TGF-β, inflammatory agents such as TNFα, IL6, IL3, etc.) are known activators of PI3K (PMID: 30429363). Interestingly, in our RNA-seq analysis in hMSCs we did not observe increased expression levels of Enpp2 when comparing wild type and R206H mutated cells treated with activin A.

      Point #2. The HO formation of ACVR1/Q207D model mice in this study is extremely unstable (Figure 1B, DMSO). Even the bone volume of some red symbols, which indicate the presence of HO, is located on the base (0.00) line. I would examine carefully the credibility of the data. Also, it is well known that the molecular behavior of mice Acvr1/Q207D and human ACVR1/R206H was different.

      We agree with the reviewer that induction of HO is variable between mice, with variations in the penetrance and intensity of the ossifying lesions. This variability is a known common trend in all the models of HO published so far (e.g. PMID: 28758906, PMID: 26333933). Accordingly, we did not exclude from the μCT analysis any animal that had been injected with CRE-expressing adenovirus plus cardiotoxin. Regarding the behavior of mouse Acvr1/Q207D and human ACVR1/R206H, it is well known that Q207D produces more robust and stronger responses in terms of signaling and formation of heterotopic ossification (PMID: 34633114). Therefore, the Acvr1/Q207D model provides a more stringent test of HO reduction by BYL719.

      Point #3. The experimental design of Figure 5 experiments is confusing. Although the authors mentioned that the data in Figure 5A were taken seven days after chondrogenic induction, I am skeptical whether the chondrogenic induction was successful. Based on the description of Material and Methods, the authors did not include TGFβ in their "Differentiation Medium", which is an essential growth factor to induce chondrogenic differentiation of human MSC. Why did the ALP activity increase after chondrogenic induction? The authors should demonstrate the evidence of successful chondrogenic induction by showing the expression of key chondrogenic genes such as SOX9, ACAN, or COL2A1. The data in Figure 5B-E are also confusing. The addition of Activin A showed no difference between ACVR1/WT and ACVR1/R206H cells, suggesting that these cells did not reproduce the situation of FOP.

      We performed new assays of chondrogenic differentiation of hMSCs that are shown in the new Figure 5. We included TGFβ1 in the differentiation medium and also included the parental cell line in the analysis. In addition to being a marker of osteoblast differentiation, alkaline phosphatase (ALPL) has also been shown to be induced during chondroblast differentiation in vitro (PMID: 19855136; PMID: 9457080; PMID: 18377198; PMID: 23388029). Moreover, expression data for SOX9, COL2A1, ACAN and MMP13 in cells after chondrogenic differentiation are included in the new Figure 5. Expression of some markers (e.g. ACAN) is increased by the expression of ACVR1R206H; however, we did not observe significant differences in chondroblast differentiation gene expression between ACVR1wt and ACVR1R206H expressing cells. In any case, BYL719 could inhibit chondrogenic differentiation of parental hMSCs and also the chondrogenic specification irrespective of the expression of either wild type or mutant ACVR1.

      Point #4. The experimental design and data analyses of RNA-seq were inappropriate and insufficient, which is disappointing for the reviewer because this will be a key experiment in this study. Because the most important point is to identify the signal for PI3Kα induced by Activin A via ACVR1/R206H, they should also use hMSC-ACVR1/WT for this experiment. Because the authors clearly demonstrated that TGFBR were not targets of BYL719, they should compare the expression profiles between MSC-ACVR1/WT and MSC-ACVR1/WT with BYL719 to identify the targets of BYL719 unrelated to Activin A signal. Then the expression profiles of ACVR1/R206H cells treated with Activin A and Activin A plus BYL719 were compared. Among down-regulated signals by BYL719, those found also in MSC-ACVR1/WT should be discarded. It is important to investigate whether the GO term of ossification or osteoblast differentiation is found also in MSC-ACVR1/WT. If it is so, the effect of BYL719 is not specific for FOP cells.

      We extended our RNA sequencing analysis with additional experimental conditions and comparisons. In the new Figure 6, we now compare hMSCs expressing wild type or R206H receptors, with or without BYL719 inhibition, and with or without different ligand activations (BMP6 or Activin A) (new Figure 6A). New Figure 6B shows the Gene Ontology analysis of the differentially expressed genes between cells expressing WT and RH receptors under control conditions. Ossification (GO:0001503) and osteoblast differentiation (GO:0001649) were detected within the top 10 significantly differentially regulated biological processes between these conditions. Therefore, we analyzed these relevant GO terms in 5 different comparisons by GO enrichment analysis (Figure 6C). In addition to the comparison between cells expressing WT and RH receptors under control conditions explained above, we also compared cells expressing WT or RH receptor, with different ACVR1 ligands (BMP6 and Activin A), and with or without BYL719 inhibitor. The addition of BYL719 resulted in a downregulation of the GO terms “ossification” and “osteoblast differentiation” (new Figure 6C). These results confirm the inhibitory effect of BYL719 on the ossification and osteoblast differentiation biological processes, and indicate that this inhibitory effect remains consistent upon BMP6 or Activin A ligand activation and with ACVR1 WT and RH expression.

      Point #5. The data in Figure 7 were not related to the aim of this study because cell lines used in these experiments did not have ACVR1/R206H mutations. It is not appropriate to extrapolate these data in the FOP situation.

      We utilized immune cell lines in which we could activate ACVR1 with its known physiological ligand BMP6. Mutated ACVR1 gains a response to activin A while maintaining the physiological response to BMP6 seen with the wild type form. Therefore, in these assays we interrogated in vitro, with addition of BMP6, the effects of BYL719 on growth, migration and inflammatory gene expression under conditions of activated ACVR1 receptor downstream signaling. We consider that understanding the effects of PI3Kα inhibition on the regulation of proliferation, migration and inflammatory cytokine expression in monocytes, macrophages and mast cells is essential to better define the potential outcome of BYL719 treatment for heterotopic ossifications.

      Minor comments:

      (1) The legends for Figure 1C were those for Figure 1D, and there were no descriptions for Figure 1C in the legends and methods section. The reviewer was unable to understand the meaning of BV/TV. What is TV?

      (2) “However, in PI3Kα deficient mice ACVR1Q207D expression only led to minor ectopic calcifications that were already surrounded by fully regenerated muscle tissue on the 23rd day after injury (Figure 2D, Figure 2-Figure Supplement 1B)": There were no histological data either Figure 2D, Figure 2-Figure Supplement 1B), which showed muscle tissues.

      (3) "The overexpression of Acvr1R206H increased basal and activin dependent expression of canonical (Id1 and Sp7) and non-canonical (Ptgs2) BMP target genes (Figure 3C),": There was no increase of Ptgs2 gene in basal level.

      (4) Materials and Methods. Production of human fetal mesenchymal stem cells expressing ACVR1.: Is it derived from a fetus?

      (5) Figure 6C: There was no description of the meaning of each column. What does AA mean and what is the number?

      We introduced the missing information in the manuscript, figure legends and Materials and Methods section for points #1, 4 and 5. AA denotes Activin A, and the number is the number of replicates; this is now detailed in the figure legend. We included images of the muscle regeneration after 23 days of treatment with BYL719 in mice after induction of HO in the new Figure 2, Figure Supplement 2 (point #2). We corrected the mistake in the manuscript and no longer suggest an increase in Ptgs2 gene expression by ACVR1-R206H at the basal level (point #3).

    1. Author response:

      Reviewer #1 (Public Review):

      Weaknesses:

      There are some minor weaknesses.

      Notably, there are not a lot of new insights coming from this paper. The structural comparisons between MCC and PCC have already been described in the literature and there were not a lot of significant changes (outside of the exo- to endo- transition) in the presence vs. absence of substrate analogues.

      We agree that the structures of the human MCC and PCC holoenzymes are similar to their bacterial homologs. That is due to the conserved sequences and functions of MCC and PCC across different species.

      There is not a great deal of depth of analysis in the discussion. For example, no new insights were gained with respect to the factors contributing to substrate selectivity (the factors contributing to selectivity for propionyl-CoA vs. acetyl-CoA in PCC). The authors state that the longer acyl group in propionyl-CoA may mediate stronger hydrophobic interactions that stabilize the alpha carbon of the acyl group at the proper position. This is not a particularly deep analysis and doesn't really require a cryo-EM structure to invoke. The authors did not take the opportunity to describe the specific interactions that may be responsible for the stronger hydrophobic interaction nor do they offer any plausible explanation for how these might account for an astounding difference in the selectivity for propionyl-CoA vs. acetyl-CoA. This suggests, perhaps, that these structures do not yet fully capture the proper conformational states.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We will discuss this limitation in our revised manuscript.

      The authors also need to be careful with their over-interpretation of structure to invoke mechanisms of conformational change. A snapshot of the starting state (apo) and final state (ligand-bound) is insufficient to conclude *how* the enzyme transitioned between conformational states. I am constantly frustrated by structural reports in the biotin-dependent enzymes that invoke "induced conformational changes" with absolutely no experimental evidence to support such statements. Conformational changes that accompany ligand binding may occur through an induced conformational change or through conformational selection and structural snapshots of the starting point and the end point cannot offer any valid insight into which of these mechanisms is at play.

      Point accepted. We will revise our manuscript to use "conformational differences" instead of "conformational changes" to describe the differences between the apo and ligand-bound states.

      Reviewer #2 (Public Review):

      Comments and questions to the manuscripts:

      I'm quite impressed with the protein purification and structure determination, but I think some functional characterization of the purified proteins should be included in the manuscript. The activity of enzymes should be the foundation of all structures and other speculations based on structures.

      We appreciate this comment. However, since we purified the endogenous BDCs and the sample we obtained was a mixture of four BDCs, the enzymatic activity of this mixture cannot accurately reflect the catalytic activity of PCC or MCC holoenzyme. We will acknowledge this limitation in the discussion section of our revised manuscript.

      In Figure 1B, the structure of MCC is shown as two layers of beta units and two layers of alpha units, while there is only one layer of alpha units resolved in the density maps. I suggest the authors show the structures resolved based on the density maps and show the complete structure with the docked layer in the supplementary figure.

      We appreciate this comment. We have shown the cryo-EM maps of the PCC and MCC holoenzymes in fig. S8 to indicate the unresolved regions in these structures. The BC domains in one layer of MCCα in the MCC-apo structure were not resolved. However, we think it would be better to show a complete structure in Fig. 1 to provide an overall view of the MCC holoenzyme. We will revise Fig. 1B and the figure legend to clearly point out which domains were not resolved in the cryo-EM map and were built in the structure through docking.

      In the introduction, I suggest the author provide more information about the previous studies about the structure and reaction mechanisms of BDCs, what is the knowledge gap, and what problem you will resolve with a higher resolution structure. For example, you mentioned in line 52 that G437 and A438 are catalytic residues, are these residues reported as catalytic residues or this is based on your structures? Has the catalytic mechanism been reported before? Has the role of biotin in catalytic reactions revealed in previous studies?

      Point accepted. It was reported that G419 and A420 in S. coelicolor PCC, corresponding to G437 and A438 in human PCC, were the catalytic residues (PMID: 15518551). The same study also reported the catalytic mechanism of the carboxyl transfer reaction. The role of biotin in the BDC-catalyzed carboxylation reactions has been extensively studied (PMIDs: 22869039, 28683917). We will include this information in the introduction section of our revised manuscript.

      In the discussion, the authors indicate that the movement of biotin could be related to the recognition of acyl-CoA in BDCs, however, they didn't observe a change in the propionyl-CoA bound MCC structure, which is contradictory to their speculation. What could be the explanation for the exception in the MCC structure?

      We appreciate this comment. We do not have a good explanation for why we did not observe a change in the propionyl-CoA bound MCC structure. It is noteworthy that neither acetyl-CoA nor propionyl-CoA is the natural substrate of MCC. Recently, a cryo-EM structure of the human MCC holoenzyme in complex with its natural substrate, 3-methylcrotonyl-CoA, has been resolved (PDB code: 8J4Z). In this structure, the binding site of biotin and the conformation of the CT domain closely resemble those in our acetyl-CoA-bound MCC structure. Therefore, the movement of biotin induced by acetyl-CoA binding mimics that induced by the binding of MCC's natural substrate, 3-methylcrotonyl-CoA, indicating that, in comparison with propionyl-CoA, acetyl-CoA is closer to 3-methylcrotonyl-CoA regarding its ability to bind to MCC. We will discuss this possibility in our revised manuscript.

      In the discussion, the authors indicate that the selectivity of PCC to different acyl-CoA is determined by the recognition of the acyl chain. However, there are no figures or descriptions about the recognition of the acyl chain by PCC and MCC. It will be more informative if they can show more details about substrate recognition in Figures 3 and 4.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We will discuss this limitation in our revised manuscript.

      How are the solved structures compared with the latest Alphafold3 prediction?

      Since AlphaFold3 was not released when our manuscript was submitted, we did not compare the solved structures with the AlphaFold3 predictions. We have now carried out the predictions using AlphaFold3. Due to the token limitation of the AlphaFold3 server, we can only include two α and six β subunits of human PCC or MCC in the prediction. The overall assembly patterns of the AlphaFold3-predicted structures are similar to those of the cryo-EM structures. The RMSDs between PCCα, PCCβ, MCCα, and MCCβ in the apo cryo-EM structures and those in the AlphaFold3-predicted structures are 7.490 Å, 0.857 Å, 7.869 Å, and 1.845 Å, respectively. The PCCα and MCCα subunits adopt an open conformation in the cryo-EM structures but a closed conformation in the AlphaFold3-predicted structures, resulting in large RMSDs.
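
      For readers less familiar with the metric, the following is a minimal sketch of how an RMSD between two matched sets of atomic coordinates is computed. It assumes the two structures have already been superposed and that atoms are paired one-to-one (for example, equivalent Cα atoms). The function name and toy coordinates are hypothetical, and this is not the software used to obtain the values reported above.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates in Å.

    Assumes both lists are already superposed and ordered so that
    coords_a[i] and coords_b[i] refer to the same atom.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: four atoms, one of which is displaced by 5 Å in the second model.
model_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (4.5, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (7.5, 4.0, 0.0)]
```

      Because deviations are squared before averaging, a rigid movement of one domain relative to the rest of a subunit can dominate the RMSD even when each domain is individually well predicted, which is consistent with the open versus closed explanation given above.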

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This study presents an important contribution to cardiac arrhythmia research by demonstrating that long noncoding RNA Dachshund homolog 1 (lncDACH1) tunes sodium channel functional expression and affects cardiac action potential conduction and rhythms. The evidence supporting the major claims is solid. The work will be of broad interest to cell biologists and cardiac electrophysiologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors show that a long noncoding RNA, lncDACH1, inhibits sodium currents in cardiomyocytes by binding to and altering the localization of dystrophin. The authors use a number of methodologies to demonstrate that lncDACH1 binds to dystrophin and disrupts its localization to the membrane, which in turn downregulates NaV1.5 currents. Knockdown of lncDACH1 upregulates NaV1.5 currents. Furthermore, in heart failure, lncDACH1 is shown to be upregulated, which suggests that this mechanism may have pathophysiological relevance.

      Strengths:

      (1) This study presents a novel mechanism of Na channel regulation which may be pathophysiologically important.

      (2) The experiments are comprehensive and systematically evaluate the physiological importance of lncDACH1.

      Reviewer #2 (Public Review):

      This manuscript by Xue et al. describes the effects of a long noncoding RNA, lncDACH1, on the localization of Nav channel expression, the magnitude of INa, and arrhythmia susceptibility in the mouse heart. Because lncDACH1 was previously reported to bind and disrupt membrane expression of dystrophin, which in turn is required for proper Nav1.5 localization, much of the findings are inferred through the lens of dystrophin alterations.

      The results report that cardiomyocyte-specific transgenic overexpression of lncDACH1 reduces INa in isolated cardiomyocytes; measurements in whole heart show a corresponding reduction in conduction velocity and enhanced susceptibility to arrhythmia. The effect on INa was confirmed in isolated WT mouse cardiomyocytes infected with a lncDACH1 adenoviral construct. Importantly, reducing lncDACH1 expression via either a cardiomyocyte-specific knockout or using shRNA had the opposite effect: INa was increased in isolated cells, as was conduction velocity in heart. Experiments were also conducted with a fragment of lncDACH1 identified by its conservation with other mammalian species. Overexpression of this fragment resulted in reduced INa and greater proarrhythmic behavior. Alteration of expression was confirmed by qPCR.

      The mechanism by which lncDACH1 exerts its effects on INa was explored by measuring protein levels from cell fractions and immunofluorescence localization in cells. In general, overexpression was reported to reduce Nav1.5 and dystrophin levels and knockout or knockdown increased them.

      The strengths of this manuscript include convincing evidence of a link between lncDACH1 and Na channel function. The identification of a lncDACH1 segment conserved among mammalian species is compelling. The observation that lncDACH1 is increased in a heart failure model provides a plausible hypothesis for disease mechanism.

      One limitation of the fractionation approach is the uncertain disposition of Na channel protein deemed "cytoplasmic." It seems likely that the membrane fraction includes ER membrane. The signal may reasonably be attributed to Na channel protein in stalled transport vesicles, or alternatively in stress granules, but this was not directly addressed.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors report the first evidence of Nav1.5 regulation by a long noncoding RNA, LncRNA-DACH1, and suggest its implication in the reduction in sodium current observed in heart failure. Since no direct interaction is observed between Nav1.5 and the LncRNA, they propose that the regulation is via dystrophin and targeting of Nav1.5 to the plasma membrane.

      Strengths:

      (1) First evidence of Nav1.5 regulation by a long noncoding RNA.

      (2) Implication of LncRNA-DACH1 in heart failure and mechanisms of arrhythmias.

      (3) Demonstration of LncRNA-DACH1 binding to dystrophin.

      (4) Potential rescuing of dystrophin and Nav1.5 strategy.

      Weaknesses:

      (1) The fact that total Nav1.5 protein is reduced by 50%, which is similar to the reduction in membrane Nav1.5, questions the main conclusion of the authors implicating dystrophin in the reduced Nav1.5 targeting. The reduction in membrane Nav1.5 could simply be due to the reduction in total protein.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Weaknesses:

      (1) What is indicated by the cytoplasmic level of NaV1.5, a transmembrane protein?

      This is still confusing. Since Nav1.5 is an integral membrane protein, I am not sure what is really meant here by cytosolic fraction. From the workflow, it seems a separate organelle fraction is also collected. Is the amount of Nav1.5 in this fraction (which I assume includes for e.g. lysosome) also increased with lncDACH1? I recommend the authors to refer to the Nav channels not at the plasma membrane as 'intracellular' rather than 'cytoplasmic'.

      Thanks for the insightful comment. We completely agree. Accordingly, we have changed “cytoplasmic” to “intracellular”.

      Line 226. "In consistent with the results" Perhaps unnecessary to have "in"

      Thank you for the insightful comment. We have corrected it.

      Line 228. Is it optimal or optical?

      Sorry for the mistake, it should be optical. We have corrected it.

      Reviewer #3 (Recommendations For The Authors):

      I still have an issue with the total reduction in Nav1.5 which is about the same as the reduction in membrane and currents. The authors argue that there is an increase in cytoplasmic Nav1.5. However the controls that they provide for membrane and cytoplasmic fractions are not convincing.

      Thank you for the insightful comment. We cannot rule out the possibility that the reduction in membrane Nav1.5 may be due to the reduction in total protein. Our data indicate that the membrane and total protein levels of Nav1.5 were reduced by 50%. However, the intracellular Nav1.5 was not decreased but increased in the hearts of lncDACH1-TG mice relative to WT controls, which indicates that the intracellular Nav1.5 failed to traffic to the membrane.

    1. Author response:

      First we thank the reviewers for a thorough reading of our paper and some useful comments. A recurrent remark of the reviewers concerns the appearance of kRas-expressing cells (labelled by a nuclear blue fluorescent marker) which we attribute to the progeny of the initially induced cell. The reviewers suggest that these cells may have been obtained through activation of the Cre-recombinase in other cells by cyclofen released from light scattering, via diffusion, leakiness, etc. These remarks are perfectly reasonable from people not familiar with the cyclofen uncaging approach that we are using but are unwarranted as we shall show below.

      We have been using cyclofen uncaging with subsequent activation of a Cre-recombinase (or some other proteins) since 2010 (see ref. 34, Sinha et al., Zebrafish 7, 199-204 (2010), and our 2018 review, ref. 35, Zhang et al., ChemBioChem 19, 1-8 (2018)). In our experiments, the embryos are incubated in the dark in 6 µM caged cyclofen (cCyc) and washed in E3 medium (or transferred to a new medium with no cCyc). In these conditions, over many years we never observed activation of the recombinase, i.e. the appearance of the associated fluorescent label in cells of embryos grown in E3 medium. Hence leakiness can be ruled out (in the presence of cCyc or in its absence).

      Following transfer of the embryos to new E3 medium we illuminate the embryos locally with light at 405nm. In these conditions, cCyc is only partially uncaged and results in activation of Cre-recombinase in only a few cells (1, 2, 3, …) within the illuminated region only, namely in the appearance of the kRas-associated nuclear blue fluorescent label in usually one cell (and sometimes in a few more; data and statistics will be incorporated in a revised manuscript). In the absence of any further treatment (e.g. activation of a reprogramming factor) these fluorescently labelled cells disappear within a few days (either via shut-down of their promoter, apoptosis or some other mechanism). The crucial point here is that we see fewer, not more, kRas-expressing cells (i.e. with nuclear blue fluorescence). This observation rules out activation of Cre-recombinase in other cells days after illumination due to leakiness, cyclofen released by light or diffusing from the illumination spot.

      To observe many more fluorescent cells days after activation of the initial cell, one needs to transiently activate VentX-GR by overnight incubation in dexamethasone (DEX); injecting the embryos at the 1-cell stage with VentX-GR or incubating them in DEX does not result in the appearance of more blue fluorescent cells. Following activation of VentX-GR, the fluorescent cells observed a couple of days after initiation are visualized in E3 medium (i.e. in the absence of cyclofen) and are localized to the vicinity of the otic vesicle (the region where the initial cell was activated). In a revised manuscript we will present images of these fluorescent cells taken a few days apart from the same embryo in which a single cell was initially activated. Hence, we attribute these cells to the progeny of the activated cell. Obviously, single cell tracking via time-lapse microscopy would nail down this issue and provide fascinating insight into the initial stages of tumor growth. Unfortunately, immobilization of embryos in the usual medium (e.g. MS222, tricaine) over 5-6 days to track the division and motion of single cells is not possible. We are considering some other possibilities (immobilization in bungarotoxin or via photo-activation of anionic channels), but these challenging experiments are for a future paper.

      Reviewer #1 (Public Review):

      The authors then performed allotransplantations of allegedly single fluorescent TICs in recipient larvae and found a large number of fluorescent cells in distant locations, claiming that these cells have all originated from the single transplanted TIC and migrated away. The number of fluorescent cells shown in the recipient larvae after just two days is not compatible with a normal cell cycle length and more likely represents the progeny of more than one transplanted cell.

      As mentioned in the manuscript, we measure the density of cells/nl and inject in the yolk of 2dpf Nacre embryos a volume containing about 1 cell, following published protocols (S.Nicoli and M.Presta, Nat.Prot. 2,2918 (2007)). We further image the injected cell(s) by fluorescence microscopy immediately following injection, as shown in Fig.4A and Fig.S8B. We might miss a few cells but not many. With a typical cell cycle of ~10h the images of tumors in larvae at 3dpt (and not 2dpt as misunderstood by this reviewer) correspond to ~100 cells. In any case the purpose of this experiment was not to study tumorigenesis upon transplantation but to show that the progeny of the initially induced cells is capable of developing into a tumor in a naïve fish, which is the operational definition of cancer that we adopted here.

      The ability to migrate from the injection site should be documented by time-lapse microscopy.

      As stated above our purpose here is not to study tumor formation from transplanted cell(s) but to use that assay as an operational test of cancer. Besides as mentioned earlier single cell tracking in larvae over 3-4dpt is not a trivial task.

      Then, the authors conclude that "By allowing for specific and reproducible single cell malignant transformation in vivo, their optogenetic approach opens the way for a quantitative study of the initial stages of cancer at the single cell level". However, the evidence for these claims is weak and further characterization should be performed to:

      (1) show that they are actually activating the oncogene in a single cell (the magnification is too low and it is difficult to distinguish a single nucleus, labelling of the cell membrane may help to demonstrate that they are effectively activating the oncogene in, or transplanting, a single cell)

      In a revised manuscript we will provide larger magnification of the initial induced cell and show examples of oncogene activation in more than one cell.

      (2) the expression of the genes used as markers of tumorigenesis is performed in whole larvae, with only a few transformed cells in them. Changes should be confirmed in FACS sorted fluorescent cells

      When the oncogene is activated in a whole larva, all cells are fluorescent and thus FACS is of no use for cell sorting. Sorting could be done in larvae where single cells are activated, but then the efficiency of FACS is not good enough to isolate the few fluorescent cells among the many more non-fluorescent ones. We agree that the change in expression of the genes used as markers of tumorigenesis is an underestimate of their true change, but our goal at this time is not to precisely measure the change in expression level, but to show that the pattern of change is different from the controls and corresponds to what is expected in tumorigenesis.

      (3) the histology of the so called "tumor masses" is not showing malignant transformation, but at the most just hyperplasia.

      The histology of the hyperplastic tissues displays cellular proliferation with a higher density of nuclear material, which is characteristic of tumors (Fig.S4C). Besides, the increased expression of pERK in these tissues (Fig.S4A,B) is also a hallmark of cancer.

      In the brain, the sections are not perfectly symmetrical and the increase of cellularity on one side of the optic tectum is compatible with this asymmetry.

      The expected T-shape formed by the sections of the tegmentum and hypothalamus is compatible with the symmetric sections shown in Fig.2D. The asymmetry in the optic tectum is a result of the hyperplastic growth.

      (4) The number of fluorescent cells found dispersed in the larvae transplanted with one single TIC after 48 hours will require a very fast cell cycle to generate over 50 cells. Do we have an idea of the cell cycle features of the transplanted TICs?

      As answered above, the transplanted larvae are shown at 3dpt (and not 2dpt as misunderstood by this reviewer). With a cell cycle of about 10h, a single cell can give rise to about 100 cells in that time lapse.
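
      The order of magnitude quoted here can be checked with a back-of-the-envelope calculation. The sketch below assumes idealized exponential growth from a single cell, with every cell dividing once per cycle and no cell death; the function names are illustrative and not taken from the manuscript.

```python
import math

def cells_after(hours, cycle_hours, start_cells=1):
    """Expected cell number after `hours` of uninterrupted doubling."""
    return start_cells * 2 ** (hours / cycle_hours)

def cycle_needed(hours, target_cells, start_cells=1):
    """Cell-cycle length (in hours) needed to reach `target_cells` within `hours`."""
    return hours / math.log2(target_cells / start_cells)

# 3 dpt (72 h) with a 10 h cycle: about 7 doublings, i.e. on the order of 100 cells
print(round(cells_after(72, 10)))
# 2 dpt (48 h) with the same 10 h cycle: fewer than 5 doublings
print(round(cells_after(48, 10)))
# Cycle length that would be required to exceed 50 cells in 48 h
print(round(cycle_needed(48, 50), 1))
```

      Under these assumptions a 10 h cycle yields roughly 150 cells at 72 h but fewer than 30 at 48 h, which is why the time point (3 dpt rather than 2 dpt) matters for the argument.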

      Reviewer #2 (Public Review):

      Summary:

      This paper describes a genetically tractable and modifiable system …which could be used to study an array of combinations and temporal relationships of these cancer drivers/modifiers.

      We thank this referee for their positive comments. We would also like to point out that our approach provides the first quantitative means to estimate the probability of tumorigenesis from a single cell, an estimate which is crucial in any assessment of cancer malignancy and the effectiveness of prophylactics.

      Weaknesses:

      There is minimal quantitation of … the efficiency of activation of the Ras-TFP fusion (Fig 1) in, purportedly, a single cell. …, such information seems essential.

      In a revised manuscript we will add more images of induction of a single cell (or a few cells) and a table where the efficiency of RAS activation is detailed.

      The authors indicate that a single cell is "initiated" (Fig 2) using the laser optogenetic technique, but without definitive genetic lineage tracing, it is not possible to conclude that cells expressing TFP distant from the target site near the ear are daughter cells of the claimed single "initiated" cell. A plausible alternative explanation is 1) that the optogenetic targeting is more diffuse (i.e. some of the light of the appropriate wavelength hits other cells nearby due to reflection/diffraction), so these adjacent cells are additional independent "initiated" cells or 2) that the uncaged tamoxifen analogue can diffuse to nearby cells and allow for CreER activation and recombination.

      We have addressed this point in our general comments to the reviewers’ remarks. The possibilities mentioned by this reviewer would result in cells expressing TFP in absence of VentX activation, which is not the case. Cells expressing TFP away from the initial site are observed days after activation of the oncogene (and TFP) in a single cell and only upon activation of VentX.

      In Fig 2B, the claim is made that "the activated cell has divided, giving rise to two cells" - unless continuously imaged or genetically traced, this is unproven.

      We have addressed this remark previously. Tracking of larvae over many days is not possible with the usual protocol using tricaine to immobilize the larvae. Nonetheless, in a revised version we will present images of an embryo imaged at various times post activation where proliferation of the cells can be observed. We are pursuing other alternatives for time-lapse microscopy over many days since, besides convincing the sceptics, a single cell tracking experiment (possibly coupled with in-situ spatial transcriptomics) will shed a new and fascinating light on the initial stages of tumor growth.

      In addition, it appears that Figures S3 and S4 are showing that hyperplasia can arise in many different tissues (including intestine, pancreas, and liver, S4C) with broad Ras + Ventx activation …. This should be clarified in the manuscript).

      This is true and will be clarified in the new version.

      In Fig S7 where single cell activation and potential metastasis is discussed, similar gut tissues have TFP+ cells that are called metastatic, but this seems consistent with the possibility that multiple independent sites of initiation are occurring even when focal activation is attempted.

      As mentioned previously this is ruled out by the fact that these cells are observed days after cyclofen uncaging (and TFP activation) and if and only if VentX is activated.

      Although the hyperplastic cells are transplantable (Fig 4), the use of the term "cells of origin of cancer" or metastatic cells should be viewed with care in the experiments showing TFP+ cells (Fig 1, 2, 3) in embryos with targeted activation for the reasons noted above.

      The purpose of this transplantation experiment was to show that cells in which both kRas and VentX have been activated possess the capacity to metastasize and develop a tumor mass when transplanted in a naïve zebrafish. This - to the best of our knowledge - is the operational definition of a malignant tumor.

      Reviewer #3 (Public Review):

      Summary:

      This study employs an optogenetics approach … to examine tumourigenesis probabilities under altered tissue environments.

      We thank this reviewer for this remark, since we believe that the opportunity to assess the probability of tumorigenesis from a single cell is possibly the most significant contribution of this work. To the best of our knowledge this has never been done before.

      Weaknesses:

      Lack of Methodological Clarity: The manuscript lacks detailed descriptions of methodologies,

      In a revised manuscript we will include additional detail of our methodology.

      Sub-optimal Data Presentation and Quality:

      Lack of quantitative data and control condition data obtained from images of higher magnification limits the ability to robustly support the conclusions.

      In a revised version we will include more images at higher magnification and quantitative data to support the main report of targeted single cell induction.

      Here are some details:

      Authors might want to provide more evidence to support their claim on the single cell KRAS activation.

      More images and data on the activation of single or few cells in the illumination field will be provided in a revised version.

      · Stability of cCYC: The manuscript does not provide information on the half-life and stability of cCYC. Understanding these properties is crucial for evaluating the system's reliability and the likelihood of leakiness, which could significantly influence the study's outcomes.

      We have been using the cCyc system for about 14 years. We refer the reader to our previous papers and reviews on this methodology (e.g. ref. 34,35). Briefly, cCyc is stable when not illuminated with light around 375nm. Typically, we incubate our embryos in the dark for about 1h before transferring them into E3 medium and illuminating them. Assessing the leakiness of the system is easy as expression of the fluorescent marker is permanently turned on. We have observed none in the conditions of our experiment.

      · Metastatic Dissemination claim: However, the absence of a supportive cellular compartment within the fin-fold tissue makes the presence of mTFP-positive metastatic cells there particularly puzzling. This distribution raises concerns about the spatial specificity of the optogenetic activation protocol … The unexpected locations of these signals suggest potential ectopic activation of the KRAS oncogene,

      We have addressed this remark in the introduction and above. Specifically, metastatic and proliferative mTFP-positive cells are observed if and only if VentX is also activated concomitant with activation of kRAS in a single cell. No proliferative cells are observed in absence of VentX activation, or in presence of VentX or Dex alone, or if kRAS has not been activated by cyclofen uncaging.

      · Image Resolution Concerns: The cells depicted in Figure 3C β, which appear to be near the surface of the yolk sac and not within the digestive system as suggested in the MS, underscore the necessity for higher-resolution imaging. Without clearer images, it is challenging to ascertain the exact locations and states of these cells, thus complicating the assessment of experimental results.

      Better images will be provided in the revised version.

      · The cell transplantation experiment is lacking protocol details:

      Details will be provided in the revised version. We have followed regular protocols for transplantation: S.Nicoli and M.Presta, Nat.Prot. 2,2918 (2007).

      • If the cells are obtained from whole larvae with induced RAS + VX expression, it is notable and somewhat surprising that the larvae survived up to six days post-induction (6dpi) before cells were harvested for transplantation. This survival rate and the subsequent ability to obtain single cell suspensions raise questions about the heterogeneity of the RAS + VX expressing cells that transplanted.

      From Fig.S4D, about 50% of the embryos survive at 6dpi. Though an interesting question by itself we have not (yet) addressed the important issue of the heterogeneity of the outgrowth obtained from a single cell. Our purpose here was just to show that cells in which both kRAS and VentX have been activated possess the capacity to metastasize and develop a tumor mass when transplanted in a naïve zebrafish. This - to the best of our knowledge - is the operational definition of a malignant tumor.

      · Unclear Experimental Conditions in Figure S3B: …It is not specified whether the activation of KRAS was targeted to specific cells or involved whole-body exposure.

      This was whole body (global) illumination and will be specified in the revised version.

      · Contrasting Data in Figure S3C compared to literature: The graph in Figure S3C indicates that KRAS or KRAS + DEX induction did not result in any form of hyperplastic growth. Providing detailed descriptions of the conditions under which the experiments in Figure S3B were conducted and clarifying the reasons for the discrepancies observed in Figure S3C are crucial. The authors should discuss potential reasons for the deviation from previous reports.

      This discrepancy will be discussed in the revised version. First, the previous reports consider the development of tumors over a longer time-span (4-5 weeks), which we have not studied here. Second, the expression of the oncogene in these reports might be stronger than in ours. Third, the stochastic appearance of tumors in these reports suggests that some other mechanism (transient stress-induced reprogramming?) might have activated the oncogene in the initial cell.

      Further comments:

      Throughout the study, KRAS-activated cell expansion and metastasis are two key phenotypes discussed that Ventx is promoting. However, the authors did not perform any experiments to directly show that KRAS+ cells proliferate only in Ventx-activated conditions.

      Yes, we did. See Fig. S1 and compare with Fig.S3B, or Fig.S8A in comparison with Fig.2A,B.

      The authors also did not show any morphological features or time-lapse videos demonstrating that KRAS+ cells are motile, even though zebrafish is an excellent model for in vivo live imaging. This seems to be a missed opportunity for providing convincing evidence to support the authors' conclusions.

      Performing single cell time-lapse microscopy on larvae over many (4-5) days is not possible with the regular tricaine protocol for immobilization. We are definitely planning such experiments, but they will require some other protocol, perhaps using bungarotoxin or some optogenetic inhibitory channels. Nonetheless, in the revised version we will show images of the same embryos at various times post single cell induction displaying proliferation of cells.

      There were minimal experimental details provided for the qPCR data presented in the supplementary figures S5 and S6; therefore, it is hard to evaluate the results obtained.

      More details will be given in the revised version.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Tutak et al use a combination of pulldowns, analyzed by mass spectrometry, reporter assays, and fluorescence experiments to decipher the mechanism of protein translation in fragile X-related diseases. The topic is interesting and important.

      Although a role for Rps26-deficient ribosomes in toxic protein translation is plausible based on already available data, the authors' data are not carefully controlled and thus do not support the conclusions of the paper.

      Strengths:

      The topic is interesting and important.

      Weaknesses:

      In particular, there is very little data to support the notion that Rps26-deficient ribosomes are even produced under the circumstances. And no data that indicate that they are involved in the RAN translation. Essential controls (for ribosome numbers) are lacking, no information is presented on the viability of the cells (Rps26 is an essential protein), and the differences in protein levels could well arise from block in protein synthesis, and cell division coupled to differential stability of the proteins.

      We agree that the presented data could benefit from the addition of the suggested experiments. We will address the ribosome content, global translation rate and cell viability upon RPS26 depletion. We are also planning to apply polysome profiling to determine if RPS26-depleted ribosomes are translationally active.

      Specific points:

      (1) Analysis of the mass spec data in Supplemental Table S3 indicates that for many of the proteins that are differentially enriched in one sample, a single peptide is identified. So the difference is between 1 peptide and 0. I don't understand how one can do a statistical analysis on that, or how it would give out anything of significance. I certainly do not think it is significant. This is exacerbated by the fact that the contaminants in the assay (keratins) are many, many-fold more abundant, and so are proteins that are known to be mitochondrial or nuclear, and therefore likely not actual targets (e.g. MCCC1, PC, NPM1; this includes many proteins "of significance" in Table S1, including Rrp1B, NAF1, Top1, TCEPB, DHX16, etc...).

      The data in Table S6/Figure 3A suffer from the same problem.

      Tables S3 and S6 show the mass spectrometry output data from MaxQuant analysis without any filtering. Certain identifications, i.e. those denoted as contaminants (such as keratins), were removed during statistical analysis in Perseus software. Regarding the data presented in Table S6 (SILAC data), we argue that these data are of very good quality. More than 2000 proteins were identified in a 125min gradient, with over 80% of proteins identified with at least 2 unique peptides. However, we acknowledge that the description of Tables S3 and S6 may lead to misunderstanding; thus we will clarify their explanation.
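
      As a rough illustration of the kind of filtering described above, the sketch below drops entries flagged as contaminants and then reports the fraction of remaining proteins supported by at least two unique peptides. The record layout (keys `protein`, `contaminant`, `unique_peptides`) and the toy values are hypothetical; they are not the actual MaxQuant or Perseus column names, nor data from Tables S3 or S6.

```python
def filter_protein_groups(groups, min_unique_peptides=2):
    """Remove flagged contaminants, then split out well-supported identifications."""
    # Step 1: discard anything flagged as a contaminant (e.g. keratins)
    kept = [g for g in groups if not g["contaminant"]]
    # Step 2: keep proteins identified with enough unique peptides
    confident = [g for g in kept if g["unique_peptides"] >= min_unique_peptides]
    fraction = len(confident) / len(kept) if kept else 0.0
    return kept, confident, fraction

# Toy records (hypothetical values for illustration only)
toy = [
    {"protein": "KRT1", "contaminant": True, "unique_peptides": 25},
    {"protein": "RPS26", "contaminant": False, "unique_peptides": 4},
    {"protein": "DHX15", "contaminant": False, "unique_peptides": 3},
    {"protein": "ProteinX", "contaminant": False, "unique_peptides": 1},
]
kept, confident, fraction = filter_protein_groups(toy)
print(len(kept), len(confident), round(fraction, 2))
```

      Reporting the fraction after contaminant removal, rather than on the raw output, addresses the concern that single-peptide identifications and abundant contaminants dominate the unfiltered table.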

      I am not convinced that the mass spec data is reliable.

      (2) The mass-spec data however claims to identify Rps26 as a factor binding the toxic RNA specifically. The rest of the paper seeks to develop a story of how Rps26-deficient ribosomes play a role in the translation of this RNA. I do not consider that this makes sense.

      Indeed, we identified RPS26 as a protein co-precipitated with FMR1 RNA containing expanded CGG repeats. However, we do not claim that they interact directly. Downregulation of FMRpolyG biosynthesis could be an outcome of an alteration of ribosomal assembly, changes in the efficiency and fidelity of PIC scanning, impeded elongation or, more likely, a combination of some of these processes. We will provide a better explanation of these issues in the revised version of the manuscript.

      (3) Rps26 is an essential gene, I am sure the same is true for DHX15. What happens to cell viability? Protein synthesis? The yeast experiments were carefully carried out under experiments where Rps26 was reduced, not fully depleted to give small growth defects.

      We agree with Reviewer 1 that RPS26 is an essential protein. Previously, it was shown that cell viability is decreased in cells with a C-terminal deletion mutant of RPS26 (Havkin-Solomon T, Nucleic Acids Res 2023). We will address the question regarding the suppression of FMRpolyG in models with partial RPS26 knock-down.

      (4) Knockdown efficiency for all tested genes must be shown to evaluate knockdown efficiency.

      Missing experiments showing efficiency of knock-down will be included in the revised version of the manuscript.

      (5) The data in Figure 1E have just one mock control, but two cell types (control si and Rps26 depletion).

      We will clarify this ambiguity in the revised version of the manuscript.

      (6) The authors' data indicate that the effects are not specific to Rps26 but indeed also observed upon Rps25 knockdown. This suggests strongly that the effects are from reduced ribosome content or blocked protein synthesis. Additional controls should deplete a core RP to ascertain this conclusion.

      We agree that the observed effect may stem partially from reduced ribosome content; however, we argue that this is not the only explanation. In the publication concerning RPS25 regulation of G4C2-related RAN translation (Yamada SB, 2019, Nat Neurosci), it was shown that RPS25 KO does not affect global translation. Our experiments (SUnSET assay, unpublished) indicated that RPS26 KD also did not reduce the global translation rate significantly. We will present these data in the revised version of the manuscript.

      (7) Supplemental Figure S3 demonstrates that the depletion of S26 does not affect the selection of the start codon context. Any other claim must be deleted. All the 5'-UTR logos are essentially identical, indicating that "picking" happens by abundance (background).

      The results shown in Fig.S3 do not imply that RPS26 has no effect at all on the selection of the start codon context. We tested only a few hypotheses. We decided to test the -4 position because it was indicated as the most sensitive to RPS26 regulation in yeast (Ferretti M, 2017, Nat Struct Mol Biol). Regarding the WebLOGO analysis, we wrote in the manuscript that we did not identify any specific motif or enrichment within the analysed transcripts in comparison to background. We will clarify this ambiguity in the revised version of the manuscript.

      (8) Mechanism is lacking entirely. There are many ways in which ribosomes could have mRNA-specific effects. The authors tried to find an effect from the Kozak sequence, unsuccessfully (however, they also did not do the experiment correctly, as they failed to recognize that the Kozak sequence differs between yeast, where it is A-rich, and mammalian cells, where it is GGCGCC). Collisions could be another mechanism.

      As in (7).

      Reviewer #2 (Public Review):

      Summary:

      Translation of CGG repeats leads to the accumulation of polyG, which is associated with neurological disorders. This is a valuable paper in which the authors sought out proteins that modulate RAN translation. They determined which proteins in HeLa cells bound to CGG repeats and affected levels of polyG encoded in the 5'UTR of the FMR1 mRNA. They then showed that siRNA depletion of ribosomal protein RPS26 results in less production of FMRpolyG than in control. There are data supporting the claim that RPS26 depletion modulates RAN translation in this RNA, although for some results, the Western results are not strong. The data to support increased aggregation by polyG expression upon S26 KD are incomplete.

      Strengths:

      The authors have proteomics data that show the enrichment of a set of proteins on FMR1 RNA but not a related RNA.

      Weaknesses:

      - It is insinuated that RPS26 binds the RNA to enhance CGG-containing protein expression. However, RPS26 reduction was also shown previously to affect ribosome levels, and reduced ribosome levels can result in ribosomes translating very different RNA pools.

      We agree that the presented data could benefit from additional experiments. We will therefore address questions regarding ribosome content, global translation rate and cell viability upon RPS26 depletion. We also plan to apply polysome profiling to determine whether RPS26-depleted ribosomes are translationally active. However, we did not state that RPS26 binds directly to RNA with expanded CGG repeats or that this interaction is crucial for translational regulation of the studied RNA; we only tested these hypotheses. We will improve the narrative in the revised version of the manuscript to make the major conclusions clearer.

      - A significant claim is that RPS26 KD alleviates the effects of FMRpolyG expression, but those data aren't presented well.

      We thank Reviewer 2 for this comment. We will show data from several different cell models that we have already obtained. Moreover, we will include results of experiments with a luminescence readout for FMRpolyG fused with luciferase upon RPS26 KD.

      Reviewer #3 (Public Review):

      Tutak et al provide interesting data showing that RPS26 and relevant proteins such as TSR2 and RPS25 affect RAN translation from CGG repeat RNA in fragile X-associated conditions. They identified RPS26 as a potential regulator of RAN translation by RNA-tagging system and mass spectrometry-based screening for proteins binding to CGG repeat RNA and confirmed its regulatory effects on RAN translation by siRNA-based knockdown experiments in multiple cellular disease models and patient-derived fibroblasts. Quantitative mass spectrometry analysis found that the expressions of some ribosomal proteins are sensitive to RPS26 depletion while approximately 80% of proteins including FMRP were not influenced. Since the roles of ribosomal proteins in RAN translation regulation have not been fully examined, this study provides novel insights into this research field. However, some data presented in this manuscript are limited and preliminary, and their conclusions are not fully supported.

      (1) While the authors emphasized the importance of ribosomal composition for RAN translation regulation in the title and the article body, the association between RAN translation and ribosomal composition is apparently not evaluated in this work. They found that specific ribosomal proteins (RPS26 and RPS25) can have regulatory effects on RAN translation (Figures 1C, 2B, 2C, 2E, 4A, 5A, and 5B), and that the expression levels of some ribosomal proteins can be changed by RPS26 knockdown (Figure 3B); however, the change of the ribosome compositions involved in the actual translation has not been elucidated. Therefore, their conclusive statement, that is, "ribosome composition affects RAN translation" is not fully supported by the presented data and is misleading.

      We thank Reviewer 3 for critical comments and suggestions. We agree that the proposed title may be misleading and the presented data does not fully support the aforementioned statement regarding ribosomal composition affecting FMRpolyG synthesis. Hence, we will change the title together with a narrative regarding these unfortunate statements that go beyond the presented results.

      (2) The study provides insufficient data on the mechanisms of how RPS26 regulates RAN translation. Although authors speculate that RPS26 may affect initiation codon fidelity and regulate RAN translation in a CGG repeat sequence-independent manner (Page 9 and Page 11), what they really have shown is just identification of this protein by the screening for proteins binding to CGG repeat RNA (Figure 1A, 1B), and effects of this protein on CGG repeat-RAN translation. It is essential to clarify whether the regulatory effect of RPS26 on RAN translation is dependent on CGG repeat sequence or near-cognate initiation codons like ACG and GUG in the 5' upstream sequence of the repeat. It would be better to validate the effects of RPS26 on translation from control constructs, such as one composed of the 5' upstream sequence of FMR1 with no CGG repeat, and one with an ATG substitution in the 5' upstream sequence of FMR1 instead of near-cognate initiation codons.

      We will address the question regarding the influence of CGG repeat content and START codon selection (including different near-cognate start codons) on RPS26-sensitive translation, and present these data in the revised version of the manuscript.

      (3) The regulatory effects of RPS26 and other molecules on RAN translation have all been investigated as effects on the expression levels of FMRpolyG-GFP proteins in cellular models expressing CGG repeat sequences (Figures 1C, 2B, 2C, 2E, 4A, 5A, and 5B). In these cellular experiments, there are multiple confounding factors affecting the expression levels of FMRpolyG-GFP proteins other than RAN translation, including template RNA expression, template RNA distribution, and FMRpolyG-GFP protein degradation. Although authors evaluated the effect on the expression levels of template CGG repeat RNA, it would be better to confirm the effect of these regulators on RAN translation by other experiments such as in vitro translation assay that can directly evaluate RAN translation.

      We agree that multiple factors affect the final translation of the investigated mRNA, including the aforementioned processes. We evaluated the level of FMR1 mRNA, which was not affected by RPS26 depletion (Figure 2B&C); however, we will address other possibilities as well.

      (4) While the authors state that RPS26 modulated the FMRpolyG-mediated toxicity, they presented limited data on apoptotic markers, not cellular viability (Figure 1E), not fully supporting this conclusion. Since previous work showed that FMRpolyG protein reduces cellular viability (Hoem G et al., Front Genet 2019), additional evaluations for cellular viability would strengthen this conclusion.

      We thank Reviewer 3 for this suggestion. We addressed the effect of RPS26 KD on the apoptotic process induced by FMRpolyG. We will also perform experiments addressing other aspects of FMRpolyG-mediated cell toxicity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      This article is a direct follow-up to the paper published last year in eLife by the same group. In the previous article, the authors discovered a zinc finger protein, Kipferl, capable of guiding the HP1 protein Rhino towards certain genomic regions enriched in GRGGN motifs and packaged in heterochromatin marked by H3K9me3. Unlike other HP1 proteins, Rhino recruitment activates the transcription of heterochromatic regions, which are then converted into piRNA source loci. The molecular mechanism by which Kipferl interacts specifically with Rhino (via its chromodomain) and not with other HP1 proteins remained enigmatic. 

      In this latest article, the authors go a step further by elucidating the molecular mechanisms important for the specific interaction of Rhino and not other HP1 proteins with Kipferl. A phylogenetic study carried out between the HP1 proteins of 5 Drosophila species led them to study the importance of an AA Glycine at position 31 located in the Rhino chromodomain, an AA different from the AA (aspartic acid) found at the same position in the other HP1 proteins. The authors then demonstrate, through a series of structure predictions, biochemical, and genetic experiments, that this specific AA in the Rhino-specific chromodomain explains the difference in the chromatin binding pattern between Rhino and the other Drosophila HP1 proteins. Importantly, the G31D conversion of the Rhino protein prevents interaction between Rhino and Kipferl, phenocopying a Kipferl mutant. 

      Strengths: 

      The authors' effective use of phylogenetic analyses and protein structure predictions to identify a substitution in the chromodomain that allows Rhino's specific interaction with Kipferl is very elegant. Both genetic and biochemical approaches are applied to rigorously probe the proposed explanation. They used a point mutation in the endogenous locus that replaces the Rhino-specific residue with the aspartic acid residue present in all other HP1 family members. This novel allele largely phenocopies the defects in hatch rate, chromatin organization, and piRNA production associated with kipferl mutants, and does not support Kipferl localization to clusters. The data are of high quality, the presentation is clear and concise, and the conclusions are generally well-supported.

      Weaknesses: 

      The reviewers identified potential ways to further strengthen the manuscript.

      (1) The one significant omission is RNAseq on the rhino point mutant, which would allow direct comparison to cluster, transposon, and repeat expression in kipferl mutants. 

      In this eLife Advances submission, we aim to elucidate the molecular interaction between Rhino and the zinc finger protein Kipferl and how it evolved. Using various assays, of which piRNA sequencing is the most relevant and comprehensive, we show that the rhino[G31D] mutation phenocopies a rhino loss-of-function situation for Kipferl and a kipferl loss-of-function situation for Rhino. Further confirmation of this statement by additional RNA-seq experiments to probe the extent of selective TE de-repression would indeed be a possibility. We decided to test for TE de-repression phenotypes using sensitive RNA-FISH experiments on a handful of TEs that are deregulated in kipferl loss-of-function flies (Baumgartner et al. 2022). This showed that the same TEs are also deregulated in rhino[G31D] flies, further confirming the similarity of the two genotypes. We have added these data to the text and to Figure 5-figure supplement 2, which shows representative RNA-FISH images.

      (2) The manuscript would benefit from adding more evolutionary comparisons. The following or similar analyses would help put the finding into a broader evolutionary perspective:

      i) Is Kipferl's surface interacting with Rhino also conserved in Kipferl orthologs? In other words, are the Rhino-interacting amino acids of Kipferl under any pressure to be conserved?

      We performed an analysis of the Kipferl interface that interacts with the Rhino chromodomain in those species where Kipferl could be unambiguously identified. This showed that the residues involved in the Rhino interaction are generally conserved. We have added this analysis to Figure 1-figure supplement 4.

      ii) The remarkable conservation of Rhino's G31 is at odds with the arms race that is proposed to be happening between the fly's piRNA pathway proteins and transposons. Does this mean that Rhino's chromodomain is "untouchable" for such positive selection? 

      We agree that the conservation of the G31 residue argues against this binding interface being under positive selection in Rhino. Without understanding the pressures acting on Rhino that underlie the previously published positive selection, we find it difficult to draw firm conclusions. Mutating G31 in fly species that lack Kipferl would be an interesting experiment.

      Recommendations for the authors:

      (1) RNAseq is important to the full characterization of the phenotype and should be included. It's now clear that the major piRNA clusters are not required for fertility, so I would also include an analysis of piRNA production and Rhino binding to regions flanking isolated insertions. 

      See our response to raised weakness #1 above. Briefly, we have now added an analysis of TE de-repression based on RNA-FISH experiments (Figure 5-figure supplement 2). Regarding the proposed analysis of piRNA production and Rhino binding to regions flanking isolated TE insertions: this is an important issue that we carefully analysed in our previous work characterising the kipferl mutant (Baumgartner et al. 2022). In the present work, we focused on generating a rhino mutant that uncouples Rhino from Kipferl.

      (2) The authors do not provide direct biochemical evidence that the chromodomain substitution blocks Rhino binding to Kipferl. However, Rhino protein is very low abundance, making analysis of the endogenous protein very difficult.

      Based on our previous work (Baumgartner et al 2022), the Rhino chromodomain interacts directly with the fourth zinc finger of Kipferl. Mutation of a single residue in the predicted interface (Rhino[G31D]) phenocopies a kipferl mutant, strongly suggesting that this mutation disrupts the Rhino-Kipferl interaction. Definitive evidence will have to await the reconstitution of this interaction using recombinant proteins. Our attempts to purify recombinant Kipferl (expressed in bacteria or in insect cells) or the protein fragments relevant to the interaction were unsuccessful so far. While we obtained soluble fractions of the first ZnF array, there was always a high level of co-purifying nucleic acids that we were not able to remove.

      (3) Even if the Rhino G31D mutant retains its ability to interact with H3K9me3 it does not localize correctly on the chromatin preventing certain regions such as locus 80F from being converted into piRNA source loci. However other regions such as satellite regions attract the Rhino mutant protein converting them into super piRNA source loci, phenocopying the effects observed in a Kipferl mutant. Why Rhino when not bound to Kipferl concentrates in satellite regions is a question that remains unanswered.

      This is a very interesting question indeed. We have not been able to elucidate the molecular basis of how Rhino is recruited to satellite repeats in Kipferl mutants. For example, we performed a proximity biotinylation experiment with GFP-Rhino in Kipferl mutant ovaries, but this experiment did not reveal any protein that would explain the observed accumulation of Rhino at the complex satellite repeats.

      (4) In the phylogenetic analysis the authors identified two residues as Rhino-specific and conserved sequence alterations, the D31G mutation and the G62 insertion. However, the authors limit their study to the D31G mutation, and nothing is performed on the G62 insertion. It would have been interesting to know the impact of this insertion on Rhino's biology. 

      The role, if any, of the Rhino-specific G62 insertion and its effect on Rhino localisation or function is an interesting topic for further study. We have not investigated the G62 residue experimentally. In the current manuscript, we limited our efforts to the analysis of the G31D mutation, as the goal was to identify the mode of interaction with Kipferl, and the G62 residue is not predicted to contact Kipferl according to AlphaFold.

      (5) The authors report that the G31D mutation of Rhino phenocopies the Kipferl mutant. Rhino is wrongly localized in the nucleus, and Rhino G31D recruitment in certain Kipferl-enriched regions is affected, as at the 80F locus, which correlates with a strong drop in piRNA production from this locus. To go a step further in demonstrating that G31D phenocopies the Kipferl mutant, it would have been informative to analyse how much TE piRNAs are affected and whether TEs are deregulated.

      See our response to similar comments above. We have added RNA-FISH experiments to illustrate that the TE de-repression phenotypes are comparable between rhino[G31D] and kipferl loss of function ovaries (Figure 5-figure supplement 2). Analyses of TE-mapping piRNAs also show well correlated phenotypes (Figure 5-figure supplement 1).

      (6) Figure 3A: To homogenize with the immunostaining presented in Figure 3B, can the authors add on the bar graph depicting female fertility the results obtained with kipferl-/- and rhino-/- genotype? 

      rhino mutants are completely (100%) sterile and the fertility of kipferl mutants was previously measured to range between 15% and 40% (Baumgartner et al. 2022).

      (7) Figure 4A: It would have been interesting to show Venn diagrams showing the overlap of genomic regions enriched for Kipferl versus regions enriched for Rhi in a WT and in a Rhi G31D mutant. 

      We consider the analysis presented in Figure 4 to be more meaningful, as a Venn diagram would require binary cut-offs.

      (8) Figure 1B: In the phylogenetic analysis for Rhino/HP1d two D. simulans lines are presented. Can the authors clarify this point?

      There are two Rhino paralogs in D. simulans: one paralog (NCBI: AAY34025.1) is more similar to D. melanogaster Rhino, contains one intron and is located at chromosome chr2R (assembly Apr. 2005, WUGSC mosaic 1.0/droSim1: 12256895-12258668). The second paralog (XP_002106478.1) is located on chromosome X (6734493-6735248) and does not contain an intron. We have added a clarifying statement to the corresponding figure legend.

      (9) To determine whether Rhino G31D point mutation affects the overall function of Rhino, the authors analysed Kipferl-independent piRNA source loci by looking at Responder and 1.688 family satellites. I'm not sure that these loci can be classified as Kipferl-independent piRNA source loci since a strong increase of piRNA production from these loci in Kipferl mutant is observed. In my point of view, the 42AB and 38C are real Kipferl-independent piRNA source loci as piRNA production from these loci is not affected by Kipferl KD. 

      Indeed, the Rsp and 1.688 family satellites are not completely independent of Kipferl, as their expression and Rhino occupancy differ between wild-type and kipferl loss-of-function phenotypes (including rhino[G31D]). However, we believe that this increase is due to a strong dependence on different sequestration mechanisms and is not mediated by a direct function of Kipferl at these sites. Similarly, we observe slight differences in piRNA production for the peripheral parts of cluster 42AB, as well as differences in Rhino occupancy despite an unaltered piRNA profile at cluster 38C (Baumgartner et al. 2022). Thus, different flavours of Kipferl-independence exist, with the only truly Kipferl-independent piRNA sources likely to be the piRNA clusters in the testis. A clear classification is further complicated by previously observed compensatory effects in the piRNA pathway, leading us to adopt the current definition of "requiring Kipferl for Rhino recruitment" to distinguish Kipferl-dependent from Kipferl-independent sites.

      (10) The authors report that the G31D mutation of Rhino phenocopies the Kipferl mutant. Rhino is wrongly localized in the nucleus, and Rhino G31D recruitment in certain Kipferl-enriched regions is affected, as at 80F locus, which correlates with a strong drop in piRNA production from this locus. To go a step further in demonstrating that G31D phenocopies the Kipferl mutant, it would have been interesting to look at how much TE piRNAs are affected and whether TEs (and which class of TE) are deregulated by RNAseq and/or in situ hybridization. 

      See our response to similar comments above. Our new RNA-FISH experiments and TE-mapping piRNA analysis extend the comparison of phenotypes between kipferl mutants and rhino[G31D] mutants and are consistent with our previous conclusions (Figure 5-figure supplements 1 and 2).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Jellinger et al. performed engram-specific sequencing and identified genes that were selectively regulated in positive/negative engram populations. In addition, they performed chronic activation of the negative engram population over 3 months and observed several effects on fear/anxiety behavior and cellular events such as upregulation of glial cells and decreased GABA levels.

      Strengths:

      They provide useful engram-specific GSEA data and the main concept of the study, linking negative valence/memory encoding to cellular level outcomes including upregulation of glial cells, is interesting and valuable.

      Weaknesses:

      A number of experimental shortcomings make the conclusion of the study largely unsupported. In addition, the observed differences in behavioral experiments are rather small, inconsistent, and the interpretation of the differences is not compelling.

      Major points for improvement:

      (1) Lack of essential control experiments

      With the current set of experiments, it is not certain that the DREADD system they used was potent and stable throughout the 3 months of manipulations. Basic confirmatory experiments (e.g., slice physiology at 1m vs. 3m) to show that the DREADD effects on these vHP are stable would be an essential bottom line to make these manipulation experiments convincing.

      In previous work from our lab performing long-term activation of Gq DREADD receptors in the vHPC, we quantified Gq receptor expression at 3-, 6- and 9-month timepoints and showed that receptor expression, as measured via fluorescence intensity, did not decrease (Suthard et al., 2023). In this study, we also acknowledge that even if our manipulation were effective for only 1 month rather than 3 months, we would be observing the long-term effects of this shorter-term stimulation. This remains relevant and changes only how we interpret the findings, as shorter-term stimulation or disruption of neuronal activity can still have detrimental effects on behavior.

      Furthermore, although the authors use the mCherry vector as a control, they did not have a vehicle/saline control for the hM3Dq AAV. Thus, the long-term effects such as the increase in glial cells could simply be due to the toxicity of DREADD expression, rather than an induced activity of these cells.

      For chemogenetic studies, our experimental rationale utilized a standard approach in the field, which includes one of two control options: 1) active receptor vs. control vector + ligand or 2) active receptor + ligand or saline control. We chose the first option, as this more properly controls for the potential off-target effects of the ligand itself, as shown in other previous work (Xia et al., 2017). This is particularly important for studies using CNO, as many off-target effects have been noted as a limitation (Manvich et al., 2018). We chose to use DCZ as it is closely related to CNO and newer ligands, but comes with added benefits of high specificity, low off-target effects, high potency and brain penetrance (Nagai et al., 2020), but any potential off-target effects of DCZ are yet to be completely investigated as this ligand is very new.

      Evidence of DREADD toxicity has been shown at high titer levels of AAV2/7-CaMKIIα-hM4D(Gi)-mCherry in the hippocampus at 5 weeks, as the reviewer pointed out in their above comment (Goossens et al., 2021). Our viral strategy targets a much smaller number of cells using AAV9-DIO-Flex-hM3Dq-mCherry at a lower titer, whereas that study drove expression within a much larger population of CaMKII+ excitatory neurons. Additionally, visual comparison of their viral load and expression with ours shows that theirs is much more intense and spans a larger area of the hippocampus (Goossens et al, 2021; Figure 1D), whereas ours is isolated to a smaller region of vHPC (see Figure 1B).

      Further, we attempted to quantify a decrease in neuronal health (Yousef et al., 2017) resulting from DREADD expression via NeuN counts within multiple hippocampal subregions for the 6- and 14-month groups across active Gq receptor and mCherry conditions and did not observe significant decreases in NeuN as a result (Supplemental Figure 1). However, immunohistochemistry of an individual marker may not be sufficient to capture the entire health profile of an individual neuron and future work should consider other markers of cell death or inflammation, which we have added to the Limitations & Future Work section of our Discussion.

      (2) Figure 1 and the rest of the study are disconnected

      The authors used the cFos-tTA system to label positive/negative engram populations, while the TRAP2 system was used for the chronic activation experiments. Although both genetic tools are based on the same IEG Fos, the sensitivity of the tools needs to be validated. In particular, the sensitivity of the TRAP2 system can be arbitrarily altered by the amount of tamoxifen (or 4OHT) and the administration protocols. The authors should at least compare and show the percentage of labeled cells in both methods and discuss that the two experiments target (at least slightly) different populations. In addition, the use of TRAP2 for vHP is relatively new; the authors should confirm that this method actually captures negative engram populations by checking for reactivation of these cells during recall by overlap analysis of Fos staining or by artificial activation.

      We thank the reviewer for their comments and the opportunity to discuss the marked differences between the TRAP2 and DOX systems. In particular, we agree that while both systems rely on the Fos promoter to drive an effector of interest, their efficacy and temporal resolution vary substantially depending on genetic cell-type, brain region, temporal parameters of Dox or 4-OHT delivery, subject-by-subject metabolic variability, and threshold to Fos induction given the promoter sequences inherent to each system. For example, recent studies have reported the following:

      - The TRAP2 line labels a subset of endogenously active CA1 pyramidal cells (e.g. 5-18%) while the DOX system labels 20-40% of CA1 pyramidal cells (DeNardo et al, 2019; Monasterio et al, BioRxiv 2024).

      - The temporal windows for each range from hours in TRAP2 to 24-48 hours for DOX (DeNardo et al, 2019; Denny et al, 2014; Liu & Ramirez et al, 2012).

      - The efficacy of “tagging” a population of cells with TRAP2 vs with DOX will constrain the number of possible cells that may overlap with cFos upon re-exposure to a given experience (e.g. see the observed overlaps in vCA1-BLA circuits (Kim & Cho, 2020), compared to vCA1 in general (Ortega-de San Luis et al, 2023) and valence-specific vCA1 populations (Shpokayte et al, 2022)).

      - Tagging vCA1 cells with either the TRAP2 or the DOX system is nonetheless sufficient to drive corresponding behaviors (e.g. vCA1 terminal stimulation drives behavioral changes with the DOX and TRAP2 system (Shpokayte et al, 2022) and vCA1 stimulation of an updated fear-linked ensemble drives light-induced freezing in a neutral context utilizing the TRAP2 and DOX systems (Ortega-de San Luis et al, 2023)).
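
      To make the constraint described in the third point concrete, the following is a minimal arithmetic sketch, not an analysis from any of the cited studies. It assumes that tagging and recall-induced cFos expression are statistically independent (the usual chance-level reference) and uses a hypothetical cFos+ fraction of 10%; the function names are illustrative only. The tagged fractions are the ranges quoted in the first point.

```python
def chance_overlap(tagged_fraction: float, fos_fraction: float) -> float:
    """Fraction of all cells expected to be double-positive by chance,
    assuming tagging and cFos expression are independent."""
    for value in (tagged_fraction, fos_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    return tagged_fraction * fos_fraction


def overlap_ceiling(tagged_fraction: float, fos_fraction: float) -> float:
    """Upper bound on the double-positive fraction: the smaller population."""
    return min(tagged_fraction, fos_fraction)


FOS_FRACTION = 0.10  # hypothetical value, chosen only for illustration

# Tagged fractions quoted in the text: 5-18% (TRAP2) and 20-40% (DOX).
for system, tagged in [("TRAP2", 0.05), ("TRAP2", 0.18), ("DOX", 0.20), ("DOX", 0.40)]:
    chance = chance_overlap(tagged, FOS_FRACTION)
    ceiling = overlap_ceiling(tagged, FOS_FRACTION)
    print(f"{system} tagged={tagged:.0%}: chance={chance:.1%}, ceiling={ceiling:.0%}")
```

      Under these assumptions, a sparser tagging system lowers both the chance level and the ceiling on observable overlap, so raw overlap percentages obtained with the two systems are not directly comparable.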

      Finally, and promisingly, as more studies continue to link the in vivo physiological dynamics of the cell populations tagged using each system (e.g. compare Pettit et al, 2022 with Tanaka et al, 2018) and to correlate their activity with behavioral phenotypes, our field is in a prime position to uncover deeper principles governing hippocampus-mediated engrams in the brain. Together, we believe a more comprehensive understanding of these systems is fully warranted, especially in the service of further cataloging cellular similarities and differences within such tagged populations.

      (3)  Interpretation of the behavior data

      In Figures 3a and b, the authors show that the experimental group showed higher anxiety based on time spent in the center/open area. However, there were no differences in distance traveled and center entries, which are often reduced in highly anxious mice. Thus, it is not clear what the exact effect of the manipulation is. The authors may want to visualize the trajectories of the mice's locomotion instead of just showing bar graphs.

      Our findings show that our experimental group displays higher levels of anxiety-like behaviors as measured via time spent in center/open area, while there are no differences in distance traveled or center entries. For distance traveled, our interpretation is in line with complementary research (Jimenez et al, 2018; Kheirbek et al, 2013) that shows no changes in distance traveled/distance traveled in the center coupled with changes in anxiety levels as a result of manipulation within anxiety-related circuits. More broadly, any locomotion-related deficit could cause a change in distance traveled that is unrelated to anxiety-like behaviors alone. For example, a reduction in distance traveled could be coupled with a decrease in time spent in the center, but could also result only from motor or exploratory deficits. We hope that this explanation clarifies our interpretation of the open field and elevated plus maze findings in light of other literature.

      In addition, the data shown in Figure 4b is somewhat surprising - the 14MO control showed more freezing than the 6MO control, which can be interpreted as "better memory in old". As this is highly counterintuitive, the authors may want to discuss this point. The authors stated that "Mice typically display increased freezing behavior as they age, so these effects during remote recall are expected" without any reference. This is nonsense, as just above in Figure 4a, older mice actually show less freezing than young mice. Overall, the behavioral effects are rather small and random. I would suggest that these data be interpreted more carefully.

      In Figure 4B, we present our findings from remote recall and observe increased freezing levels in control mice with age, as mentioned by the reviewer, indicating increased memory. This is in line with previous work from Shoji & Miyakawa, 2019, which has been added as a reference for the quotation described above; we thank the reviewer for pointing this error out. As the reviewer has pointed out, above in Figure 4A, we measured freezing levels across all groups during contextual fear conditioning before the start of chronic stimulation, as this was the session in which we ‘tagged’ a negative memory. Although freezing may appear slightly lower in older (14-month-old) mice, we did not detect a statistically significant difference between age groups, only effects of time and subject. These are expected, as freezing increases within the session and animals display high variability in freezing levels across many experiments (Figure 4A i-iii). In previous work, we also found that control mice expressing the empty (mCherry) vector and receiving 3, 6 or 9 months of chronic DCZ stimulation in the vHPC show an increase in freezing with age (Suthard et al, 2023; Figure 2A ii).

      (4) Lack of citation and discussion of relevant study

      Khalaf et al. 2018 from Gräff lab showed that experimental activation of recall-induced populations leads to fear attenuation. Despite the differences in experimental details, the conceptual discrepancy should be discussed.

      As mentioned by the reviewer, Khalaf et al. 2018 showed that experimental activation of recall-induced populations in the dentate gyrus leads to fear attenuation. Specifically, they propose that this fear attenuation occurs in these ensembles through updating or unlearning of the original memory trace via the engagement, rather than suppression, of an original traumatic experience. Despite the differences in experimental details between our current study and this work, we agree that the conceptual discrepancy should be discussed. First, one major difference is that we are reactivating an ensemble that was tagged during fear memory encoding, while Khalaf et al. are activating a remote recall-induced ensemble that was tagged one month after encoding. Although there is high overlap between the encoding and recall ensembles when mice are exposed to the conditioning context, these ensembles are not identical and may result in different behavioral phenotypes when chronically reactivated. Further, Khalaf et al. rely on reactivation of the recall-induced ensemble during extinction to facilitate rapid fear attenuation. This differs from our current work, as their reactivation occurs during the extinction process in the previously conditioned context, while we are reactivating chronically in the animal’s home cage over a longer time period. It may be necessary that the memory is first reactivated, and thus more liable to re-contextualization, in the original context rather than in an unrelated home cage environment where there are presumably no related cues present. Importantly, this previous work tests the attenuation of fear shortly after an extinction process, while we are not traditionally extinguishing the context with the aid of memory reactivation. Finally, we are testing remote recall (3 months post-conditioning), while they are testing at a shorter time interval (28 days).
      In line with these ideas, future work may seek to tease out the mechanistic differences between recent and remote memory extinction both in terms of natural memory recall and chronically manipulated memory-bearing cells.

      Reviewer #2 (Public Review):

      Summary:

      Jellinger, Suthard, et al. investigated the transcriptome of positive and negative valence engram cells in the ventral hippocampus, revealing anti- and pro-inflammatory signatures of these respective valences. The authors further reactivated the negative valence engram ensembles to assay the effects of chronic negative memory reactivation in young and old mice. This chronic re-activation resulted in differences in aspects of working memory, and fear memory, and caused morphological changes in glia. Such reactivation-associated changes are putatively linked to GABA changes and behavioral rumination.

      Strengths:

      Much of the content of this manuscript is of benefit to the community, such as the discovery of differential engram transcriptomes dependent on memory valence. The chronic activation of neurons, and the resultant effects on glial cells and behavior, also provide the community with important data. Laudable points of this manuscript include the comprehensiveness of behavioral experiments, as well as the cross-disciplinary approach.

      Weaknesses:

      There are several key claims made that are unsubstantiated by the data, particularly regarding the anthropomorphic framing of "rumination" on a mouse model and the role of GABA. The conclusions and inferences in these areas need to be carefully considered.

      (1) There are many issues regarding the arguments for the behavioural data's human translation as "rumination." There is no definition of rumination provided in the manuscript, nor how rumination is similar/different to intrusive thoughts (which are psychologically distinct but used relatively interchangeably in the manuscript), nor how rumination could be modelled in the rodent. The authors mention that they are attempting to model rumination behaviours by chronically reactivating the negative engram ("To understand if our experimental model of negative rumination..."), but this occurs almost at the very end of the results section, and no concrete evidence from the literature is provided to attempt to link the behavioural results (decreased working memory, increased fear extinction times) to rumination-like behaviours. The arguments in the final paragraph of the Discussion section about human rumination appear to be unrelated to the data presented in the manuscript and contain some uncited statements. Finally, the rumination claims seem to be based largely upon a single data figure that needs to be further developed (Figure 6, see also point 2 below).

      (2) The staining and analysis in Figure 6 are challenging to interpret, and require more evidence to substantiate the conclusions of these results. The histological images are zoomed out, and at this resolution, it appears that only the pyramidal cell layer is being stained. A GABA stain should also label the many sparsely spaced inhibitory interneurons existing across all hippocampal layers, yet this is not apparent here. Moreover, both example images in the treatment group appear to have lower overall fluorescence intensity in both DAPI and GABA. The analysis is also unclear: the authors mention "ROIs" used to measure normalized fluorescence intensity but do not specify what the ROI encapsulates. Presumably, the authors have segmented each DAPI-positive cell body and assessed fluorescence; however, this is neither explained nor demonstrated, making the results difficult to interpret.

      Based on the collective discussion from all reviewers on the completeness of our GABA quantification and its implications, we have decided to remove this figure and perform more substantive analysis of this E/I imbalance in future work.

      (3) A smaller point, but more specific detail is needed for how genes were selected for GSEA analysis. As GSEA relies on genes to be specified a priori, to avoid a circular analysis, these genes need to be selected in a blind/unbiased manner to avoid biasing downstream results and conclusions. It's likely the authors have done this, but explicitly noting how genes were selected is an important context for this analysis.

      As mentioned in our Methods section, gene sets were selected based on pre-existing biology and understanding of genes canonically involved in “neurodegeneration” such as those related to apoptotic pathways and neuroinflammation or “neuroprotection” such as brain-derived neurotrophic factor, to name a few. A limitation of this method is that we must avoid making strong claims about the actual function of these up- or down-regulated genes without performing proper knock-in or knock-out studies, but we hope that this provides an unbiased inventory for future experiments to perform causal manipulations.

      Reviewer #3 (Public Review):

      Summary:

      The authors note that negative ruminations can lead to pathological brain states and mood/anxiety dysregulation. They test this idea by using mouse engram-tagging technology to label dentate gyrus ensembles activated during a negative experience (fear conditioning). They show that chronic chemogenetic activation of these ensembles leads to behavioral changes (increased anxiety, increased fear generalization, reduced fear extinction) and neural changes (increases in neuroinflammation, microglia, and astrocytes).

      Strengths:

      The question the authors ask here is an intriguing one, and the engram activation approach is a powerful way to address the question. Examination of a wide range of neural and behavioral dependent measures is also a strength.

      Weaknesses:

      The major weakness is that the authors have found a range of changes that are correlates of chronic negative engram reactivation. However, they do not manipulate these outcomes to test whether microglia, astrocytes, or neuroinflammation are causally linked to the dysregulated behaviors.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      - Figure 2c should include Month0, the BW before the start of the manipulation.

      Regrettably, we do not have access to the Month 0 body weights at this time, as this project changed hands over the course of the past year or so. This is an inherent limitation that we missed during analysis, and we now note it in the Results section after describing this finding. It is therefore possible that body weight dropped during the first month of stimulation (Month 0-1), rebounded by the first measurement at Month 1, and then continued to increase normally through Months 2-3, as shown in our Figure 1. Thank you for this note.

      - Figure 6a looks confusing - the background signal in the green channel is very different between control and experimental groups. Were representative images taken with different microscope settings?

      The representative images were taken with the same microscope power settings, but were adjusted in brightness/contrast within FIJI for clarity in the figure. We apologize if this was misleading in any way and thank the reviewer for their feedback. Further, based on the collective discussion from all reviewers on the completeness of our GABA quantification and its implications, we have decided to remove this figure and perform more substantive analysis of this E/I imbalance in future work.

      - Typo mChe;try

      This typo was fixed.

      - "During this contextual... mice in the 6- and 14- month groups..." Isn't it 3- and 11- month respectively at the time of fear conditioning? Throughout the manuscript, this point was written very confusingly.

      Yes, we thank the reviewer for pointing this out. It has been corrected to 3- and 11-month old mice at the timing of fear conditioning and clarified throughout the manuscript where applicable.

      - "GABAergic eYFP fluorescence" Where does the eYFP come from? The methods state that GABA quantification is based on IHC staining.

      Based on the collective discussion from all reviewers on the completeness of our GABA quantification and its implications, we have decided to remove this figure and perform more substantive analysis of this E/I imbalance in future work. We discuss this E/I balance not being directly assessed in the Limitations & Future Directions section of our Discussion, noting the importance of detailed quantification of both excitatory and inhibitory markers within the hippocampus.

      Reviewer #2 (Recommendations For The Authors):

      (1) There is a full methods section ("Analysis of RNA-seq data") that mostly describes RNA-seq analysis that seemingly does not appear in the paper. This section should be reviewed.

      We included this portion of the Methods because it explains the workflow from Shpokayte et al. (2022), in which this dataset was generated; this is now noted in the “Analysis of RNA-seq data” section of the Methods.

      (2) Figure 6: GABA staining should be more critically analyzed, as discussed above, and validated with another GABA antibody for rigor. From the representative images provided in Figure 6, it looks possibly as though the hM3Dq images were simply not fully in the focal plane when being imaged or were over-washed, as DAPI staining also appears to be lower in these images.

      Based on the collective discussion from all reviewers on the completeness of our GABA quantification and its implications, we have decided to remove this figure and perform more substantive analysis of this E/I imbalance in future work. Specifically, it will be necessary to rigorously investigate both excitatory and inhibitory markers within this region to ensure these claims are substantiated. Thank you for this suggestion.

      (3) The first claim that human GABAergic interneurons cause rumination is uncited. (Page 19, first sentence beginning with: "Evidence from human studies suggests...").

      Based on the collective discussion from all reviewers on the completeness of our GABA quantification and its implications, we have decided to remove this figure and perform more substantive analysis of this E/I imbalance in future work. We apologize for the lack of an in-text citation; the proper citation for this finding is Schmitz et al, 2017.

      (4) Gene names throughout the manuscript and figure are written in the wrong format for mice (eg: Page 13, second line: SPP1, TTR, and C1QB1 instead of Spp1, Ttr, C1qb1).

      This was corrected throughout the manuscript.

      (5) Tense on Page 15 third sentence of the second paragraph: "...spatial working memory was assessed...".

      This was corrected throughout the manuscript.

      (6) Supplemental Figure 1 would benefit from normalization of the NeuN+ cell counts. The inclusion of an excitatory and inhibitory neuron marker in this figure might benefit the argument that there is a change in the excitation/inhibition of the hippocampus - as the numbers of excitatory neurons outweigh the numbers of inhibitory neurons that would be assayed here.

      In an effort to normalize the NeuN+ cell counts, for each of our ROIs (6-8 single tiles for each brain region (DG, vCA1, vSub) x 3-5 coronal slices = ~18 single tiles per mouse x 3-4 mice) we captured a 300 x 300 micrometer, single-tile z-stack at 20x magnification. These ROIs were matched for dimensions and brain regions across all groups for each hippocampal subregion quantified. We initially proposed to normalize these NeuN counts over DAPI, but because DAPI labels all nuclei (microglia, oligodendrocytes, astrocytes and neurons), we were not sure this was the best approach. We do agree that further quantification of excitatory and inhibitory cell markers would be vital to a more concrete interpretation of our findings, and we have added this to our Limitations & Future Work section of the Discussion.
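      As an illustration of what matched ROI dimensions allow, the short sketch below converts per-tile NeuN+ counts into an areal density. It is only a sketch under stated assumptions: the 300 x 300 micrometer tile size comes from the response above, while the function names and the example counts are hypothetical and are not taken from the study.

```python
# Minimal sketch: express per-tile cell counts as cells per square millimeter.
# ASSUMPTION: every tile is a square ROI of 300 x 300 micrometers, as stated
# in the response; the counts below are made-up placeholders, not real data.

def cells_per_mm2(count, tile_side_um=300.0):
    """Convert a cell count in one square tile to cells per mm^2."""
    area_mm2 = (tile_side_um / 1000.0) ** 2  # 300 um -> 0.3 mm, area 0.09 mm^2
    return count / area_mm2


def mean_density(tile_counts, tile_side_um=300.0):
    """Average areal density across several equally sized tiles."""
    densities = [cells_per_mm2(c, tile_side_um) for c in tile_counts]
    return sum(densities) / len(densities)


if __name__ == "__main__":
    example_counts = [80, 95, 102]  # hypothetical NeuN+ counts for three tiles
    print(mean_density(example_counts))
```

      Because all tiles share the same area, comparing raw counts and comparing densities give the same ordering between groups; the density form is mainly useful when comparing against studies that used a different ROI size.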

      Reviewer #3 (Recommendations For The Authors):

      (1) The DOX tagging window lacks temporal precision. I suggest the authors note this as a limitation.

      We thank the reviewer for noting this, and we have added this limitation to the Methods section with the context of the 24-48 hour DOX window being longer than other methods like TRAP.

      (2) Is there a homeostatic response to chronic engram stimulation? That is, is DCZ as effective in increasing neuronal excitability on day 90 as it is on day 1? This could be addressed with electrophysiology, or with IEG induction. Alternatively, the authors could refer to previous literature-- for example, Xia et al (2017) eLife-- that examined whether there was any blunting of the effects of DREADD ligands after sustained delivery via drinking water. There, of course, may be other papers as well.

      As noted by the reviewer, it is important to determine if DCZ maintains its effects on neuronal excitability throughout the 3 month administration period. To address this, previous work has shown that CNO administration in drinking water over one month consistently inhibited hM4Di+ neurons without altering baseline neuronal excitability as measured by firing rate and potassium currents (Xia et al, 2017). Although this is only for one month, it is administered via the same oral route as our DCZ protocol and suggests that at least for that amount of time we are likely producing consistent effects. In our reply above to Reviewer #1’s comment, we also note that even if DCZ is only having an effect for one month, rather than 3 months, we are still observing enduring changes that resulted from this short-term disturbance.

      (3) Please double check there is no group effect on weight in 6-month-old mice in Figure 2C.

      Two-way RM ANOVA showed no main effect of Group within the 6-month-old control and hM3Dq groups.

      Group: F(1,17) = 1.361, p=0.2594.
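      The reported p-value can be cross-checked from the F statistic and its degrees of freedom. The sketch below assumes only the reported values (F = 1.361 with 1 and 17 degrees of freedom) and uses the identity that an F(1, v) statistic is the square of a t statistic with v degrees of freedom. It is a standard-library consistency check, not the authors' analysis pipeline, and the function names are illustrative.

```python
import math

# Minimal sketch: upper-tail p-value for an F(1, df2) statistic.
# ASSUMPTION: the numerator degrees of freedom equal 1, so F = t^2 and the
# p-value equals the two-tailed p-value of a Student t with df2 degrees of
# freedom. The t density is integrated numerically with Simpson's rule.

def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def f_test_p_value_df1(f_stat, df2, n_steps=2000):
    """P(F(1, df2) > f_stat); n_steps must be even for Simpson's rule."""
    t = math.sqrt(f_stat)
    h = t / n_steps
    total = t_pdf(0.0, df2) + t_pdf(t, df2)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df2)
    central_half = total * h / 3  # integral of the density from 0 to t
    return 1.0 - 2.0 * central_half


if __name__ == "__main__":
    p = f_test_p_value_df1(1.361, 17)
    print(round(p, 4))  # should land close to the reported p = 0.2594
```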

      (4) The shock intensity is much higher than is typical for fear conditioning studies in mice. Why was this the case?

      Yes, we agree that this shock intensity is on the higher side of typical paradigms in mice; however, our lab has utilized 0.75mA to 1.5mA foot shocks for contextual fear conditioning in the past (Suthard & Senne et al, 2023; 2024; Dorst & Senne et al, 2023; Grella et al., 2022; Finkelstein et al., 2022), and we maintained this protocol for internal consistency. It would nonetheless be interesting to systematically investigate how differing foot shock intensities, subsequent tagging of this ensemble, and reactivation would uniquely impact behavioral state acutely and chronically in mice.

      (5) Remote freezing is very low. The authors should comment on this-- perhaps repeated testing has led to some extinction?

      A reviewer above suggested a similar phenomenon may be occurring, specifically fear attenuation as a result of chronic stimulation. They referenced previous work from Khalaf et al. 2018, where a recall-induced ensemble was reactivated, while we reactivated an ensemble tagged during encoding. We expand upon this work in light of our findings within the Limitations & Future Work section of our Discussion. However, we do appreciate the lower levels of freezing observed in remote recall and sought out other literature to understand the typical range of remote freezing levels. One thing that we note is that our remote recall occurs 3 months after conditioning, which is much longer than typical 14-28 day protocols. In the literature, contextual freezing at remote timepoints of 21-45 days ranges between approximately 20-50% (Kol et al., 2020), and between approximately 40-75% in a variety of 28 day remote recall experiments (Lee et al., 2023). This information, together with our current experimental protocol, demonstrates a wide range of remote freezing levels that may depend heavily on the foot shock intensity, the number of days after conditioning, and animal variability.

      (6) "mice display increased freezing with age": please add a reference.

      Apologies, we missed the citation for that claim and it has been added in-text and in the references list (Shoji & Miyakawa, 2019).

      (7) Related to the low freezing levels for remote memory, why is generalization minimal? Many studies have shown that there is a time-dependent emergence of generalized fear, yet here this is not seen. Is it linked to extinction (as above)? Or genetic background?

      Previous work has shown that rats receiving multiple foot shocks during conditioning displayed a time-dependent generalization of context memory, while those receiving fewer shocks did not (Poulos et al., 2016), as the reviewer noted in their comment. In our current study, we observe low levels of generalization in all of our groups compared to freezing levels displayed in the conditioned context at the remote timepoint, in opposition to this time-dependent enhancement of generalization. It is possible that the genetic background of our C57BL/6J mice compared to the Long-Evans rat strain in this previous work accounts for some of this difference. In addition, it is possible that our longer interval (3 months) compared to their remote timepoint (28 days) resulted in a decrease in generalization with greater time from the original conditioning. As noted above, it is indeed plausible that the reactivation of a contextual fear ensemble over time is attenuating freezing levels for both the original and similar contexts (Khalaf et al, 2018). We discuss the differences between our study and this 2018 work more comprehensively above.

      (8) Morphological phenotypes of astrocytes/microglia. Would be great to do some transcriptomic profiling of microglia/astrocytes to couple with the morphological characterization (but appreciate this is beyond the scope of current work).

      We thank the reviewer for this suggestion; we agree that this would be an incredibly informative future experiment and have added it to our Limitations & Future Experiments section of the Discussion.

      (9) The authors could consider including a limitations section in their discussion which discusses potential future directions for this work:

      - causal experiments.

      - E/I balance is not assessed directly (interestingly, in this regard, expanded engrams are linked to increased generalization [e.g., Ramsaran et al 2023]).

      Thank you for this suggestion, we have added a Limitations & Future Directions section to our Discussion and have expanded upon these suggested points.

      (10) For Figure 10, consider adding an experimental design/timeline.

      We are assuming that the reviewer meant Figure 1 instead of Figure 10 here, but note that there is a description of the viral expression duration (D0-D10), followed by an off-Dox period of 48 hours (D10-D12), with subsequent engram tagging of a negative (foot shock) or positive (male-to-female exposure) experience on D12. In our experiments (Shpokayte et al., 2022), Dox was administered for 24 hours (D12-D13), which was followed by sacrificing the animal for cell suspension and sequencing of the positive and negative engram populations. This figure also shows the viral strategy for the Tet-tag system (Figure 1A), as well as representative viral expression in vHPC (Figure 1B). We are happy to add additional experimental design/timeline information to this figure that would be helpful to the reviewer.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In light of some reviewer comments requesting more clarity on the relationship between our model and prior theoretical studies of systems consolidation, we propose a modification to the title of our manuscript: “Selective consolidation of learning and memory via recall-gated plasticity.” We believe this title better reflects the key distinguishing feature of our model, that it selectively consolidates only a subset of memories, and also highlights the model’s applicability to task learning as well as memory storage.

      Major comments:

      Reviewer #3’s primary concern with the paper is the following: “The main weakness of the paper is the equation of recall strength with the synaptic changes brought about by the presentation of a stimulus. In most models of learning, synaptic changes are driven by an error signal and hence cease once the task has been learned. The suggested consolidation mechanism would stop at that point, although recall is still fine. The authors should discuss other notions of recall strength that would allow memory consolidation to continue after the initial learning phase.”

      We thank the reviewer for drawing attention to this issue, which primarily results from a poor choice of notation on our part. Our decision to denote memories as ∆𝑤 gives the impression that memories should be interpreted as actual synaptic weight updates, and thus in the context of supervised learning would go to zero when the task is learned. However, in the formalism of our model, memories are in fact better interpreted as target values of synaptic weights, and the synaptic model/plasticity rule is responsible for converting these target values into synaptic weight updates. We were unclear on this point in our initial submission, because our paper primarily considers binary synaptic weights, where target synaptic weights have a one-to-one correspondence with candidate synaptic weight updates. We have updated the paper to use w* to refer to memories, which we hope resolves this confusion, and have updated our introduction to the term “memory” to reflect its interpretation as target synaptic weight values. We have also updated the paper’s language to more clearly disambiguate between the “learning rule,” which determines how the memory vectors (target synaptic weight vectors) are derived from task variables, and the “plasticity rule,” which governs how these are translated into actual synaptic weight updates. We acknowledge that our manuscript still does not explicitly consider a plasticity rule that is sensitive to continuous error signals, as our analysis is restricted to binary weights. However, we believe that the updated notation and exposition make it clearer that our model could be applied in such a case.
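      The distinction between a memory as a target weight vector and the weight updates produced by a plasticity rule can be illustrated with a toy example. The sketch below is not the model from the manuscript: it assumes binary weights in {-1, +1} and a hypothetical stochastic plasticity rule that sets each mismatched synapse to its target value with probability p. It shows that once the weights match the target, further presentations of the same memory produce no weight changes, yet the overlap between weights and target (a simple stand-in for recall strength) stays maximal.

```python
import random

# Toy sketch, NOT the manuscript's model.
# ASSUMPTIONS: weights are binary (+1 / -1); the "memory" is a target vector
# w_star; the plasticity rule flips each mismatched synapse to its target
# with probability p. All names and parameter values are illustrative.

def plasticity_step(weights, w_star, p, rng):
    """Apply one presentation of the memory; return new weights and number of changes."""
    new_weights = []
    n_changed = 0
    for w, target in zip(weights, w_star):
        if w != target and rng.random() < p:
            new_weights.append(target)
            n_changed += 1
        else:
            new_weights.append(w)
    return new_weights, n_changed


def overlap(weights, w_star):
    """Normalized dot product between current weights and the target vector."""
    return sum(w * t for w, t in zip(weights, w_star)) / len(weights)


rng = random.Random(0)
n_synapses = 200
w_star = [rng.choice((-1, 1)) for _ in range(n_synapses)]
weights = [rng.choice((-1, 1)) for _ in range(n_synapses)]

# Repeated presentations of the same memory drive the weights to the target.
for _ in range(200):
    weights, _ = plasticity_step(weights, w_star, p=0.2, rng=rng)

# One more presentation: no synapse changes, but the overlap is still maximal.
weights, changes_after_learning = plasticity_step(weights, w_star, p=0.2, rng=rng)
final_overlap = overlap(weights, w_star)
```

      In this toy setting the weight update vanishes after learning while the target vector, and hence any recall measure computed against it, remains well defined, which is the point of the notational change described above.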

      Reviewer #1 brought up that our framework cannot capture “single-shot learning, for example, under fear conditioning or if a presented stimulus is astonishing.” Reviewer #2 raised a related question of how our model “relates to the opposite more intuitive idea, that novel surprising experiences should be stored in memory, as the familiar ones are presumably already stored.”

      We agree that the built-in inability to consolidate memories after a single experience is a limitation of our model, and that extreme novelty is one factor (among others, such as salience or reward) that might incentivize one-shot consolidation. We have added a comment to the discussion to acknowledge these points (added text in bold): “Moreover, in real neural circuits, additional factors besides recall, such as reward or salience, are likely to influence consolidation as well. For instance, a sufficiently salient event should be stored in long-term memory even if encountered only once. Furthermore, while in our model familiarity drives consolidation, certain forms of novelty may also incentivize consolidation, raising the prospect of a non-monotonic relationship between consolidation probability and familiarity.” We agree that future work should address the combined influence of recall (as in our model) and other factors on the propensity to consolidate a memory.

      Reviewer #1 requested, “a comparison/discussion of the wide range of models on synaptic tagging for consolidation by various types of signals. Notably, studies from Wulfram Gerstner's group (e.g., Brea, J., Clayton, N. S., & Gerstner, W. (2023). Computational models of episodic-like memory in food-caching birds. Nature Communications, 14(1); and studies on surprise).”

      We thank the reviewer for the reference, which we have added to the manuscript. The model of Brea et al. (2023) is similar to that of Roxin & Fusi (2013), in that consolidation consists of “copying” synaptic weights from one population to another. As a result, just like the model of Roxin & Fusi (2013), this model does not provide the benefit that our model offers in the context of consolidating repeatedly recurring memories. However, the model of Brea et al. does have other interesting properties – for instance, it affords the ability to decode the age of a memory, which our model does not. We have added a comment on this point in the subsection of the Discussion titled “Other models of systems consolidation.”

      Reviewer #2 noted, “While the article extensively discusses the strengths and advantages of the recall-gated consolidation model, it provides a limited discussion of potential limitations or shortcomings of the model, such as the missing feature of generalization, which is part of previous consolidation models. The model is not compared to other consolidation models in terms of performance and how much it increases the signal-to-noise ratio.”

      We agree that our work does not consider the notion of generalization and associated changes to representational geometry that accompany consolidation, which is the focus of many other studies on consolidation. We have further highlighted this limitation in the discussion. Regarding the comparison to other models, this is a tricky point, as the desideratum we emphasize in this study (the ability to recall memories that are intermittently reinforced) is not the focus of other studies. Indeed, our focus is primarily on the ability of systems consolidation to be selective in which memories are consolidated, which is somewhat orthogonal to the focus of many other theoretical studies of consolidation. We have updated some wording in the introduction to emphasize this focus.

      Additional comments made by reviewer #1

      Reviewer #1 pointed out issues in the clarity of Fig. 2A. We have added substantial clarifying text to the figure caption.

      Reviewer #1 pointed out lack of clarity in our introduction to the terms “reliability” and “reinforcement.” We have now made it more clear what we mean by these terms the first time they are used.

      We have updated our definition of “recall” to use the term “recall factor,” which is how we refer to it subsequently in the paper.

      We have made explicit in the main text our simplifying assumption that memories are mean-centered.

      We have made consistent our use of “forgetting curve” and “memory trace”.

      Additional comments made by reviewer #2

      We have added a comment in the discussion acknowledging alternative interpretations of the result of Terada et al. (2021).

      We have significantly expanded the discussion of findings about the mushroom body to make it accessible to readers who do not specialize in this area. We hope this clarifies the nature of the experimental finding, which uncovered a circuit that performs a strikingly clean implementation of our model.

      The reviewer expresses concern that the songbird study (Tachibana et al., 2022) does not provide direct evidence for consolidation being gated by familiarity of patterns of activity. Indeed, the experimental finding is one step removed from the direct predictions of our model. That said, the finding – that the rate of consolidation increases with performance – is highly nontrivial, and is predicted by our model when applied to reinforcement learning tasks. We have added a comment to the discussion acknowledging that this experimental support for our model is behavioral and not mechanistic.

      We do not regard it as completely trivial that the parallel LTM model performs roughly the same as the STM model, since a slower learning rate can achieve a higher SNR (as in Fig. 2C). Nevertheless, we have added wording to the main text around Fig. 4B to note that the result is not too surprising.

      We have added a sentence that clarifies the goal / question of our paper earlier on in the introduction.

      We have updated Figure 3 by labeling the key components of the schematics and adding more detail to the legend, as suggested by the reviewer. We also reordered the figure panels as suggested.

      Additional comments made by reviewer #3:

      We have clarified in the main text that Fig. 2C and all results from Fig. 4 onward are derived from an ideal observer model (which we also more clearly define).

      We have now emphasized in the main text that the recall factors for specific learning rules are derived in the Supplementary Information.

      We have highlighted more clearly in the main text that the recall factors associated with specific learning rules may correspond to other notions that do not intuitively correspond to “recall,” and have added a pointer to Fig. 3A where these interpretations are spelled out.

      We have added references corresponding to the types of learning rules we consider.

      The cutoffs / piecewise-looking behavior of plots in Fig. 4 are primarily the result of finite N, which limits the maximum SNR of the system, rather than coarse sampling of parameter values.

      Thank you for pointing out the error in the legend in Fig. 5D (also affected Supp Fig. S7/S8), which is now fixed.

      The reference to the nonexistent panel Fig. 5G has been removed.

      As the reviewer points out, the use of a binary action output in our reinforcement learning task renders it quite similar to the supervised learning task, making the example less compelling. In the revised manuscript we have updated the RL simulation to use three actions. Note also that in our original submission the network outputs represented action probabilities directly (which is straightforward to do for binary actions, but not for more than two available actions). In order to parameterize a policy when more than two actions are available, we sample actions using a softmax policy, as is more standard in the field and as the reviewer suggested. The associated recall factor is still a product of reward and a “confidence factor,” and the confidence factor is still the value of the network output in the unit corresponding to the chosen action, but in the updated implementation this factor is equal to , similar (though with a sign difference) to the reviewer’s suggestion. We believe these updates make our RL implementation and simulation more compelling, as it allows them to be applied to tasks with arbitrary numbers of actions.
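      The softmax action sampling described above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and because the exact expression for the confidence factor is missing from the text above, the code uses the raw network output of the chosen unit as a placeholder for it.

```python
import math
import random

# Illustrative sketch of softmax action selection with a reward-weighted
# recall factor. ASSUMPTIONS: `outputs` are real-valued network outputs, one
# per action; the confidence factor is taken to be the raw output of the
# chosen unit, which is a placeholder because the exact expression is not
# given in the text.

def softmax(outputs):
    """Numerically stable softmax over a list of real-valued outputs."""
    peak = max(outputs)
    exps = [math.exp(o - peak) for o in outputs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(outputs, rng):
    """Sample an action index from the softmax policy; also return the probabilities."""
    probs = softmax(outputs)
    action = rng.choices(range(len(outputs)), weights=probs, k=1)[0]
    return action, probs


def recall_factor(reward, outputs, action):
    """Product of reward and the (placeholder) confidence factor for the chosen action."""
    confidence = outputs[action]  # placeholder assumption, see header comment
    return reward * confidence


if __name__ == "__main__":
    rng = random.Random(1)
    outputs = [0.5, -1.0, 2.0]  # hypothetical outputs for three actions
    action, probs = sample_action(outputs, rng)
    print(action, probs, recall_factor(1.0, outputs, action))
```

      Sampling from a softmax generalizes directly to any number of actions, which is the property the response highlights; only the form of the confidence factor would need to be replaced with the expression used in the manuscript.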

      Additional minor comments

      The reviewers made a number of other specific line-by-line wording suggestions, typo corrections,

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The mechanisms by which axonal projections find their correct target require the interplay of signalling pathways and cell adhesion acting over short and long distances. The current study aims to use the small ventral lateral clock neurons (s-LNvs) of the Drosophila clock circuit as a model to study axon projections. These neurons are born during embryonic stages and are part of the core of the clock circuit in the larval brain. Moreover, these neurons are maintained through metamorphosis and become part of the adult clock circuit. The authors use anti-Pdf antibody staining or Pdf>GFP as a read-out for axonal length. By ablating the MB, the overall target region of the s-LNvs, the authors find defects in the projections. Next, by using Dscam mutants or knock-down, they observe defects in the projections. Manipulations of the DNs, another group of clock neurons, can induce defects in the s-LNvs axonal form, suggesting an active role of these neurons in the morphology of the s-LNvs.

      Strengths:

      The use of Drosophila genetics and a specific neural type allows targeted manipulations with high precision.

      Proposing a new model for a small group of neurons for axonal projections allows us to explore the mechanism with high precision.

      Weaknesses:

      It is unclear how far the proposed model can be seen as developmental.

      The study of changes in fully differentiated and functioning neurons may affect the interpretation of the findings.

      We appreciate the reviewer's feedback on the strengths and weaknesses of our study.

      We acknowledge the strengths of our research, particularly the precision afforded by using Drosophila genetics and a specific neural type for targeted manipulations, as well as the proposal of a new model for studying axonal projections in a small group of neurons.

      We understand the concerns about the developmental aspects of our proposed model and the use of Pdf-GAL4 >GFP as a read-out for the axonal length (revised manuscript Figure 1—figure supplement 1). However, even when Clk856-GAL4, which begins to be expressed at the embryonic stage (revised manuscript Figure 3—figure supplement 1), was used to suppress Dscam expression, the initial segment of the dorsal projection of s-LNvs (the vertical part) remained unaffected. Instead, the projection distance is severely shortened towards the midline, and this defect persists until the adult stage. It is for this reason that we delineate the dorsal projections of s-LNvs into two distinct phases: the vertical and horizontal parts, rather than a mere expansion in correspondence with the development of the larval brain.

      Thank you for your valuable feedback, and we have incorporated these considerations into our revised manuscript to enhance the clarity and depth of our research.

      Reviewer #2 (Public Review):

      Summary:

      The paper from Li et al shows a mechanism by which axons can change direction during development. They use the sLNv neurons as a model. They find that the appearance of a new group of neurons (DNs) during post-embryonic proliferation secretes netrins and repels horizontally towards the midline, the axonal tip of the LNvs.

      Strengths:

      The experiments are well done and the results are conclusive.

      Weaknesses:

      The novelty of the study is overstated, and the background is understated. Both things need to be revised.

      We appreciate your acknowledgment that the experiments were well-executed and the results conclusive. This validation reinforces the robustness of our findings.

      We take note of your feedback regarding the novelty of the study being overstated and the background being understated. Axonal projections that navigate without distinct landmarks, such as the midline or layers, columns, and segments, pose more challenges and uncertainties. As highlighted, our key contribution lies in elucidating how axonal projections without clear landmarks are guided, with our research demonstrating how a newly formed cluster of cells at a specific time and location provides the necessary guidance cues for axons.

      We value your insights, and we have carefully addressed these points in our manuscript revision to improve the overall quality and presentation of our research.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      The overall idea of using the s-LNvs as a model is indeed intriguing. There are genetic tools available to tackle these cells with great precision.

      However, the stage at which these cells are investigated raises some issues that I feel are critical to address.

      These neurons develop their axonal projections during embryogenesis and are fully functioning when the larvae hatch, thus to investigate axonal pathfinding one would have to address embryonic development.

      The larval brain indeed continues to grow during larval life, however extensive work from the Hartenstein lab, Truman lab, and others have shown that the secondary (larval born) neurons do not yet wire into the brain, but stall their axonal projections.

      It is thus quite unclear, what the authors are actually studying.

      One interpretation could be that the authors observe changes in axon length due to morphological changes in the brain. Indeed, as the MB expands, the anatomy of the surrounding neuropil changes too.

      Moreover, it is unclear when exactly the Pdf-Gal4 (and other drivers) are active, thus how far (embryonic) development of s-LNvs is affected, or if it's all happening in the differentiated, functioning neuron. (Gal4 temporal delay and dynamics during embryonic development may further complicate the issue). As far as I am aware the MB drivers might already be active during embryonic stages.

      Since the raised issue is quite fundamental, I am not sure what might be the best and most productive fashion to address this.

      Eg. either to completely re-focus the topic on "neural morphology maintenance" or to study the actual development of these cells.

      We thank the reviewer for the detailed and insightful feedback on our study. We have tested whether Pdf-Gal4 could effectively label s-LNv, and tracked the s-LNv projection in the early stage after larvae hatching. We did not observe the PDF antibody staining signal or the GFP signal driven by Pdf-GAL4 when the larvae were newly hatched. At 2-4 hours ALH, PDF signals were primarily concentrated at the end of axons, while GFP signals were mainly concentrated at the cell body. Helfrich-Förster initially detected immunoreactivity for PDF in the brains approximately 4-5 hours ALH. The GFP signal driven by the Pdf-GAL4 driver is indeed delayed. However, at 8 hours ALH, the GFP signal strongly co-localized with the PDF signal within the axons (see revised manuscript lines 98-101) (Figure 1—figure supplement 1).

      Based on previous research findings and our staining of Clk856-GAL4 >GFP, it is indeed confirmed that the dorsal projection of s-LNvs in Drosophila is formed during the embryonic stage (Figure 3—figure supplement 1). The s-LNvs in first-instar larval Drosophila are capable of detecting signal output and may play a role in regulating certain behaviors. Our selection of tools for characterizing the projection pattern of s-LNv was not optimal, leading us to overlook the crucial detail that the projection had already formed during its embryonic stage.

      However, even when employing Clk856-GAL4 to suppress Dscam expression from the embryonic stage, the initial segment of the dorsal projection of s-LNvs (the vertical part) remains unaffected. Instead, the projection distance is severely shortened towards the midline, and this defect persists until the adult stage. It is for this reason that we delineate the dorsal projections of s-LNvs into two distinct phases: the vertical and horizontal parts, rather than a mere expansion in correspondence with the development of the larval brain.

      From the results searched in the Virtual Fly Brain (VFB) database (https://www.virtualflybrain.org/), it is clear that the neurons that form synaptic connections with s-LNvs at the adult stage are essentially completely different from the neurons that are associated with them at the L1 larval stage. Thus, most neurons that form synapses with s-LNvs in the early larvae either cease to exist after metamorphosis or assume other roles in the adult stage. Similar to the scenario where Cajal-Retzius cells and GABAergic interneurons establish transient synaptic connections with entorhinal axons and commissural axons, respectively, these cells form a transient circuit with presynaptic targets and subsequently undergo cell death during development. In our model, the neurons that synapse with s-LNvs in early development serve as "placeholders," offering positive or negative cues to guide the axonal targeting of s-LNvs towards their ultimate destination.

      Thank you again for your valuable feedback, and we have incorporated these considerations into our revised manuscript to enhance the clarity and depth of our research.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      In the introduction too many reviews are cited and very few actual research papers. This should be corrected and the most significant papers in the field should be cited. For example, there is no reference to the pioneering work from the Christine Holt lab or the first paper looking at axon guidance and guideposts by Klose and Bentley, Isbister et al 1999.

      The introduction should encapsulate the actual knowledge based on actual research papers.

      We acknowledge your concern regarding the citation of review papers rather than primary research papers in the introduction. Following your suggestion, we have revised the introduction section to incorporate references to relevant research papers.

      In the introduction and discussion: The authors cite reviews where the signals that guide axons across different regions, including turning, are shown and they end up saying: "However, how the axons change their projection direction without well-defined landmarks is still unclear." I think the sentence should be changed. Many things are still not clear but this is not a good phrasing. Maybe they could focus on their temporal finding?

      We appreciate the reviewer's feedback and insightful suggestions. We agree that emphasizing the temporal aspect is crucial in our study. However, we also recognize the significance of understanding the origin of signals that guide axonal reorientation at specific locations. While axonal projections navigating without distinct landmarks pose more challenges and uncertainties compared to those guided by prominent landmarks like the midline, our research demonstrates the crucial role of a specific cell population near turning points in providing accurate guidance cues to ensure precise axonal reorientation. We have revised our phrasing in the introduction and discussion to better reflect these key points (see revised manuscript lines 69-71 and 350-354). Thank you for highlighting the significance of focusing on our temporal findings and the complexities involved in studying axonal projection.

      Many rather old papers have looked into the effect of repulsive guideposts to guide axon projections. In particular, I can think of the paper from Isbister et al. 1999 (DOI: 10.1242/dev.126.9.2007) that not only shows how semaphorin guides Ti axon projection but also shows how the pattern of expression of sema 2a changes during development to guide the correct projection. I really think that the novelty of the paper should be revised in light of the actual knowledge in the field.

      We appreciate the reviewer's reference to the seminal work by Isbister et al. (1999) and the importance of guidepost cells in axon projection guidance, which we have already cited in our revised manuscript. It is crucial to recognize that segmented patterns such as the limb segment traversed by Ti1 neuron projections or neural circuits formed in a layer- or column-specific manner also serve as intrinsic "guideposts," offering valuable insights into axonal pathfinding processes. In our model, explicit guidance cues are lacking. As highlighted, our key contribution lies in elucidating how axonal projections without clear landmarks are guided, with our research demonstrating how a newly formed cluster of cells at a specific time and location provides the necessary guidance cues for axons (see revised manuscript lines 350-354). We have ensured that our revised manuscript reflects these insights and emphasizes the significance of studying axonal guidance in the absence of distinct guideposts. Thank you for underscoring these essential points, which enhance our understanding of axonal projection dynamics.

      Minors:

      Line 54, the authors start talking about floorplate at the end of a section on Drosophila. Please use “In vertebrates”, or “in invertebrates” or “in Drosophila” etc.. when needed to put things in context.

      We thank the reviewer for this suggestion and have modified this sentence. Please refer to lines 62-63 of the revised manuscript.

      Line 69: many factors change the axonal outgrowth. The authors are missing the paper from Fernandez et al. 2020, who have shown that unc5 the receptor of netrin induces the stalling for sLNvs projections before the turn. https://doi.org/10.1016/j.cub.2020.04.025

      We thank the reviewer for this suggestion and have added this research article. Please refer to line 79 of the revised manuscript.

      Line 99: "precisely at the pivotal juncture". It is hard to see how it was done in the figures shown. Can the authors add a small panel with neuronal staining showing this (please no HRP)?

      For all figures, the magenta is too strong and it is really hard to see the sLNvs projections. Can this be sorted, please?

      We have depicted the pivotal juncture in the schematic diagram on the left side of Figure 1C. Additionally, we have included a separate column of images without HRP in Figure 1A. Moreover, we have modified the pseudo-color of HRP from magenta to blue to enhance the visualization of the s-LNv projection. The figure legends have also been correspondingly modified.

      Line 407: Spatial position relationship between calyx and s-LNvs. OK107-GAL4 labels ... calyx and s-LNvs labeled by, which which.

      We have modified it according to your suggestion. Please refer to lines 430-432 of the revised manuscript.

      Line 137 typo RPRC

      We thank the reviewer for noticing this mistake, which has now been corrected. Please refer to lines 148-149 of the revised manuscript.

      Section 158-164. the paper from Zhang et al 2019 needs to be cited since they have found the same effect of decreasing Dscam even if they didn't think about horizontal projection.

      Thank you for the suggestion. We have included in the manuscript the phenotype observed by Zhang et al. (2019) upon knocking down Dscam1-L in adults. Please refer to lines 170-172 of the revised manuscript.

      Line 176: typo senses (instead of sensor).

      Thank you for pointing out our mistake. We have modified it according to your suggestion. Please refer to line 189 of the revised manuscript.

      Line 193: more than "Interesting", it is "Notable". Add "ubiquitous" knockdown.

      Thank you for the suggestion. We have included the word "ubiquitous" to enhance the precision of the narrative. Please refer to line 206 of the revised manuscript.

      Line 224: the pattern of expression of the crz cells is not visible where the projections of sLNvs are located. Are they in that region? Or further away?

      We've changed the pseudo-color of HRP, and in the updated Figure 5—figure supplement 1, you can see the projection pattern of crz+ cells, positioned close to the end of the s-LNv axon terminal.

      Line 243: applied? Do you mean "used"

      Thank you for the suggestion. We have revised it at line 256.

      Figure 5 Sup1: the schematic shows DNs proliferation that is not visible on the GFP image. Please comment.

      We have modified Figure 5—figure supplement 1 to show the 120 h per-GAL4, Pdf-GAL80 >GFP expression pattern. Due to the strong GFP intensity in some DN neurons, there was a loss of GFP signal. Additionally, in Figure 6—figure supplement 1, we have added co-localization images of DN and s-LNv at 72 h and 96 h. To better illustrate the co-localization information, we have shown only a portion of the layers in the right panel. We hope these additions address your concerns.

      Line 251: cite Fernandez et al. 2020 with Purohit et al 2012.

      We have modified it according to your suggestion. Please refer to line 264 of the revised manuscript.

      Line 272: you have not shown synergistic effects because you have not modulated both pathways at the same time. You should talk about complementary.

      We have modified it according to your suggestion at lines 25, 285, 439.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) Point for more elaborate discussion: Apparently the timescale of negative feedback signals is conserved between endothelial cell migration in vitro (with human cells) and endothelial migration during the formation of ISVs in zebrafish. What do you think might be an explanation for such conserved timescales? Are there certain processes within cytoskeletal tension build up that require this quantity of time to establish? Or does it relate to the time that is needed to begin to express the YAP/TAZ target genes that mediate feedback?

      Understanding the mechanisms underlying the conserved timescale is a major direction that we continue to explore. Localization of YAP/TAZ to the nucleus is likely not rate-limiting. We showed previously that acute RhoA activation produced significant YAP/TAZ nuclear localization within minutes, while subsequent co-transcriptional activity aligned with the gene expression dynamics observed here (Berlew et al., 2021). We hypothesize that the dynamics of YAP/TAZ-dependent transcription and the translation of those target genes are rate-limiting for initial feedback loop completion (t_ic = 4 hours). This is supported by work from us and others in a variety of cell lines showing YAP/TAZ transcriptional responses take place during the first few hours after activation. (Franklin et al., 2020; Mason et al., 2019; Plouffe et al., 2018) While our data identify mediators of initial feedback loop completion, the molecular effectors that determine the timescale of new cytoskeletal equilibrium establishment (t_eq = 8 hours) remain unclear.

      (2) Do you expect different timescales for slower endothelial migratory processes (e.g. for instance during fin vascular regeneration which takes days)?

      We selected the ISV development model because it exhibits similar migratory kinetics to our previously explored human ECFC migration in vitro. The comparable kinetics allowed us to study dynamics of the feedback loop in vivo on similar time scales, but we have not explored models featuring either slower or faster dynamics.

      It would be interesting to test how feedback dynamics are impacted in distinct endothelial migratory processes. Our data suggest that the feedback loop is necessary for persistent migration; however, YAP and TAZ respond to a diversity of upstream regulators in addition to mechanical signals, which might depend on the process of vascular morphogenesis. For example, after fin amputation, inflammation and tissue regeneration may impact the biochemical and mechanical environment experienced by the endothelium. Additionally, cells display different migratory behaviors in ISV morphogenesis compared to fin regeneration. During ISV formation, sprouting tip cells migrate dorsally through avascular tissue, followed by stalk cells. (Ellertsdóttir et al., 2010) In contrast, the fin vasculature regenerates by forming an intermediate vascular plexus, where some venous-derived endothelial cells migrate towards the sprouting front, while others migrate against it. (Xu et al., 2014) We are excited to study the role of this feedback loop in these different modes of neovessel formation in future studies.

      (3) Is the ~4hrs and 8hrs feedback time window a general property or does it differ between specific endothelial cell types? In the veins the endothelial cells generate fewer stress fibers and adhesions compared to in the arteries. Does this mean that there might be a difference in the feedback time window, or does that mean that certain endothelial cell types may not have such a YAP/TAZ-controlled feedback system?

      Recent studies suggest that venous endothelial cells are the primary endothelial subtype responsible for blood vessel morphogenesis. (Lee et al., 2022, 2021; Xu et al., 2014) They are highly motile and mechanosensitive, migrating against blood flow. (Lee et al., 2022) The Huveneers group has shown that the actin cytoskeleton is differently organized in adult arteries and veins in response to biomechanical properties of its extracellular matrix, rather than intrinsic differences between arterial and venous cells. (van Geemen et al., 2014) This suggests that arterial and venous cells have distinct cytoskeletal setpoints due to mechanical cues in their environment (Price et al., 2021). We expect this to impact the degree of cytoskeletal remodeling and cell migration at equilibrium, rather than the kinetics of the feedback loop per se, though we have not yet tested this hypothesis. Testing these predictions on cytoskeletal setpoint stability and adaptation is a major direction that we continue to explore. 

      (4) The experiments are based on perturbations to prove that transcriptional feedback is needed for endothelial migration. What would happen if the feedback system is always switched on? An experiment to add might be to analyse the responsiveness of endothelial cells expressing constitutively active YAP/TAZ.

      This is a problem that we are actively pursuing. Though the feedback system forms a coherent loop, we anticipate that the identity of the node of the loop selected for constitutive activation will influence the outcome, depending on whether that node is rate-limiting for feedback kinetics and the extent of intersection of that node with other signaling events in the cell. For example, we have observed that constitutive YAP activation drives profound changes to the transcriptional landscape including, but not limited to, RhoA signaling (Jones et al., 2023). We further anticipate that constitutive activation of feedback loop nodes may alter feedback dynamics, while dynamic or acute perturbation will be required to dissect these contributions in real time. For these reasons, ongoing work in the lab is pursuing these questions using optogenetic tools that enable precise spatial and temporal control (Berlew et al., 2021).   

      (5) To investigate the role of YAP-mediated transcription in an accurate time-dependent manner the authors may consider using the recently developed optogenetic YAP translocation tool: https://doi.org/10.15252/embr.202154401

      We are enthusiastic about the power of optogenetics to interrogate the nodes and timescales of this feedback system, and we are now funded to pursue this line of research. 

      Reviewer #2:

      The idea is intriguing, but it is not clear how the feedback actually works, so it is difficult to determine if the events needed could occur within 4 hrs. Specifically, it is not clear what gene changes initiated by YAP/TAZ translocation eventually lead to changes in Rho signaling and contractility. Much of the evidence to support the model is preliminary. Some of the data is consistent with the model, but alternative explanations of the data are not excluded. The fish washout data is quite interesting and does support the model. It is unclear how some of the in vitro data supports the model and excludes alternatives.

      Major strengths:

      The combination of in vitro and in vivo assessment provides evidence for timing in physiologically relevant contexts, and a rigorous quantification of outputs is provided. The idea of defining temporal aspects of the system is quite interesting.

      Major weaknesses:

      The evidence for a "loop" is not strong; rather, most of the data can also be interpreted as a linear increase in effect with time once a threshold is reached. Washout experiments are key to setting up a time window, yet these experiments are presented only for the fish model. A major technical challenge is that siRNA experiments take time to achieve depletion status, making precise timing of events on short time scales problematic. Also, Actinomycin D blocks most transcription so exposure for hours likely leads to secondary and tertiary effects and perhaps effects on viability. No RNA profiling is presented to validate proposed transcriptional changes.

      We thank the reviewer for these helpful suggestions. We have expanded our explanation of the history and known mediators of the feedback loop in the introduction. We and, independently, the Huveneers group recently reported that human endothelial cells maintain cytoskeletal equilibrium for persistent motility through a YAP/TAZ-mediated feedback loop that modulates cytoskeletal tension. (Mason et al., 2019; van der Stoel et al., 2020) Because YAP and TAZ are activated by tension of the cytoskeleton (Dupont et al., 2011), suppression of cytoskeletal tension by YAP/TAZ transcriptional target genes constitutes a negative feedback loop (Fig. 1A). We described key components of this cell-intrinsic feedback loop, which acts as a control system to maintain cytoskeletal homeostasis for persistent motility via modulation of Rho-ROCK-myosin II activity. (Mason et al., 2019) Both we and the Huveneers group found that perturbation of genes and pathways regulated by YAP/TAZ mechanoactivation can functionally rescue motility in YAP/TAZ-depleted cells (e.g., RhoA/ROCK/myosin II, NUAK2, DLC1). (Mason et al., 2019; van der Stoel et al., 2020) We further showed previously that both YAP/TAZ depletion and acute YAP/TAZ-TEAD inhibition consistently increased stress fiber and FA maturation and arrested cell motility, accounting for these limitations of siRNA. (Mason et al., 2019)

      Enduring limitations to the temporal, spatial, and cell-specific control of the genetic and pharmacologic methods have inspired us to initiate alternative approaches, which are the subject of ongoing efforts. Further research will be necessary in the zebrafish to determine the extent to which the observed migratory dynamics are driven by cytoskeletal arrest. 

      To identify early YAP/TAZ-regulated transcriptional changes, we have added RNA profiling of control and YAP/TAZ depleted cells cultured on stiff matrices for four hours. Genes upregulated by YAP/TAZ depletion were enriched for Gene Ontology (GO) terms associated with Rho protein signal transduction, vascular development, cellular response to vascular endothelial growth factor (VEGF) stimulus, and endothelial cell migration (Fig. 9B). These data support a role for YAP and TAZ as negative feedback mediators that maintain cytoskeletal homeostasis for endothelial cell migration and vascular morphogenesis.  

      Reviewer #3:

      The authors used ECFC - endothelial colony forming cells (circulating endothelial cells that activate in response to vascular injury).

      Q: Did the authors characterize these cells and make sure that they are truly endothelial cells - for example examine specific endothelial markers, arterial-venous identity markers & Notch signalling status, overall morphology etc prior to the start of the experiment. How were ECFC isolated from human individuals, are these "healthy" volunteers - any underlying CVD risk factors, cells from one patient or from pooled samples, what injury were these humans exposed to that triggered the release of the ECFCs into the circulation, etc. The materials & methods on ECFC should be expanded.

      Human umbilical cord blood-derived ECFCs were isolated at Indiana University School of Medicine and kindly provided by Dr Mervin Yoder. Cells were cultured as described by the Yoder group (Rapp et al., 2011) and our prior paper (Mason et al., 2019). We have expanded the materials and methods section to describe the source and characterization of these cells.

      The authors suggest that loss of YAP/TAZ phenocopies actinomycin-D inhibition - "both transcription inhibition and YAP/TAZ depletion impaired polarization, and induced robust ventral stress fiber formation and peripheral focal adhesion maturation". However, the cell size of actinomycin-D treated cells (Fig. 1B, top right panel), differs from the endothelial cell size upon siYAP/TAZ (Fig. 1E, top right panel) - and vinculin staining seems more pronounced in actinomycin-D treated cells (B, bottom right) when compared to siYAP/TAZ group. Cell shape is defined by acto-myosin tension.

      Q: Besides Fraction of focal adhesion >1um; focal adhesion number did the authors measure additional parameters related to cytoskeleton remodelling / focal adhesions that can substantiate their statement on similarity between loss of YAP/TAZ and actinomycin-D treatment. Would it be possible to make a more specific genetic intervention (besides YAP/TAZ) interfering with the focal adhesion pathway as opposed to the broad spectrum inhibitor actinomycin-D.

      Our previous paper (Mason et al., 2019) delineated the mechanistic relationships between YAP/TAZ signaling, focal adhesion turnover, actomyosin polymerization, and the intervening mechanisms of myosin regulation. Specifically, we demonstrated that YAP/TAZ regulate the myosin phosphatase kinase, NUAK2, and ARHGAP genes to mediate this feedback. Expanding on this work, the current study aimed to define the temporal kinetics of the cytoskeletal mechanotransductive feedback in vitro and in vivo. We used actinomycin-D and YAP/TAZ depletion to interrogate the role of transcriptional regulation and YAP/TAZ signaling, respectively. In this revision, we have added RNA profiling that identifies early YAP/TAZ-regulated transcriptional changes and further points to other molecular mediators of focal adhesions (e.g. TRIO, RHOB, THBS1) that will be the subjects of future studies.    

      Q: Does the actinomycin-D treatment affect responsiveness to Vegf? induce apoptosis or reduce survival of the ECFC?

      We have not looked specifically at the effect of actinomycin-D treatment on responsiveness to VEGF. However, actinomycin-D has been reported to reduce transcription of VEGF receptors (E et al., 2012). In contrast, we found that YAP/TAZ depletion upregulated GO terms associated with endothelial cell migration and response to VEGF stimulus (Fig. 9B), as well as receptors to angiogenic growth factors, including KDR and FLT4 (Fig. 9E). These results suggest YAP/TAZ depleted cells may be more sensitive to VEGF stimulation but remain nonmotile due to cytoskeletal arrest.

      We showed previously that long-term treatment with actinomycin-D reduces ECFC survival (Mason et al., 2019).

      Q: Which mechanism links ECM stiffness with endothelial surface area in the authors' scenario? In zebrafish, activity of endothelial guanine exchange factor Trio specifically at endothelial cell junctions (Klems, Nat Comms, 2020) and endoglin in response to hemodynamic factors (Siekmann, Nat Cell Biol 2017) have been shown to control EC shape/surface area - do these factors play a role in the scenario proposed by the authors?

      Our new transcriptional profiling indicates both Trio and endoglin are regulated through YAP and TAZ in human ECFCs. We plan to follow up on these findings.

      Q: The authors report that ECs migrate faster on stiff substrate, and concomitantly these cells have a larger surface area. What is the physiological rationale behind these observations? Did the authors observe such behaviors in their zebrafish ISV model? How do these observations integrate with the tip-stalk cell shuffling model (Jakobsson & Gerhardt, Nat Cell Biol, 2011) and Notch activity in developing ISVs?

      This question raises important distinctions between the mode of migration in ISV morphogenesis and that of endothelial cells adherent to substrates. Cells behave and respond to mechanical cues differently in 2D vs. 3D matrices (LaValley and Reinhart-King, 2014). Additionally, the microenvironment in vivo is much more complex, combining numerous biochemical signals and changing mechanical properties (Whisler et al., 2023). We are actively investigating the downstream targets of YAP/TAZ mechanotransduction and how they integrate with other pathways known to regulate vascular morphogenesis, such as Notch signaling.

      The authors examined the formation of arterial intersegmental vessels in the trunk of developing zebrafish embryos in vivo. They used a variety of pharmacological inhibitors of transcription and acto-myosin remodelling and linked the observed morphological changes in ISV morphogenesis with changes in endothelial cell motility.

      Q: Reduced formation and dorsal extension of ISVs may have several reasons, including reduced EC migration and proliferation. The Tg(fli1a:EGFP) reporter however is not the most suitable line to monitor migration of individual endothelial cells. Can the authors repeat the experiments in Tg(fli1a:nEGFP); Tg(kdrl:HRAS-mCherry) double transgenics to visualize movement-migration of the individual endothelial cells and EC proliferation events in the different treatment regimes?

      So far, we have not tracked individual endothelial cells during ISV morphogenesis. We agree this is the best approach and are pursuing a similar technique for these experiments.

      ISV formation is furthermore affected by Notch signalling status and a series of (repulsive) guidance cues.

      Q: Does de novo blockade of gene expression with Actinomycin D affect Notch signalling status, expression of PlexinD - sFlt1, netrin1 or arterial-venous identity genes?

      While we have not performed gene expression analysis under the Actinomycin D condition, Actinomycin D functions as a broad transcription inhibitor. We are currently pursuing the downstream targets of YAP/TAZ mechanotransduction in both ECFCs and zebrafish.

      Remark: The authors may want to consider using the Tg(fli1:LIFEACT-GFP) reporter for in vivo imaging of actin remodelling events.

      We thank the reviewer for their helpful suggestion.

      Remark: the authors report "As with broad transcription inhibition, in situ depletion of YAP and TAZ by RNAi arrested cell motility, illustrated here by live-migration sparklines over 10 hours: siControl: , siYAP/TAZ: (25 μm scale-bar: -)". Can the authors make a separate figure panel for this, how many cells were measured?

      Please refer to our previous publication for the complete details on this data (Mason et al., 2019). We have added the citation in the text.

      Remark: in the wash-out experiments, exposure to the inhibitors is not the same in the different scenarios - could it be that the longer exposure time induces "toxic" side effect that cannot be "washed out" when compared to the short treatment regimes?

      This is a possible limitation of the pharmacological approach, and we have included it in the discussion section. We are currently exploring alternative approaches to interrogate the timescale of the feedback loop more precisely.

      References

      Berlew EE, Kuznetsov IA, Yamada K, Bugaj LJ, Boerckel JD, Chow BY. 2021. Single-Component Optogenetic Tools for Inducible RhoA GTPase Signaling. Advanced Biology 5:2100810. doi:10.1002/adbi.202100810

      Dupont S, Morsut L, Aragona M, Enzo E, Giulitti S, Cordenonsi M, Zanconato F, Le Digabel J, Forcato M, Bicciato S, Elvassore N, Piccolo S. 2011. Role of YAP/TAZ in mechanotransduction. Nature 474:179–183. doi:10.1038/nature10137

      E G, Cao Y, Bhattacharya S, Dutta S, Wang E, Mukhopadhyay D. 2012. Endogenous Vascular Endothelial Growth Factor-A (VEGF-A) Maintains Endothelial Cell Homeostasis by Regulating VEGF Receptor-2 Transcription. J Biol Chem 287:3029–3041. doi:10.1074/jbc.M111.293985

      Ellertsdóttir E, Lenard A, Blum Y, Krudewig A, Herwig L, Affolter M, Belting H-G. 2010. Vascular morphogenesis in the zebrafish embryo. Developmental Biology, Special Section: Morphogenesis 341:56–65. doi:10.1016/j.ydbio.2009.10.035

      Franklin JM, Ghosh RP, Shi Q, Reddick MP, Liphardt JT. 2020. Concerted localization-resets precede YAP-dependent transcription. Nat Commun 11:4581. doi:10.1038/s41467-020-18368-x

      Jones DL, Hallström GF, Jiang X, Locke RC, Evans MK, Bonnevie ED, Srikumar A, Leahy TP, Nijsure MP, Boerckel JD, Mauck RL, Dyment NA. 2023. Mechanoepigenetic regulation of extracellular matrix homeostasis via Yap and Taz. Proceedings of the National Academy of Sciences 120:e2211947120. doi:10.1073/pnas.2211947120

      LaValley DJ, Reinhart-King CA. 2014. Matrix stiffening in the formation of blood vessels. Advances in Regenerative Biology 1:25247. doi:10.3402/arb.v1.25247

      Lee H-W, Shin JH, Simons M. 2022. Flow goes forward and cells step backward: endothelial migration. Exp Mol Med 54:711–719. doi:10.1038/s12276-022-00785-1

      Lee H-W, Xu Y, He L, Choi W, Gonzalez D, Jin S-W, Simons M. 2021. Role of Venous Endothelial Cells in Developmental and Pathologic Angiogenesis. Circulation 144:1308–1322. doi:10.1161/CIRCULATIONAHA.121.054071

      Mason DE, Collins JM, Dawahare JH, Nguyen TD, Lin Y, Voytik-Harbin SL, Zorlutuna P, Yoder MC, Boerckel JD. 2019. YAP and TAZ limit cytoskeletal and focal adhesion maturation to enable persistent cell motility. Journal of Cell Biology 218:1369–1389. doi:10.1083/jcb.201806065

      Plouffe SW, Lin KC, Moore JL, Tan FE, Ma S, Ye Z, Qiu Y, Ren B, Guan K-L. 2018. The Hippo pathway effector proteins YAP and TAZ have both distinct and overlapping functions in the cell. J Biol Chem 293:11230–11240. doi:10.1074/jbc.RA118.002715

      Price CC, Mathur J, Boerckel JD, Pathak A, Shenoy VB. 2021. Dynamic self-reinforcement of gene expression determines acquisition of cellular mechanical memory. Biophysical Journal 120:5074–5089. doi:10.1016/j.bpj.2021.10.006

      Rapp BM, Saadatzedeh MR, Ofstein RH, Bhavsar JR, Tempel ZS, Moreno O, Morone P, Booth DA, Traktuev DO, Dalsing MC, Ingram DA, Yoder MC, March KL, Murphy MP. 2011. Resident Endothelial Progenitor Cells From Human Placenta Have Greater Vasculogenic Potential Than Circulating Endothelial Progenitor Cells From Umbilical Cord Blood. Cell Med 2:85–96. doi:10.3727/215517911X617888

      Tammela T, Zarkada G, Nurmi H, Jakobsson L, Heinolainen K, Tvorogov D, Zheng W, Franco CA, Murtomäki A, Aranda E, Miura N, Ylä-Herttuala S, Fruttiger M, Mäkinen T, Eichmann A, Pollard JW, Gerhardt H, Alitalo K. 2011. VEGFR-3 controls tip to stalk conversion at vessel fusion sites by reinforcing Notch signalling. Nat Cell Biol 13:1202–1213. doi:10.1038/ncb2331

      van der Stoel M, Schimmel L, Nawaz K, van Stalborch A-M, de Haan A, Klaus-Bergmann A, Valent ET, Koenis DS, van Nieuw Amerongen GP, de Vries CJ, de Waard V, Gloerich M, van Buul JD, Huveneers S. 2020. DLC1 is a direct target of activated YAP/TAZ that drives collective migration and sprouting angiogenesis. Journal of Cell Science 133:jcs239947. doi:10.1242/jcs.239947

      van Geemen D, Smeets MWJ, van Stalborch A-MD, Woerdeman LAE, Daemen MJAP, Hordijk PL, Huveneers S. 2014. F-Actin–Anchored Focal Adhesions Distinguish Endothelial Phenotypes of Human Arteries and Veins. Arteriosclerosis, Thrombosis, and Vascular Biology 34:2059–2067. doi:10.1161/ATVBAHA.114.304180

      Whisler J, Shahreza S, Schlegelmilch K, Ege N, Javanmardi Y, Malandrino A, Agrawal A, Fantin A, Serwinski B, Azizgolshani H, Park C, Shone V, Demuren OO, Del Rosario A, Butty VL, Holroyd N, Domart M-C, Hooper S, Szita N, Boyer LA, Walker-Samuel S, Djordjevic B, Sheridan GK, Collinson L, Calvo F, Ruhrberg C, Sahai E, Kamm R, Moeendarbary E. 2023. Emergent mechanical control of vascular morphogenesis. Science Advances 9:eadg9781. doi:10.1126/sciadv.adg9781

      Xu C, Hasan SS, Schmidt I, Rocha SF, Pitulescu ME, Bussmann J, Meyen D, Raz E, Adams RH, Siekmann AF. 2014. Arteries are formed by vein-derived endothelial tip cells. Nat Commun 5:5758. doi:10.1038/ncomms6758

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors): 

      Major points about revised manuscript 

      (1) While I acknowledge that the Laccase2 vector is probably the best available in terms of its clean circRNA-expression potential, the authors still lack an estimation of the circRNA overexpression efficiency, specifically the circular-to-linear expression ratio. In their second rebuttal letter, the authors argue that they do not have the option to use another probe and that they are limited by the Backsplicing junction (BSJ)-specific one. I assume they mean that such a probe might only partially hybridize with the linear form and therefore give a poor or no signal in the Northern blot. However, in this referee's opinion, it is precisely because of this limitation that the authors should have used another probe against both the linear and circular RNAs to simultaneously and quantitatively detect both isoforms. This would have allowed them to reliably estimate a circular-to-linear ratio. Perhaps the linear isoform is indeed not expressed or is very low for this circRNA overexpression vector, but the probe used by the authors does not prove it. I think that this addition to the manuscript is not strictly necessary at this stage, but it would certainly improve the results.  

      We fully agree with this recommendation. Our efforts to show this using northern blotting were unfortunately unsuccessful due to background signal. To address the question about the circular-to-linear ratio, we instead used an RT-qPCR strategy to measure the linear vs. circRNA expression derived from the Laccase-circHIPK3 expression constructs/cell lines. To be able to compare results obtained from different amplicons, we measured primer efficiencies (using amplification standard curves; not shown) of two amplicons targeting the linear Laccase transcript and of our divergent primers targeting circHIPK3, which were found to be directly comparable. Using these primer sets in RT-qPCR on the same RNA preparation (total cellular RNA) as used for the northern blot (Supplementary Figure S5H) revealed ~4-fold higher expression of circHIPK3 compared to the linear precursor RNA (Supplementary Figure S5I).

      This demonstrates that the Laccase vector system efficiently produces circHIPK3 RNA as expected. 
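      The efficiency-matched comparison described above can be sketched as follows. This is a minimal illustration, not the analysis actually run for Supplementary Figure S5I: the dilution series, Ct values, and function names are hypothetical, and the sketch assumes the common standard-curve relation between slope and per-cycle amplification factor, E = 10^(-1/slope), together with equal detection thresholds for both amplicons.

```python
def fit_slope(log10_inputs, cts):
    """Least-squares slope of Ct versus log10(template amount)."""
    n = len(cts)
    mean_x = sum(log10_inputs) / n
    mean_y = sum(cts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_inputs, cts))
    sxx = sum((x - mean_x) ** 2 for x in log10_inputs)
    return sxy / sxx


def amplification_factor(slope):
    """Per-cycle amplification factor; 2.0 corresponds to perfect doubling."""
    return 10 ** (-1.0 / slope)


def relative_abundance(e_target, ct_target, e_ref, ct_ref):
    """Target/reference abundance in the same cDNA (assumes equal thresholds)."""
    return (e_ref ** ct_ref) / (e_target ** ct_target)


# Hypothetical 10-fold dilution series; Ct values are invented for illustration.
log10_dilutions = [0, -1, -2, -3]
circ_standard_cts = [20.0, 23.32, 26.64, 29.96]
linear_standard_cts = [22.0, 25.32, 28.64, 31.96]

e_circ = amplification_factor(fit_slope(log10_dilutions, circ_standard_cts))
e_linear = amplification_factor(fit_slope(log10_dilutions, linear_standard_cts))

# Hypothetical Ct values measured on one total-RNA preparation.
circ_to_linear = relative_abundance(e_circ, 20.0, e_linear, 22.0)
print(f"E(circ)={e_circ:.2f}  E(linear)={e_linear:.2f}  circ/linear={circ_to_linear:.1f}")
```

      When the two amplification factors are close, as in this invented data set, the ratio reduces to roughly E raised to the Ct difference, which is why comparable primer efficiencies matter for a direct comparison.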

      The few changes to the manuscript (results section text and reference to Supplementary Figure S5I) have been highlighted in yellow. The materials and methods section and Table S1 have been modified to include a description of the RT-qPCR and the specific primers.

  3. Jun 2024
    1. Author response:

      The following is the authors’ response to the current reviews.

      Joint Public Review:

      Xie et al. propose that the asymmetric segregation of the NuRD complex is regulated in a V-ATPase-dependent manner, and plays a crucial role in determining the differential expression of the apoptosis activator egl-1 and is thus critical for the life/death fate decision.

      Remaining concerns are the following:

      The authors should provide the point-by-point response to the following issues. In particular, authors should provide clear reasoning as to why they did not address some of the following comments in the previous revisions. The next response should be directly answering to the following concerns.

      (1) Discussion should be added regarding the criticism that NuRD asymmetric segregation is simply a result of daughter cell size asymmetry. It is perfectly fine that the NuRD asymmetry is due to the daughter cell size difference (still the nucleus within the bigger daughter would have more NuRD, which can determine the fate of daughter cells). Once the authors add this clarification, some criticisms about 'control' may become irrelevant.

      We thank the reviewer for this suggestion. We will add the following text in the revised discussion on page 14, line 26:

      “…We cannot rule out the possibility that NuRD asymmetric segregation results from daughter cell size asymmetry. According to this perspective, the nucleus in the larger daughter cell could possess more NuRD, potentially influencing the fate of the daughter cells. However, it is important to note that the nuclear protein histone or the MYST family histone acetyltransferase is equally segregated in daughter cells of different sizes.….”

      (2) ZEN-4 is a kinesin that predominantly associates with the midzone microtubules and a midbody during mitosis. Given that midbodies can be asymmetrically inherited during cell division, ZEN-4 is not a good control for monitoring the inheritance of cytoplasmic proteins during asymmetric cell division. Other control proteins, such as a transcriptional factor that predominantly localizes in the cytoplasm during mitosis and enters into nucleus during interphase, are needed to clarify the concern.

      We clarified the issue of ZEN-4 below:

      The critique assumes that "midbodies can be asymmetrically inherited during cell division." However, this assumption does not apply to our study of Q cell asymmetric divisions. In our earlier research, we demonstrated that midbodies in Q cells are released post-division and subsequently engulfed by surrounding epithelial cells (Chai et al., Journal of Cell Biology, 2012). Moreover, we have shown that midbodies from the first cell division in C. elegans embryos are also released and engulfed by the P1 cell (Ou et al., Cell Research, 2013). Therefore, the notion of midbody asymmetric inheritance is irrelevant to this manuscript. Additionally, our manuscript already presents the example of the MYST family histone acetyltransferase, illustrating a nuclear protein that predominantly localizes in the cytoplasm during mitosis and symmetrically enters the nucleus during interphase.

      As for pHluorin experiments, symmetric inheritance of GFP and mCherry is not appropriate evidence to estimate the level of pHluorin during asymmetric Q cell division. This issue remains unsolved.

      We acknowledge the limitation of pHluorin in measuring the pH level in a living cell. Future studies could be performed to measure the dynamics of pH levels when advanced tools are available.

      (3) Q-Q plot (quantile-quantile plot) in Figure S10 can be used for visually checking normality of the data, but it does not guarantee that the distribution of each sample is normal and has the standard deviation compared with the other samples. I recommend the authors to show the actual statistical comparison P-values for each case. The authors also need to show the number of replicate experiments for each figure panel.

      We thank the reviewer for pointing this out. We will provide P-values for each case and the number of replicate experiments in the revised Figure 5-figure supplement 1 (corresponding to Figure S10) and the figure legend.

      The authors left inappropriate graphs in the revised manuscript. In Figure 3E, some error bars are disconnected and others are stuck in the bars. In Figure S4C, the LIN-53 in QR.a/p graph shows lines disconnected from error bars.

      We thank the reviewer for pointing this out. We will correct these error bars.

      I am a bit confused by the error bars in Figure 2B. Each dot represents a fluorescent intensity ratio of either HDA-1 or LIN-53 between the two daughter cells in a single animal. Plots are shown with mean and SEM, but several samples (for example, the left end) exhibit SEM error bars very close to the range between min and max. I might misunderstand this graph but am concerned that Figure 2B may contain some errors in representing these data sets. I would like to ask the authors to provide all values in a table format so that the reviewers could verify the statistical tests and graph representation.

      We thank the reviewer for pointing this out. We apologize for the typo in Figure 2B figure legend. We will correct SEM to SD.
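      The distinction at issue can be illustrated with a short sketch. The ratios below are invented and do not come from Figure 2B; the point is only that SD describes the spread of the individual data points, whereas SEM shrinks with the square root of the sample size, so SD bars sit much closer to the min-max range.

```python
import math
import statistics

# Hypothetical daughter-cell intensity ratios (one value per animal).
ratios = [1.8, 2.1, 2.4, 1.5, 2.7]

mean = statistics.mean(ratios)
sd = statistics.stdev(ratios)          # sample standard deviation (n - 1 denominator)
sem = sd / math.sqrt(len(ratios))      # standard error of the mean

print(f"mean={mean:.2f}  SD={sd:.3f}  SEM={sem:.3f}")
print(f"data range: {min(ratios)} to {max(ratios)}")
print(f"mean +/- SD:  {mean - sd:.2f} to {mean + sd:.2f}")
print(f"mean +/- SEM: {mean - sem:.2f} to {mean + sem:.2f}")
```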

      (4) The authors still do not provide evidence that the increase in sAnxV::GFP and Pegl-1gfp or the increase in H3K27ac at the egl-1 gene in hda-1(RNAi) and lin-53(RNAi) animals is not a consequence of global effects on development. Indeed, the images provided in Figure S7B demonstrate that there are global effects in these animals. No causal interactions have been demonstrated.

      We cannot exclude the global effects and have discussed this issue in our previous manuscript on page 9, line 26:

      “...Considering the pleiotropic phenotypes caused by loss of HDA-1, we cannot exclude the possibility that ectopic cell death might result from global changes in development, even though HDA-1 may directly contribute to the life-versus-death fate determination.”

      (5) Figure 4: Due to the lack of appropriate controls for the co-IP experiment (Fig. 4), I remain unconvinced of the claim that the NuRD complex and V-ATPase specifically interact. Concerning the co-IP, the authors now mention that the co-IP was performed three times: "Assay was performed using three biological replicates. Three independent biological replicates of the experiment were conducted with similar results." However, the authors did not use ACT-4::GFP or GFP alone as controls for their co-IP as previously suggested. This is critical considering that the evidence for a specific HDA-1::GFP - V-ATPase interaction is rather weak (compare interactions between HDA-1::GFP and V-ATPase subunits in Fig 4B with those of HDA-1::GFP and subunits of NuRD in Fig S8B).

      We conducted GFP pull-down experiments and mass spectrometric analysis for HDA-1::GFP and ACT-4::GFP using identical protocols, yielding consistent results. We agree with the reviewer that in our Western blot, inclusion of ACT-4::GFP would be a more effective negative control than empty beads.

      (6) Based on Fig 5E, it appears that Bafilomycin treatment causes pleiotropic effects on animals (see differences in HDA-1::GFP signal in the three rows). The authors now state: "Although BafA1-mediated disruption of lysosomal pH homeostasis is recognized to elicit a wide array of intracellular abnormalities, we found no evidence of such pleiotropic effects at the organismal level with the dosage and duration of treatment employed in this study". However, the 'evidence' mentioned is not shown. It is critical that the authors provide this evidence.

      We thank the Reviewer for pointing out this issue. We only checked the viability of the L1 larvae and the morphology of animals at the organismal level with the BafA1 dosage and duration of treatment, and did not notice any death of the animals or apparent abnormality in morphology (N > 20 for each treatment). However, as the reviewer pointed out, there can be some abnormalities at the cellular level. We have thus revised the above description as follows, on page 11, line 27:

      “…Although BafA1-mediated disruption of lysosomal pH homeostasis is recognized to elicit a wide array of intracellular abnormalities, we did not observe any larval deaths and apparent abnormality in morphology at the organismal level (N > 20 for each treatment) at the dose and duration of treatment employed in this study...”


      The following is the authors’ response to the previous reviews.

      eLife assessment

      The authors propose that the asymmetric segregation of the NuRD complex in C. elegans is regulated in a V-ATPase-dependent manner, that this plays a crucial role in determining the differential expression of the apoptosis activator egl-1, and that it is therefore critical for the life/death fate decision in this species. If proven, the proposed model of the V-ATPase-NuRD-EGL-1-Apoptosis cascade would shed light onto the mechanisms underlying the regulation of apoptosis fate during asymmetric cell division, and stimulate further investigation into the intricate interplay between V-ATPase, NuRD, and epigenetic modifications. However, the strength of evidence for this is currently incomplete.

      Public Review:

      Xie et al. propose that the asymmetric segregation of the NuRD complex is regulated in a V-ATPase-dependent manner, and plays a crucial role in determining the differential expression of the apoptosis activator egl-1 and is thus critical for the life/death fate decision.

      While the model is very intriguing, the reviewers raised concerns regarding the rigor of the method. One issue is with statistics (either insufficient information or inadequate use of statistics), and second is the concern that the asymmetry observed may be caused by one cell dying (resulting in protein degradation, RNA degradation etc). We recommend that the authors address these issues.

      We extend our sincere thanks to the Editors and Reviewers for their insightful comments on this study.

      Major #1:

      There are still many misleading statements/conclusions that are not rigorously tested or that are logically flawed. These issues must be thoroughly addressed for this manuscript to be solid.

      (1) Asymmetry detected by scRNA seq vs. imaging may not represent the same phenomenon, thus should not be discussed as two supporting pieces of evidence for the authors' model, and importantly each method has its own flaw. First, for scRNA seq, when cells become already egl-1 positive, those cells may be already dying, and thus NuRD complex's transcripts' asymmetry may not have any significance. The data presented in FigS1D, E show that there are lots of genes (6487 out of 8624) that are decreased in dying cells. Thus, it is not convincing to claim that NuRD asymmetry is regulated by differential RNA amount.

      We agree with the reviewer's comment. Indeed, scRNA-seq reveals phenomena different from those observed in protein imaging, and NuRD asymmetry may not be regulated by differential RNA levels. Seven years ago, when we started this project, NuRD asymmetry during asymmetric neuroblast division was unknown. We first found NuRD mRNA asymmetry using scRNA-seq and then NuRD protein asymmetry using fluorescence imaging. We have documented the whole process of discovering NuRD asymmetry, although the asymmetry of NuRD complex transcripts does not necessarily imply protein asymmetry. We have revised statements related to "NuRD asymmetry being regulated by differential RNA amounts" and discussed this issue in the revised manuscript on page 14, line 2:

      " The transcript asymmetry detected by scRNA-seq may not correspond to the protein asymmetry detected by microscopic imaging. Our scRNA-seq data shows that 6487 out of 8624 genes were not detected in egl-1-positive cells, the putative apoptotic cells. Cells that are egl-1 positive may be undergoing apoptosis, rendering the asymmetry of NuRD complex transcripts insignificant in inferring protein asymmetry. Thus, the observed transcript asymmetry of the NuRD subunits between live and dead cells may be coincidental with NuRD protein asymmetry during asymmetric neuroblast division, rather than serving as a regulatory mechanism."

      (2) Regarding NuRD protein's asymmetry, there are still multiple issues. The most likely explanation of the asymmetry is purely daughter size asymmetry. Because one cell is much bigger than the other (3 times larger), NuRD components, which are not chromatin associated, would be inherited by the bigger cell 3 times more than by the smaller daughter. Then, upon nuclear envelope reformation, NuRD components will enter the nucleus, and there will be 3 times more NuRD components in the bigger daughter cell. It is possible that this is actually the underlying mechanism to regulate gene expression differentially, but this possibility is not properly acknowledged. Currently, the authors use a chromatin-associated protein (Mys-1) as 'symmetric control', but this is not necessarily a fair comparison. For NuRD asymmetry to be meaningful, an example is needed of a protein that is non-chromatin associated in mitosis, is distributed to daughter cells in proportion to daughter cell size, and re-enters the nucleus after nuclear envelope formation to show symmetric distribution. And if daughter size asymmetry is the cause of NuRD asymmetry, other lineages that do not undergo apoptosis but exhibit daughter size asymmetry would also show NuRD asymmetry. The authors should comment on this (if such examples exist, it is fine in that in those cell types, NuRD asymmetry may be used for differential gene expression, not necessarily to induce cell death, but such a comparison provides the explanation for NuRD asymmetry, and puts the authors' finding in a better context).

      For more than one decade, we have meticulously explored the relationship between protein asymmetry and cell size asymmetry during ACDs of Q cells. A notable example of even protein distribution is the cytokinetic kinesin ZEN-4, as documented in our 2012 publication in the Journal of Cell Biology (Chai et al., JCB, 2012). This study, primarily focusing on the fate of the midbody post-cell division, also showcased the dynamics of GFP-tagged ZEN-4 during ACDs of QR.a cells in movie S1. Intriguingly, beyond its role in the cytokinetic ring, we observed a uniform dispersal of ZEN-4 throughout the cytoplasm. Remarkably, following cell division, ZEN-4 transitions evenly into the nuclei of the daughter cells, a phenomenon with implications yet to be fully understood. One hypothesis is that ZEN-4's nuclear localization may prevent the formation of ectopic microtubule bundles in the cytosol during interphase. Below, we present a snapshot from our original movie, clearly showing the symmetrical distribution of ZEN-4 into the nuclei of the two daughter cells.

      (3) For the analysis of protein asymmetry between two daughters in Fig S4C, the method of calibration is unclear, making it difficult to interpret the results.

      In Figure S4C, we quantified the relative total fluorescence of the Q cell, with the quantification method illustrated in Figure S4A. To further clarify our quantification approach, we have updated Figure S4A and the "Live-Cell Imaging and Quantification" section in the Materials and Methods:

      “…To determine the ratios of fluorescence intensities in the posterior to anterior half (P/A) of Q.a lineages or A/P of Q.p lineages, the cell in the mean intensity projection was divided into posterior and anterior halves. ImageJ software was used to measure the mean fluorescence intensities of two halves with background subtraction. The slide background's mean fluorescence intensity was measured in a region devoid of worm bodies. The background-subtracted mean fluorescence intensities of the two halves were divided to calculate the ratio. The same procedure was used to determine the fluorescence intensity ratios between two daughter cells. Total fluorescence intensity was the sum of the posterior and anterior fluorescence intensities or the sum of fluorescence intensities from two daughter cells (Figure S4A). …”
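      The quantification quoted above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ImageJ workflow itself: the function names and intensity values are hypothetical, the inputs are mean intensities already measured for each region, and the total is taken here as the sum of the background-subtracted values.

```python
def subtract_background(mean_intensity, background_mean):
    """Mean intensity of a region after removing the slide background."""
    return mean_intensity - background_mean


def intensity_ratio(numerator_mean, denominator_mean, background_mean):
    """Ratio of two background-subtracted mean intensities (e.g. P/A or A/P)."""
    numerator = subtract_background(numerator_mean, background_mean)
    denominator = subtract_background(denominator_mean, background_mean)
    if denominator <= 0:
        raise ValueError("denominator region is not brighter than background")
    return numerator / denominator


def total_intensity(first_mean, second_mean, background_mean):
    """Sum of the two background-subtracted intensities (assumed convention)."""
    return (subtract_background(first_mean, background_mean)
            + subtract_background(second_mean, background_mean))


# Hypothetical measurements in arbitrary units.
posterior, anterior, background = 850.0, 450.0, 50.0

print("P/A ratio:", intensity_ratio(posterior, anterior, background))
print("total:", total_intensity(posterior, anterior, background))
```

      The same two calls apply to a pair of daughter cells by passing the daughters' mean intensities in place of the posterior and anterior halves.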

      (4) As for pHluorin experiments, the authors were asked to test whether the changes in fluorescence observed are due to changes in pH or changes in the amount of pHluorin protein. They need to add a ratiometric method to this manuscript. A brief mention on Page 12 line 12 is insufficient to clarify this issue.

      We appreciate the concerns about potential changes in pH or pHluorin protein levels. While we cannot completely dismiss the impact of changes in the amount of pHluorin protein, it appears improbable that the asymmetry of pHluorin fluorescence is attributable to an asymmetric amount of pHluorin protein. This inference is supported by the observation that other fluorescent proteins, such as GFP or mCherry, did not exhibit any asymmetry during ACDs of Q cells. An example of GFP alone during the ACD of QL.p is illustrated in Figure 5A from Ou and Vale, JCB, 2009. The fluorescence intensities in the large QL.pa cell and the small QL.pp cell are indistinguishable.

      Major #2:

      Some issues surrounding statistics must be resolved.

      (1) Fig. 1FG, 2D, 3BDEG, 5BD and 6B used either one-sample t-test or unpaired two-tailed parametric t-test for statistical comparison. These t-tests require a verification of each sample fitting to a normal distribution. The authors need to describe a statistical test used to verify a normal distribution of each sample.

      (2) Fig. 2D, 3D, and 3G have very small sample sizes (N=3-4, N=6, N=3, respectively); it is possible that a normal distribution cannot be verified. How can the authors justify the use of one-sample t-test and unpaired parametric t-test?

      (3) Statistical comparison in Fig. 2D and Fig. 6B should be re-assessed. For Fig. 2D, the authors need to compare the intensity ratio of HDA-1/LIN53 between sister cells dying within 35 min and those over 400 min. For Fig. 6B, they need to compare the intensity ratio of VHA-17 between DMSO- and BafA1- treated cells at the same time point after anaphase.

      We appreciate the reviewer's advice on the statistical analysis of our data. In response, we performed normality tests on the datasets presented in Figures 1F, 1G, 3B, 5B, 5D, and 6B, all of which passed the tests (as demonstrated in Figure S10). We also acknowledge the reviewer's comment on the inadequate sample sizes in Figures 2D, 3D, 3E, and 3G for fitting a normal distribution. Therefore, we have revised our statistical analysis methods for these figures and updated both the figures and their legends. The revised statistical results support the primary conclusions of this study.

      In response to the reviewer's observation regarding the small sample size in Figure 2D, which precluded normality verification, and the suggestion to compare sister cells that die within 35 minutes to those surviving over 400 minutes, we adapted our approach. We implemented the Kruskal-Wallis test to evaluate the differences among the groups. To assess the specific differences between each group and the 400 min MSpppaap group, we conducted Dunn’s multiple comparisons test. The revised Figure 2D illustrates the updated statistical significance.
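      For readers unfamiliar with the rank-based test named above, the Kruskal-Wallis H statistic can be computed as in the following sketch. The group values are invented and the function names are hypothetical; the p-value and the Dunn's post hoc comparisons are not reproduced here, and a library routine such as `scipy.stats.kruskal` could be used instead of this hand-rolled version.

```python
from collections import Counter


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_rank_sums = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_rank_sums += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * weighted_rank_sums - 3 * (n + 1)
    tie_term = sum(c ** 3 - c for c in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


# Hypothetical intensity ratios for three groups of sister cells.
groups = [[1.8, 2.1, 2.4], [1.2, 1.3, 1.5], [0.9, 1.0, 1.1]]
print(f"H = {kruskal_wallis_h(groups):.2f}")
```

      Because the statistic depends only on ranks, it makes no assumption that the ratios are normally distributed, which is the reason such a test suits groups too small for a normality check.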

      For Figure 3D, due to the small sample size precluding normality verification, we applied the Wilcoxon test with 1 as the theoretical median. The revised Figure 3D illustrates the updated statistical significance.
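      A one-sample Wilcoxon signed-rank test against a theoretical median of 1 can be sketched as below. The data and function names are hypothetical, and this exact-enumeration version is only an illustration of the technique, not the software used for Figure 3D; a library routine such as `scipy.stats.wilcoxon` applied to the differences from 1 could be used instead.

```python
from itertools import product


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank_exact(sample, theoretical_median=1.0):
    """Return (W+, exact two-sided p) for a one-sample signed-rank test."""
    # Observations equal to the theoretical median carry no sign and are dropped.
    diffs = [x - theoretical_median for x in sample if x != theoretical_median]
    if not diffs:
        raise ValueError("every observation equals the theoretical median")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Null distribution: each rank is positive or negative with equal probability.
    null = [sum(r for r, positive in zip(ranks, signs) if positive)
            for signs in product((False, True), repeat=len(ranks))]
    p_low = sum(w <= w_plus for w in null) / len(null)
    p_high = sum(w >= w_plus for w in null) / len(null)
    return w_plus, min(1.0, 2 * min(p_low, p_high))


# Hypothetical ratios from six animals, all above the theoretical median of 1.
ratios = [1.42, 1.67, 1.25, 1.88, 1.51, 1.33]
w_plus, p_value = wilcoxon_signed_rank_exact(ratios)
print(f"W+ = {w_plus}, exact two-sided p = {p_value}")
```

      Enumerating all sign assignments is practical here because a sample of six observations has only 64 of them.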

      For Figure 3E, where the sample size also hindered normality verification, we conducted the Kruskal-Wallis test to evaluate the overall effect. Additionally, Dunn’s multiple comparisons test was utilized to examine the differences between groups. The revised Figure 3E illustrates the updated statistical significance.

      For Figure 3G, the reviewer pointed out the small sample size and the limited statistical power due to having only three data points per group. To address this, we revised the figure to visually present each data point, aiming to more clearly illustrate the variation trends.

      For Figure 6B, following the reviewer's suggestion, we compared the DMSO group directly with the BafA1 group, updating Figure 6B to reflect this comparison as advised.

      These adjustments have been made to ensure the statistical analyses are robust and appropriate given the sample sizes and to align with the reviewer's recommendations, enhancing the clarity and accuracy of our findings.

      Recommendations for the authors:

      We recommend using grey scale (instead of 'heatmap' representation) to show the protein distribution of interest. Heatmap does not help at all, because 'total protein amount per cell' (instead of signal intensity on each pixel) is what matters in the context of this paper. Heatmap presentation does not allow readers to integrate signal intensity with their eyes.

      We thank the editor for pointing this out. We have changed heatmaps to inverted fluorescence images in grey scale.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study presents a valuable tool for searching molecular dynamics simulation data, making such data sets accessible for open science. The authors provide convincing evidence that it is possible to identify useful molecular dynamics simulation data sets and their analysis can produce valuable information.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      Tiemann et al. have undertaken an original study on the availability of molecular dynamics (MD) simulation datasets across the Internet. There is a widespread belief that extensive, well-curated MD datasets would enable the development of novel classes of AI models for structural biology. However, currently, there is no standard for sharing MD datasets. As generating MD datasets is energy-intensive, it is also important to facilitate the reuse of MD datasets to minimize energy consumption. Developing a universally accepted standard for depositing and curating MD datasets is a huge undertaking. The study by Tiemann et al. will be very valuable in informing policy developments toward this goal.

      Strengths:

      The study presents an original approach to addressing a growing concern in the field. It is clear that adopting a more collaborative approach could significantly enhance the impact of MD simulations in modern molecular sciences.

      The timing of the work is appropriate, given the current interest in developing AI models for describing biomolecular dynamics.

      Weaknesses:

      The study primarily focuses on one major MD engine (GROMACS), although this limitation is not significant considering the proof-of-concept nature of the study.

      We thank the reviewer for his/her comments. Moving forward, our plan includes expanding this research to encompass other MD engines used in biomolecular simulations and materials sciences, such as NAMD, CHARMM, Amber, LAMMPS, etc. However, this requires parsing associated files to supplement the sparse metadata generally available for the related datasets.

      Reviewer #2 (Public Review):

      Summary:

      Molecular dynamics (MD) data is deposited in public, non-specialist repositories. This work starts from the premise that these data are a valuable resource as they could be used by other researchers to extract additional insights from these simulations; it could also potentially be used as training data for ML/AI approaches. The problem is that mining these data is difficult because they are not easy to find and work with. The primary goal of the authors was to discover and index these difficult-to-find MD datasets, which they call the "dark matter of the MD universe" (in contrast to data sets held in specialist databases).

      The authors developed a search strategy that avoided the use of ill-defined metadata but instead relied on the knowledge of the restricted set of file formats used in MD simulations as a true marker for the data they were looking for. Detection of MD data marked a data set as relevant with a follow-up indexing strategy of all associated content. This "explore-and-expand" strategy allowed the authors for the first time to provide a realistic census of the MD data in non-specialist repositories.
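
The "explore-and-expand" idea summarized above can be illustrated with a short sketch. It is not the authors' implementation: the record layout (pairs of dataset identifier and file name), the function name, and the small extension set are assumptions made for illustration, using only file types mentioned in this review.

```python
import os

# Illustrative subset of Gromacs-related extensions mentioned in the review.
MD_EXTENSIONS = {".xtc", ".trr", ".gro", ".mdp"}


def explore_and_expand(file_records, md_extensions=MD_EXTENSIONS):
    """Index all files of every dataset that holds at least one MD file.

    file_records: iterable of (dataset_id, filename) pairs (hypothetical layout).
    Returns a dict mapping dataset_id to the list of all its file names.
    """
    records = list(file_records)

    # Explore: a file extension, not free-text metadata, marks a dataset as relevant.
    relevant = {
        dataset_id
        for dataset_id, filename in records
        if os.path.splitext(filename)[1].lower() in md_extensions
    }

    # Expand: index every file in the relevant datasets, whatever its type.
    index = {}
    for dataset_id, filename in records:
        if dataset_id in relevant:
            index.setdefault(dataset_id, []).append(filename)
    return index


# Hypothetical repository listing.
listing = [
    ("dataset-a", "run.xtc"),
    ("dataset-a", "README.md"),
    ("dataset-b", "paper.pdf"),
]
catalog = explore_and_expand(listing)
```

Here `dataset-a` is kept with both of its files because one of them is a trajectory, while `dataset-b` is dropped; the file format acts as the marker, so incomplete metadata does not hide a dataset.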

      As a proof of principle, they analyzed a subset of the data (primarily related to simulations with the popular Gromacs MD package) to summarize the types of simulated systems (primarily biomolecular systems) and commonly used simulation settings.

      Based on their experience they propose best practices for metadata provision to make MD data FAIR (findable, accessible, interoperable, reusable).

      A prototype search engine that works on the indexed datasets is made publicly available. All data and code are made freely available as open source/open data.

      Strengths:

      The novel search strategy is based on relevant data to identify full datasets instead of relying on metadata and thus is likely to have many true positives and few false positives.

      The paper provides a first glimpse at the potential hidden treasures of MD simulations and force field parametrizations of molecules.

      Analysis of parameter settings of MD simulations from how researchers *actually* run simulations can provide valuable feedback to MD code developers for how to document/educate users. This approach is much better than analyzing what authors write in the Methods sections.

      The authors make a prototype search engine available.

      The guidelines for FAIR MD data are based on experience gained from trying to make sense of the data.

      Weaknesses:

      So far the work is a proof-of-concept that focuses on MD data produced by Gromacs (which was prevalent under all indexed and identified packages).

      As discussed in the manuscript, some types of biomolecules are likely underrepresented because different communities have different preferences for force fields/MD codes (for example: carbohydrates with AMBER/GLYCAM using AMBER MD instead of Gromacs).

      Materials sciences seem to be severely under-represented --- commonly used codes in this area such as LAMMPS are not even detected, and only very few examples could be identified. As it is, the paper primarily provides an insight into the *biomolecular* MD simulation world.

      The authors succeed in providing a first realistic view on what MD data is available in public repositories. In particular, their explore-expand approach has the potential to be customized for all kinds of specialist simulation data, whereby specific artifacts are used as fiducial markers instead of metadata. The more detailed analysis is limited to Gromacs simulations and primarily biomolecular simulations (even though MD is also widely used in other fields such as the materials sciences). This restricted view may simply be correlated with the user community of Gromacs and hopefully, follow-up studies from this work will shed more light on this shortcoming.

      The study quantified the number of trajectories currently held in structured databases as ~10k vs ~30k in generalist repositories. To go beyond the proof-of-principle analysis it would be interesting to analyze the data in specialist repositories in the same way as the one in the generalist ones, especially as there are now efforts underway to create a database for MD simulations (Grant 'Molecular dynamics simulation for biology and chemistry research' to establish MDDB' DOI 10.3030/101094651). One should note that structured databases do not invalidate the approach pioneered in this work; if anything they are orthogonal to each other and both will likely play an important role in growing the usefulness of MD simulations in the future.

      We thank the reviewer for his/her comments. As mentioned to Reviewer 1, we intend to extend this work to other MD engines in the near future to go beyond Gromacs and even biomolecular simulations. Furthermore, as the reviewer noted the value of accessing and indexing specialized MD databases such as MDDB, MemprotMD, GPCRmd, NMRLipids, ATLAS, and others, expanding the MDverse catalog of MD data in this direction is indeed one of our next steps. This indexing may also increase the visibility and widespread adoption of these specific databases.

      Reviewer #3 (Public Review):

      Molecular dynamics (MD) simulations nowadays are an essential element of structural biology investigations, complementing experiments and aiding their interpretation by revealing transient processes or details (such as the effects of glycosylation on the SARS-CoV-2 spike protein, for example (Casalino et al. ACS Cent. Sci. 2020; 6, 10, 1722-1734 https://doi.org/10.1021/acscentsci.0c01056)) that cannot be observed directly. MD simulations can allow for the calculation of thermodynamic, kinetic, and other properties and the prediction of biological or chemical activity. MD simulations can now serve as "computational assays" (Huggins et al. WIREs Comput Mol Sci. 2019; 9:e1393. https://doi.org/10.1002/wcms.1393). Conceptually, MD simulations have played a crucial role in developing the understanding that the dynamics and conformational behaviour of biological macromolecules are essential to their function, and are shaped by evolution. Atomistic simulations range up to the billion atom scale with exascale resources (e.g. simulations of SARS-CoV-2 in a respiratory aerosol. Dommer et al. The International Journal of High Performance Computing Applications. 2023; 37:28-44. doi:10.1177/10943420221128233), while coarse-grained models allow simulations on even larger length- and timescales. Simulations with combined quantum mechanics/molecular mechanics (QM/MM) methods can investigate biochemical reactivity, and overcome limitations of empirical forcefields (Cui et al. J. Phys. Chem. B 2021; 125, 689 https://doi.org/10.1021/acs.jpcb.0c09898).

      MD simulations generate large amounts of data (e.g. structures along the MD trajectory) and increasingly, e.g. because of funder mandates for open science, these data are deposited in publicly accessible repositories. There is real potential to learn from these data en masse, not only to understand biomolecular dynamics but also to explore methodological issues. Deposition of data is haphazard and lags far behind experimental structural biology, however, and it is also hard to answer the apparently simple question of "what is out there?". This is the question that Tiemann et al explore in this nice and important work, focusing on simulations run with the widely used GROMACS package. They develop a search strategy and identify almost 2,000 datasets from Zenodo, Figshare and Open Science Framework. This provides a very useful resource. For these datasets, they analyse features of the simulations (e.g. atomistic or coarse-grained), which provides a useful snapshot of current simulation approaches. The analysis is presented clearly and discussed insightfully. They also present a search engine to explore MD data, the MDverse data explorer, which promises to be a very useful tool.

      As the authors state: "Eventually, front-end solutions such as the MDverse data explorer tool can evolve being more user-friendly by interfacing the structures and dynamics with interactive 3D molecular viewers". This will make MD simulations accessible to non-specialists and researchers in other areas. I would envisage that this will also include approaches using interactive virtual reality for an immersive exploration of structure and dynamics, and virtual collaboration (e.g. O'Connor et al., Sci. Adv.4, eaat2731 (2018). DOI:10.1126/sciadv.aat2731)

      The need to share data effectively, and to compare simulations and test models, was illustrated clearly in the COVID-19 pandemic, which also demonstrated a willingness and commitment to data sharing across the international community (e.g. Amaro and Mulholland, J. Chem. Inf. Model. 2020, 60, 6, 2653-2656 https://doi.org/10.1021/acs.jcim.0c00319; Computing in Science & Engineering 2020, 22, 30-36 doi: 10.1109/MCSE.2020.3024155). There are important lessons to learn here, for simulations to be reproducible and reliable, for rapid testing, for exploiting data with machine learning, and for linking to data from other approaches. Tiemann et al. discuss how to develop these links, providing good perspectives and suggestions.

      I agree completely with the statement of the authors that "Even if MD data represents only 1 % of the total volume of data stored in Zenodo, we believe it is our responsibility, as a community, to develop a better sharing and reuse of MD simulation files - and it will neither have to be particularly cumbersome nor expensive. To this end, we are proposing two solutions. First, improve practices for sharing and depositing MD data in data repositories. Second, improve the FAIRness of already available MD data notably by improving the quality of the current metadata."

      This nicely states the challenge to the biomolecular simulation community. There is a clear need for standards for MD data and associated metadata. This will also help with the development of standards of best practice in simulations. The authors provide useful and detailed recommendations for MD metadata. These recommendations should contribute to discussions on the development of standards by researchers, funders, and publishers. Community organizations (such as CCP-BioSim and HECBioSim in the UK, BioExcel, CECAM, MolSSI, learned societies etc) have an important part to play in these developments, which are vital for the future of biomolecular simulation.

      We thank the reviewer for his/her comments. Beyond the points mentioned to Reviewers 1 and 2, as the reviewer suggested, it would be of great interest to combine innovative and immersive approaches to visualize and possibly interact with the data collected. This is indeed more and more amenable thanks to technologies such as WebGL and programs such as Mol*, or even - as also pointed out by the reviewer - through virtual reality, for example with the mentioned Narupa framework or with the UnityMol software. For a comprehensive review on MD trajectory visualization and associated challenges, we refer to our recent review article https://doi.org/10.3389/fbinf.2024.1356659.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Some minor text editing would improve the readability of the manuscript.

      It would be very useful if the authors could share their perspectives on the best and most efficient approach to sharing datasets and code associated with a publication. My concern lies in the fact that Github, which is currently the dominant platform for sharing code, is not well-suited for hosting large MD datasets. As a result, researchers often need to adopt a workflow where code is shared on Github and datasets are stored elsewhere (e.g., Zenodo). While this is feasible, it adds extra work. Ideally, a transparent process could be developed to seamlessly share code and datasets linked to a study through a unified interface.

      We thank the reviewer for this excellent suggestion. To our knowledge, there is not yet an easy framework to jointly store and share code and data linked to their scientific publication. Of course, code can be submitted to “generic” databases along with the data, but at present these do not offer features such as collaborative work and history tracking to the extent that GitHub does.

      Although GitHub is indeed a suitable platform to deposit code, we strongly advise researchers to archive their code in Software Heritage. In addition to preserving source code, Software Heritage provides a unique identifier called SWHID that unambiguously makes reference to a specific version of the source code.

      So far, it is the responsibility of the scientific publication authors to link datasets and source codes (whether in GitHub or Software Heritage) in their paper, but also to make the reverse link from the data and code sharing platforms to the paper after publication.

      As mentioned by the reviewer, a unified interface that could ease this process would significantly contribute to FAIR-ness in MD.

      Reviewer #2 (Recommendations For The Authors):

      L180: I am not aware that TRR files contain energy terms as stated here, my understanding was that EDR files primarily served that purpose.

      “…available in one dataset. Interestingly, we found 1,406 .trr files, which contain trajectory but also additional information such as velocities, energy of the system, etc. While the file is especially useful in terms of reusability, the large size (can go up to several 100GB) limits its deposition in most…”

      Indeed, our formulation was ambiguous. The EDR files contain the detailed information on energies, whereas TRR files contain numerous values from the trajectory such as coordinates, velocities, forces and to some extent also energies (https://manual.gromacs.org/current/reference-manual/file-formats.html#trr).

      L207: The text states that the total time was not available from XTC files, only the number of frames. However, XTC files record time stamps in addition to frame numbers. As long as these times are in the Gromacs standard of picoseconds, the simulation time ought to be available from XTCs.

      “…systems and the number of frames available in the files (Fig. 3-B). Of note, the frames do not directly translate to the simulation runtime - more information deposited in other files (e.g. .mdp files) is needed to determine the complete runtime of the simulation. The system was up…”.

      Thank you for the useful comment. We removed this sentence and now mention that studying the simulation time would be of interest in the future, especially once we perform an exhaustive analysis of XTC files.

      “Of note, as .xtc files also contain time stamps, it would be interesting to study the relationship between the time and the number of frames to get useful information about the sampling. Nevertheless, this analysis would be possible only for unbiased MD simulations. So, we would need to decipher if the .xtc file is coming from biased or unbiased simulations, which may not be trivial.”
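
To illustrate the kind of relationship between time and frame count discussed here, the following minimal sketch summarizes a list of frame time stamps. It assumes the time stamps have already been read from a trajectory and are in picoseconds (the Gromacs convention noted by the reviewer); the function name, the tolerance, and the example values are hypothetical, and reading a real .xtc file would require a trajectory library that is not shown.

```python
def summarize_sampling(times_ps, rel_tol=1e-6):
    """Summarize frame time stamps (in picoseconds) from one trajectory."""
    if len(times_ps) < 2:
        raise ValueError("need at least two frames to measure a time step")

    # Time difference between each pair of consecutive frames.
    steps = [later - earlier for earlier, later in zip(times_ps, times_ps[1:])]
    first = steps[0]
    uniform = all(abs(s - first) <= rel_tol * abs(first) for s in steps)

    return {
        "n_frames": len(times_ps),
        "span_ps": times_ps[-1] - times_ps[0],   # simulated time covered
        "uniform_step": uniform,                 # False hints at concatenation or gaps
        "step_ps": first if uniform else None,   # saving interval when regular
    }


# Hypothetical time stamps: four frames saved every 10 ps.
summary = summarize_sampling([0.0, 10.0, 20.0, 30.0])
```

The frame count alone gives 4, while the time stamps add the covered span (30 ps) and the saving interval (10 ps). As the quoted text cautions, interpreting these values as sampling is only meaningful for unbiased simulations.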

      Analysis of MDP files: Were these standard equilibrium MD or can you distinguish biased MD or free energy calculations?

      Currently we do not distinguish between biased and unbiased MD, but in the future we may attempt to do so, e.g. by correlating it with standard equilibration force-fields/parameters, timesteps or similar. Nevertheless, a true distinction will remain challenging.

      L336: typo: pikes -> spikes (or peaks?)

      “…simulations of Lennard-Jones models (Jeon et al., 2016). Interestingly, we noticed the appearance of several pikes at 400K, 600K and 800K, which were not present before the end of the year 2022. These peaks correspond to the same study related to the stability of hydrated crystals (Dybeck et al., 2023). Overall, this analysis revealed that a wide range of temperatures have been explored,…”

      Thank you. We have corrected this typo.

      Make clear how multiple versions of data sets are handled, e.g., if v1, v2, and v3 of a dataset are provided in Zenodo then which one is counted or are all counted?

      We collected only the latest version of each dataset, as exposed by default by the Zenodo API. To reflect this, we added the following sentence to the Methods and Materials section, Initial data collection sub-section:

      “By default, the last version of the datasets was collected.”
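
The counting rule can be made concrete with a small sketch. It is an illustration only: the record layout and the keys `concept_id` and `version` are hypothetical stand-ins for whatever identifiers a repository provides, and it does not reproduce any repository API.

```python
def keep_latest_versions(records):
    """Keep one record per dataset: the one with the highest version number.

    records: iterable of dicts with hypothetical keys
             'concept_id' (shared by all versions of a dataset) and
             'version' (an integer that grows with each release).
    """
    latest = {}
    for record in records:
        key = record["concept_id"]
        if key not in latest or record["version"] > latest[key]["version"]:
            latest[key] = record
    return list(latest.values())


# Hypothetical listing: dataset 'A' has three versions, dataset 'B' has one.
listing = [
    {"concept_id": "A", "version": 1},
    {"concept_id": "A", "version": 3},
    {"concept_id": "A", "version": 2},
    {"concept_id": "B", "version": 1},
]
unique = keep_latest_versions(listing)
```

Under this rule a dataset released as v1, v2, and v3 contributes a single entry (v3) to the totals, so versioned deposits are not counted several times.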

      L248 Analysis of GRO files seems fairly narrow because PDB files are very often used for exactly the same purpose, even in the context of Gromacs simulations, not the least because it is familiar to structural biologists that may be interested in representative MD snapshots. Despite all the shortcomings of abusing the PDB format for MD, it is an attempt at increased interoperability. Perhaps the authors can make sure that readers understand that choosing GRO for analysis may give a somewhat skewed picture, even within Gromacs simulations.

      Thanks for this comment. We collected about 12,000 PDB files that could indeed be output from Gromacs simulations and easily be shared due to the universality of this format, but that could as well come from different sources (like other MD packages or the PDB database itself). We purposely decided to limit our study to files strictly associated with the Gromacs package, like MDP and XTC file types. However, we will extend our survey to all other structure-like formats and especially the PDB file type. We reflected this purpose in the following sentence (after line 281):

      “Beyond .gro files, we would like to analyze the ensemble of the ~12,000 .pdb files extracted in this study (see Figure 2-B) to better characterize the types of molecular structures deposited.”

      A simple template metadata file would be welcome (e.g., served from a GitHub/GitLab repository so that it can be improved with community input).

      Thank you for this suggestion, which we fundamentally agree with. However, the generation of such a file is a major task, and we believe that the creation of a metadata file template requires far-reaching considerations; it is therefore beyond the scope of this paper and should not be decided by a small group of researchers. Indeed, this topic requires a large consensus of different stakeholders, from users, to MD program developers, and journal editors. It would be especially useful to organize dedicated workshops with representatives of all these communities to tackle this specific issue, as mentioned by Reviewer 3 in his/her public review. As a basis for this discussion, we humbly proposed at the end of this manuscript a few non-constraining guidelines based on our experience retrieving the data.

      To emphasize this statement, we added the following sentence at the end of the “Guidelines for better sharing of MD simulation data” section (line 420):

      “Converging on a set of metadata and format requires a large consensus of different stakeholders from users, to MD program developers, and journal editors. It would be especially useful to organize specific workshops with representatives of all these communities to collectively tackle this specific issue.”

      In "Data and code availability" it would be good to specify licenses in addition to stating "open source". Thank you for pointing out that GitLab/GitHub are not archives and that everyone should be strongly encouraged to submit data to stable archival repositories.

      We added the corresponding licenses for code and data in the “Data and code availability” section.

      Reviewer #3 (Recommendations For The Authors)

      The paper is well written, with very few typographical or other minor errors.

      Minor points:

      Line 468-9 "can evolve being more user-friendly" should be "can evolve to being more user-friendly", I think.

      Thank you, we have changed the wording accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study reports on the packing of molecules in cellular compartments, such as actin-based protrusions. The study provides solid evidence for parameters that enable the building of a biophysical model of filopodia, which is required to gain a complete understanding of these important actin-based structures. Some areas of the manuscript require further clarification.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript proposes an alternative method by SDS-PAGE calibration of Halo-Myo10 signals to quantify myosin molecules at specific subcellular locations, in this specific case filopodia, in epifluorescence datasets compared to the more laborious and troublesome single molecule approaches. Based on these preliminary estimates, the authors developed further their analysis and discussed different scenarios regarding myosin 10 working models to explain intracellular diffusion and targeting to filopodia.

      Strengths:

      Overall, the paper is elegantly written and the data analysis is appropriately presented.

      Weaknesses:

      While the methodology is intriguing in its descriptive potential and could be the beginning of an interesting story, a good portion of the paper is dedicated to the discussion of hypothetical working mechanisms to explain myosin diffusion, localization, and decoration of filopodial actin that is not accompanied by the mandatory gain/loss of function studies required to sustain these claims.

      To be fair, the detailed mechanisms that we raise related to diffusion, localization, and decoration are based on extensive work by others. Many prior papers use domain deletions of Myo10 and fall in the category of gain/loss-of-function studies. It is true that we have not repeated those extensive studies, but it seems appropriate to connect with and cite their work where appropriate.

      Reviewer #2 (Public Review):

      Summary:

      The paper sought to determine the number of myosin 10 molecules per cell and localized to filopodia, where they are known to be involved in formation, transport within, and dynamics of these important actin-based protrusions. The authors used a novel method to determine the number of molecules per cell. First, they expressed HALO tagged Myo10 in U2OS cells and generated cell lysates of a certain number of cells and detected Myo10 after SDS-PAGE, with fluorescence and a stain-free method. They used a purified HALO tagged standard protein to generate a standard curve which allowed for determining Myo10 concentration in cell lysates and thus an estimate of the number of Myo10 molecules per cell. They also examined the fluorescence intensity in fixed cell images to determine the average fluorescence intensity per Myo10 molecule, which allowed the number of Myo10 molecules per region of the cell to be determined. They found a relatively small fraction of Myo10 (6%) localizes to filopodia. There are hundreds of Myo10 in each filopodium, which suggests some filopodia have more Myo10 than actin binding sites. Thus, there may be crowding of Myo10 at the tips, which could impact transport, the morphology at the tips, and dynamics of the protrusions themselves. Overall, the study forms the basis for a novel technique to estimate the number of molecules per cell and their localization to actin-based structures. The implications are broad also for being able to understand the role of myosins in actin protrusions, which is important for cancer metastasis and wound healing.

      Strengths:

      The paper addresses an important fundamental biological question about how many molecular motors are localized to a specific cellular compartment and how that may relate to other aspects of the compartment such as the actin cytoskeleton and the membrane. The paper demonstrates a method of estimating the number of myosin molecules per cell using the fluorescently labeled HALO tag and SDS-PAGE analysis. There are several important conclusions from this work in that it estimates the number of Myo10 molecules localized to different regions of the filopodia and the minimum number required for filopodia formation. The authors also establish a correlation between the number of filopodia-localized Myo10 molecules and the number of filopodia in the cell. Only a small percentage of Myo10 is tip-localized relative to the total amount in the cell, suggesting Myo10 has to be activated to enter the filopodia compartment. The localization of Myo10 is log-normal, which suggests that clustering of Myo10 is a feature of this motor.

      Weaknesses:

      One main critique of this work is that the Myo10 was overexpressed. Thus, the amount in the cell body compared to the filopodia is difficult to compare to physiological conditions. The amount in the filopodia was relatively small (hundreds of molecules per filopodium), so this result is still interesting regardless of the overexpression. However, the overexpression should be addressed in the limitations.

      This is a reasonable perspective and we now note this caveat in the Limitations section so that readers will take note. Our goal here was to understand a system in which Myo10 is the limiting reagent for filopodia, rather than a native system that expresses high Myo10 on its own. Because U2OS cells do not express detectable levels of Myo10 (see below), the natural perturbation here is overexpressing Myo10 to stimulate filopodial growth.

      The authors have not addressed the potential for variability in transfection efficiency. The authors could examine the average fluorescence intensity per cell and if similar this may address this concern.

      Indeed, cells are heterogenous and will naturally express different levels of Myo10 not only due to transfection efficiency, but also due to their state (cell cycle stage, motile behavior, and more). In fact, we measure the transfection efficiency of each bioreplicate and account for it in our calibration procedure. We also measure the fluorescence intensity per cell, which lets us calculate the total Myo10s per cell and the cell-to-cell variability. These Myo10 distributions across cells are shown in Fig. 1D-E.

      We note here an error that we made in applying this transfection efficiency correction in the first submission. When we obtain the total Myo10 molecules by SDS-PAGE, we should divide by the total number of transfected cells. However, due to an operator precedence error, the transfection efficiency appeared in the numerator rather than the denominator. We have now corrected this error, which has the effect of increasing the number of molecules in all of our measurements. The effect of this correction has strengthened one of the paper’s main conclusions, that Myo10 is frequently overloaded at filopodial tips.
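
The kind of operator precedence error described above can be reproduced in a few lines. This is a minimal sketch with hypothetical function names and made-up numbers, not our analysis code: it only shows how omitting parentheses moves the transfection efficiency from the denominator to the numerator.

```python
def molecules_per_transfected_cell(total_molecules, n_cells, transfection_efficiency):
    """Total molecules divided by the number of transfected cells."""
    if not 0 < transfection_efficiency <= 1:
        raise ValueError("transfection efficiency must be in (0, 1]")
    # Parentheses keep the efficiency in the denominator.
    return total_molecules / (n_cells * transfection_efficiency)


def molecules_per_cell_precedence_bug(total_molecules, n_cells, transfection_efficiency):
    """Same formula without parentheses: evaluated left to right."""
    # Division and multiplication share precedence, so this computes
    # (total_molecules / n_cells) * transfection_efficiency.
    return total_molecules / n_cells * transfection_efficiency


# Hypothetical lysate: 1e9 molecules from 1e5 cells, half of them transfected.
correct = molecules_per_transfected_cell(1_000_000_000, 100_000, 0.5)
buggy = molecules_per_cell_precedence_bug(1_000_000_000, 100_000, 0.5)
```

For any efficiency below 1, the unparenthesized form underestimates the count by a factor equal to the efficiency squared (here 0.25), which is why applying the correction properly increases every molecule number.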

      The SDS PAGE method of estimating the number of molecules is quite interesting. I really like this idea. However, I feel there are a few more things to consider. The fraction of HALO tag standard and Myo10 labeled with the HALO tagged ligand is not determined directly. It is suggested that since excess HALO tagged ligand was added we can assume nearly 100% labeling. If the HALO tag standard protein is purified it should be feasible to determine the fraction of HALO tagged standard that is labeled by examining the absorbance of the protein at 280 and fluorophore at its appropriate wavelength.

      This is a fair point raised by the reviewer, and we have now measured a labeling efficiency of 90% in Supplementary Figure 2A-C. We have adjusted all values according to this labeling efficiency.

      The fraction of HALO tagged Myo10 labeled may be more challenging to determine, since it is in a cell lysate, but there may be some potential approaches (e.g. mass spec, HPLC).

      As noted, this value is considerably more challenging. Instead, we determined conditions under which labeling in cells is saturated. We have now stained with a concentration range for both fixed and live cell samples. Saturation occurs with ~0.5 μM HaloTag ligand-TMR in fixed/permeabilized cells and in live cells (Supplementary Figure 2D-E). This comparison of live cells vs. permeabilized cells allows us to say that the intact plasma membrane is not limiting labeling under these conditions.

      In Figure 1B, the stain free gel bands look relatively clean. The Myo10 is from cell lysates so it is surprising that there are not more bands. I am not surprised that the bands in the TMR fluorescence gel are clean, and I agree the fluorescence is the best way to quantitate.

      Figure 1B shows the focused view at high MW, and there is not much above Myo10. The full gel lanes shown in Supp. Fig. 1C show the expected number of bands from a cell lysate.

      In Figure 3C, the number of Myo10 molecules needed to initiate a filopodium was estimated. I wonder if the authors could have looked at live cell movies to determine that these events started with a puncta of Myo10 at the edge of the cell, and then went on to form a filopodia that elongated from the cell. How was the number of Myo10 molecules that were involved in the initiation determined? Please clarify the assumptions in making this conclusion.

      We thank the reviewer (and the other reviewers) for this excellent suggestion. We have now carried out these live cell experiments. These experiments were quite challenging, because we needed to collect snapshots of ~50 cells to measure the mean fluorescence intensity of transfected cells and then acquire movies of several cells for analysis. The U2OS cells were also highly temperature-sensitive and would retract their filopodia without objective heating.

      We have now analyzed filopodial initiation events and measured considerably more Myo10 at the first signs of accumulation, in the hundreds of molecules. The dimmer spots that we measured in the first draft were likely unrelated to filopodial initiation, and we have corrected the discussion on this point.

      We now also track further growth from a stable filopodial tip (the phased-elongation mechanism from Ikebe and coworkers) and find approximately 500 molecules bud off in those events. We also track filopodial elongation rates as a function of Myo10 numbers. We have added additional live cell imaging sections that include these results.

      It is stated in the discussion that the amount of Myo10 in the filopodia exceeds the number of actin binding sites. However, since Myo10 contains membrane binding motifs and has been shown to interact with the membrane, it should be pointed out that the excess Myo10 at the tips may be interacting with the membrane and not actin, which may prevent traffic jams.

      This is also an excellent point to consider, and we have expanded the relevant discussion along these lines. We agree that the Myo10 at the filopodial tip is likely membrane-bound. We now estimate the 2D membrane area occupied by Myo10, and find that it reaches nearly full packing in many cases (under a number of assumptions that we spell out more fully in the manuscript).

      Reviewer #3 (Public Review):

      Summary:

      The unconventional myosin Myo10 (aka myosin X) is essential for filopodia formation in a number of mammalian cells. There is a good deal of interest in its role in filopodia formation and function. The manuscript describes a careful, quantitative analysis of Myo10 molecules in U2OS cells, a widely used model for studying filopodia: how many are present in the cytosol versus filopodia, and the distribution of filopodia and molecules along the cell edge. Rigorous quantification of Myo10 protein amounts in a cell and cellular compartment is critical for ultimately deciphering the cellular mechanism of Myo10 action as well as understanding the molecular composition of a Myo10-generated filopodium.

      Consistent with what is seen in images of Myo10 localization in many papers, the vast majority of Myo10 is in the cell body with only a small percentage (approx. 5%) present in filopodia puncta. Interestingly, Myo10 is unevenly distributed along the cell edge, with one region preferentially extending filopodia, presumably via localized activation of Myo10 motors. Calculation of total molecules present in puncta based on measurement of puncta size and measured Halo-Myo10 signal intensity shows that the concentration of motor present can vary from 3-225 μM. Based on an estimation of available actin binding sites, it is possible that Myo10 can be present in excess over these binding sites.

      Strengths:

      The work represents an important first step towards defining the molecular stoichiometry of filopodial tip proteins. The observed range of Myo10 molecules at the tip suggests that it can accommodate a fairly wide range of Myo10 motors. There is great value in studies such as this and the approach taken by the authors gives one good confidence that the numbers obtained are in the right range.

      Weaknesses:

      One caveat (see below) is that these numbers are obtained for overexpressing cells and the relevance to native levels of Myo10 in a cell is unclear.

      A similar concern was raised by Reviewer 2; please see above.

      An interesting aspect of the work is quantification of the fraction of Myo10 molecules in the cytosol versus in filopodia tips showing that the vast majority of motors are inactive in the cytosol, as is seen in images of cells. This has implications for thinking about how cells maintain this large population in the off-state and what is the mechanism of motor activation. One question raised by this work is the distinction between cytosolic Myo10 and the population found at the ‘cell edge’ and the filopodia tip. The cortical population of Myo10 is partially activated, so to speak, as it is targeted to the cortex/membrane and presumably ready to go. Providing quantification of this population of motors, that one might think of as being in a waiting room, could provide additional insight into a potential step-by-step pathway where recruitment or binding to the cortical region/plasma membrane is not by itself sufficient for activation.

      As mentioned in our response to Reviewer 2, we have now carried out quantitation in live cells to capture Myo10 transitions from cell body into filopodial movement. We attempted to identify this membrane-bound population of motors in our new live cell experiments but were unable to make convincing measurements. Notably, we see no noticeable enrichment of Myo10 at the cortex relative to the cytosol. Although we believe there is a membrane-bound waiting room (akin to the 3D-2D-1D mechanism of Molloy and Peckham), we suspect that the 2D population is diffusing too rapidly to be detected under our imaging conditions.

      Specific comments:

      (1) It is not obvious whether the analysis of numbers of Myo10 molecules in a cell that is ectopically overexpressing Myo10 is relevant for wild type cells. It would appear to be a significant excess based on the total protein stained blot shown in Fig S1E where a prominent band the size of tagged Myo10 seen in the transfected sample is almost absent in the WT control lane.

      Even “wildtype” cells vary considerably in their Myo10 expression levels. For example, melanoma cells often heavily upregulate Myo10, while these U2OS cells produce nearly none (Supplementary Figure 1E). Thus, there is no single, widely acceptable target for Myo10 expression in wildtype cells.

      Please note that the new Supplementary Figure 1E is a Myo10 Western blot, not total protein staining as before.

      Ideally, and ultimately an important approach, would be to work with a cell line expressing endogenously tagged Myo10 via genome engineering. This can be complicated in transformed cells that often have chromosomal duplications.

      Indeed, we chose U2OS cells for this work because they do not express detectable levels of Myo10, and thus we can avoid all of these complications. Here we can examine how Myo10 levels control filopodial production through ectopic expression.

      However, even though there is an excess of Myo10, it would appear that activation is still under some type of control, as the cytosolic pool is quite large and its localization to the cell edge is not uniform. But it is difficult to gauge whether the number of molecules in the filopodium is the same as would be seen in untransfected cells. Myo10 can readily walk up a filopodium, and if excess numbers of this motor are activated they would accumulate in the tip in large numbers, possibly creating a bulge, and indeed it does appear that some tips are unusually large. Then how would that relate to the normal condition?

      As noted above, the normal condition depends on the cellular system. However, endogenous Myo10 also accumulates in bulges at filopodial tips, so this is not a phenotype unique to Myo10 overexpression. For example, the images from Figure 1 of the Berg and Cheney (2002) citation show bulges from endogenous Myo10 in endothelial cells.

      (2) Measurements of the localization of Myo10 focuses in large part on ‘Myo10 punctae’. While it seems reasonable to presume that these are filopodia tips, the authors should provide readers with a clear definition of a puncta. Is it only filopodia tips, which seems to be the case? Does it include initiation sites at the cell membrane that often appear as punctae?

      We define puncta as any clusters/spots of Myo10 signal detected by segmentation, not limited to any location within the surface-attached filopodia. We exclude puncta that appear in the cell interior (~5 of which appear in Fig. 1A). These are likely dorsal filopodia, but there are few of these compared to the surface attached filopodia of U2OS cells. In Figure 2, “puncta” includes all Myo10 clusters along the filopodia shaft, though a majority happen to be tip-localized (please see Supplementary Figure 4B). We have edited the main text for clarification.

      Along those lines, the position of dim punctae along the length of a filopodium is measured (Fig 3D). The findings suggest that a given filopodium can have more than one puncta which seems at odds if a puncta is a filopodia tip. How frequently is a filopodium with two puncta seen? It would be helpful if the authors provided an example image showing the dim puncta that are not present at the tip.

      We have now provided an example image of dim puncta along filopodia in Supplementary Figure 4C.

      (3) The concentration of actin available to Myo10 is calculated based on the deduction from Nagy et al (2010) that only 4/13 of the actin monomers in a helical turn are accessible to the Myo10 motor (discussion on pg 9; Fig S4). Subsequent work (Ropars et al, 2016) has shown that the heads of the antiparallel Myo10 dimer are flattened, but the neck is rather flexible, meaning that the motor can have a variable reach (36-52 nm). Wouldn’t this mean that more actin could be accessible to the Myo10 motor than is calculated here?

      Although we see why the reviewer might believe otherwise, the 4/13 fraction of accessible actin holds. This fraction is obtained from consideration of the fascin-actin bundle structure alone, independent of the reach of any particular myosin motor. Every repeating layer of 13 actin subunits (or 36 nm) has 4 accessible myosin binding-sites. The remaining 9 sites are rejected because a single myosin motor domain will have a steric clash with a neighboring actin filament in the bundle. A myosin with an exceptionally long reach might reach the next 13 subunit layer, but that layer also has only 4 binding sites. Thus, we can calculate the number of binding sites per unit length along the filopodium. This number would hold for a dimeric myosin with any reach, including myosin-5 or myosin-2.
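
      The arithmetic behind this argument can be sketched as follows. The two constants come from the paragraph above (4 accessible sites per 13-subunit layer, one layer per 36 nm). The function name is hypothetical, and the `n_filaments` multiplier is an assumption added for illustration, because this response does not state whether the count applies per filament or per bundle; it defaults to 1.

```python
SITES_PER_LAYER = 4       # accessible myosin binding sites per 13-subunit layer
LAYER_LENGTH_NM = 36.0    # axial length of one 13-subunit layer


def accessible_sites(length_nm, n_filaments=1):
    """Accessible binding sites along a stretch of bundle of given length.

    n_filaments is a hypothetical scaling knob, not a value from the paper.
    The result does not depend on the reach of the motor, only on length.
    """
    layers = length_nm / LAYER_LENGTH_NM
    return SITES_PER_LAYER * layers * n_filaments


sites_per_nm = accessible_sites(1.0)  # about 0.11 sites per nm per unit counted
```

      Because the count scales with length alone, a motor with a longer reach samples more layers but never more than 4 sites per layer.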

      (4) Quantification of numbers of Myo10 molecules in filopodial puncta (Fig 3C) leads the authors to conclude that ‘only ten or fewer Myo10 molecules are necessary for filopodia initiation’ (pg 7, top). While this is reasonable based on the assumption that the formation of a puncta ultimately results from an initiation event, little is known about initiation events, and without direct observation of coalescence of Myo10 at the cell edge that leads to formation of a filopodium, this seems rather speculative.

      As noted above, we have now performed the necessary live cell imaging of filopodial nucleation events and have updated our conclusions accordingly.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have made a series of comments that might help the authors improve their manuscript:

      - A full calibration of the methodology would require testing a wider range of protein amounts, to exhaustively detect the dynamic range of the technique. The authors acknowledge in the discussion that “Furthermore, our estimates of molecules are predicated on the calibration curve of the Halo Standard Protein on the SDS-PAGE gels, which is likely the highest source of error on our molecule counts”. A good way of convincing a nasty reviewer is to provide a calibration with more than 3 reference points. At least this will help exclude from the analysis cells where Myo10 estimates are not in the linear regime of detection.

      We completely agree with the reviewer’s suggestion to build a robust calibration curve. The SDS gel shown in Figure 1C originally contained 4 reference points, but the highest HaloTag standard protein point oversaturated the detector at the set exposure in the TMR channel and was omitted. We have now re-run the SDS gel to include a HaloTag standard protein curve comprising 5 points, alongside all three bioreplicates from the fixed cell experiments and all three bioreplicates from the live cell experiments (updated in Figure 1B-C). We had saved frozen lysates from the original fixed cell work, so we were able to reanalyze our data with the new set of standards. The Myo10 quantities are consistent, but with much tighter CIs from the standard curve.
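
      A standard curve of this kind is typically used by fitting a line through the standards and reading sample amounts off that line, while refusing to extrapolate beyond the calibrated range (the concern raised by the reviewer). The sketch below assumes a simple least-squares line and uses hypothetical function names and placeholder numbers; it is not the fitting procedure or data from the paper.

```python
def fit_line(amounts, signals):
    """Ordinary least-squares fit: signal = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_signal(signal, std_amounts, std_signals):
    """Convert a band intensity to an amount using the standard curve."""
    if not min(std_signals) <= signal <= max(std_signals):
        # Outside the standards the linear regime is unverified.
        raise ValueError("signal lies outside the calibrated range")
    slope, intercept = fit_line(std_amounts, std_signals)
    return (signal - intercept) / slope


# Placeholder five-point curve (arbitrary units, not measured values).
std_amounts = [1.0, 2.0, 3.0, 4.0, 5.0]
std_signals = [10.0, 20.0, 30.0, 40.0, 50.0]
sample_amount = amount_from_signal(25.0, std_amounts, std_signals)
```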

      - As already said this methodology is intriguing, however, a correlative validation with a conventional SMLM approach to address the bona-fide of the method would be ideal.

      Unfortunately, single-molecule approaches for validation are impractical for us. Due to the relatively high magnification of our TIRF microscope and the large spread area of the U2OS cells, single cells typically extend beyond the field of view. We acknowledge the benefits of SMLM quantitative techniques and other approaches cited in the introduction section. To avoid use of special tools/instruments, we offer our methodology, based on the Pollard group’s quantitative Western blotting of GFP, as a simpler alternative accessible to anyone.

      - TMR is a small ligand likely interacting also with Halo in its denatured state. However, to clear any doubts a parallel Native-PAGE investigation should be included, or if existing a specific reference should be provided.

      Perhaps there is a misunderstanding here. One of the key advantages of the HaloTag labeling system is that the engineered dehalogenase is covalently modified by the ligand (the TMR-ligand is a suicide substrate). This means that the TMR remains bound even under denaturing conditions, which allows its detection in SDS-PAGE. Native gels are unnecessary here.

      - Moreover, SDS-PAGE is run at alkaline pH, have the authors considered these points when designing the methodology? Fluorescence images were taken in PBS, which has a different pH. Could the authors, or the literature, exclude these aspects as potential pitfalls in the methodology? Also temperature is affecting fluorescence emission, but it is easier to control with certain tolerance in the room-temperature regime.

      Our method does not compare fluorescence values that cross the experimental systems (SDS-PAGE vs. microscopy). Cellular proteins and HaloTag protein standards are compared in a single setting of SDS-PAGE to obtain the average number of Myo10s per transfected cell. Likewise, all measurements on intact (live or fixed) cells are conducted in that single setting to obtain average fluorescence per cell. Thus, there is no issue with the different buffers or temperatures affecting fluorescence emission.

      - The authors should test their approach also with truncation variants of Myosin10 (for instance lacking the PH or motor domain). This is a classical approach that might prove the potential of the technique when altering the capacity of the protein to interact with a main binding partner. Also, treatments that induced filopodia formation might prove useful (i.e., hypotonic media induce filopodia formation in some fibroblast cell lines in our hands).

      The reviewer raises interesting suggestions that we aim to address in future experiments, but truncation variants and environmental perturbations are beyond the focus of the current manuscript. Here, we report on the otherwise unperturbed state when we add exogenous full-length Myo10 to the U2OS cells. But indeed, experiments with Myo10 domain truncations, PI3K and PTEN inhibition, and cargo protein / activating cofactor knock-downs (among others) are on our drawing board.

      - Most of the mechanisms hypothesized in the discussion are sound and plausible. However, the authors have chosen an experimental model where transient transfection of exogenous Myo10 in U2OS is performed. This approach poses two main and fundamental questions that are not resolved by the data provided:

      A) how do different expression levels affect the Myo10 counting?

      Our counting procedure does not assume uniform expression across a population of cells; quite the opposite, in fact. We directly measure Myo10 expression levels on a cell-by-cell basis with microscopy, once we know the number of molecules in our total pool (see the Methods for details). As an example of the final output, Figs. 1D and 1E show the total number of Myo10 molecules per cell for fixed and live cells, respectively.

      B) how does endogenous and unlabeled Myo10 hamper the bonafide of counts? The authors claimed “U2OS cells express low levels of Myo10, so there is a small population of unlabeled endogenous Myo10 unaddressed by this paper”. As presented, the low levels of endogenous Myo10 sound an arbitrary parameter, and there are no data presented that can limit if not exclude this bias in the analysis. To produce data in a genetically modified cell line with Halo-tag on the endogenous protein will represent a much cleaner system. Alternatively, the authors should look for Myo10 KO cell lines where they can back-transfect their Halo-Tagged Myo10 construct in a more consistent framework, focusing on cells with low-to-mid levels of expression.

      We agree, this is an important point to nail down (and is often neglected in the literature). We have now measured the endogenous Myo10 levels in U2OS cells by Western blotting and found that it is undetectable compared to our HaloTagged construct expression. Please see Supp. Fig 1E. Thus, for all intents and purposes, every Myo10 molecule in these experiments came from our expression plasmid. Accordingly, we have removed this caveat from the paper.

      Minor points

      - Figure 1B. To help the reader SDS-PAGE gels annotations should be clearer already from the figure.

      We have updated the annotations for clarity.

      - Methods should be organized in sessions. As it stands, it is hard for the reader to look for technical details.

      We have expanded and added subsections to the Methods as requested.

      - The good practice of indicating the gene and transcript entry numbers and the primer used to amplify and clone into the backbone vectors is getting lost in many papers. I would strongly encourage the authors to add this information to the methods.

      We have included the gene entries to the methods and will include a full FASTA file of the coding sequence as supplementary information to avoid any ambiguity here.

      The authors write “It is unclear how myosins navigate to the right place at the right time, but our results support an important interplay between Myo10 and the actin network.” It is a bit scholastic to say that Myo10 and actin have an important interplay, they are major binding partners. What is the new knowledge contained in this sentence?

      Agreed; we have deleted the sentence in question.

      Reviewer #2 (Recommendations For The Authors):

      The authors should address all the weaknesses indicated in the public review.

      There were a few other places that require clarification.

      On page 4, the last paragraph. It is stated that the targeting of Myo10 was reported/proposed based on previous work (ref 31). The next few sentences are not referenced and thus likely refer to ref 31. The authors did not measure the parameters discussed in these sentences, so it is important to clarify that they are referring to previous work and not the current study.

      Indeed, the next few sentences still refer to old reference 31, so we have now edited the paragraph for clarity.

      On page 7, the reference to Figure 3A indicates that the trend of higher Myo10 correlating with more filopodia. However, the reference to Figure 3B indicates total intracellular Myo10 weakly correlates with more filopodia. However, the x-axis on Figure 3B is filopodia molecules not the intracellular Myo10. Please clarify.

      We appreciate the reviewer for catching our mistake. Those plots are now in Fig. 2 and have been edited accordingly.

      Reviewer #3 (Recommendations For The Authors):

      The Discussion of results at the end of each section is rather brief and could be expanded on a bit more.

      Previously, we were operating under the constraints of an eLife Short Report. We have now expanded the discussion for a full article.

      The authors mention that actin filaments at the tips of filopodia could be frayed, citing Medalia et al, 2007 (ref 40). That paper describes an early cryoEM analysis of filopodia from the amoeba Dictyostelium. EM images of mammalian filopodia tips, e.g. Svitkina et al, 2003, JCB, do not show quite the same organization of actin as seen in the Dictyostelium filopodia tips. However, recent work from the Bershadsky lab, Li et al, 2023, presents a few cryoEM images of tips of left-bent filopodia that are tightly adhered to a substrate and there it looks like actin filaments become disorganized in tips, along with membrane bulging. The authors should consider expanding discussion of the filopodia tips to take into account what is known for mammalian filopodia.

      We thank the reviewer for bringing these enlightening papers to our attention. We have now included these citations in the discussion.

      Fig 1D - The x-axis is a bit odd, it goes from 0 then to 2.5e+06 with no indication of the bin size. Can this be re-labelled or the scale displayed a bit differently?

      We have double-checked the axis breaks, which are large because the underlying values are large. We have also provided the bin size as requested for all histograms.

      Fig 4A - What is the bin size for the histogram?

      As above, we have now updated the figure legends (now in Fig. 3) to include the bin size.

      Methods -

      - Please provide an accession number for the Myo10 nucleotide sequence used for this work as there are at least two known isoforms.

      Thank you for noting this. We are using the full-length, not the headless isoform. We have now updated the Methods accordingly.

      - No mention is made of the SDS sample buffer used, was that also added to the sample?

      We have now updated the Methods accordingly.

      - How are samples boiled at 70 deg C? Do the authors actually mean ‘heated’?

      Indeed. We have now corrected “boiled” to “heated.”

      - Could the authors please briefly explain the connected component analysis used to identify filopodia?

      We have now updated the Methods accordingly.

      - The intensity of filopodia was determined by dividing tip intensity by the total bioreplicate sum of intensities then multiplying it by the total pool, if this reviewer understands correctly. It sounds like intensities are being averaged across a whole cell population instead of cell-by-cell. Is that correct? If so, can the authors please provide the underlying rationale for this? If not, then please better describe what was actually done.

      We apologize for the confusion. Intensities are being averaged (summed) across a whole cell population, but importantly that step is only used to obtain a scale factor that converts the fluorescence signal at the microscope to the number of molecules. We then use that scale factor for all cells imaged in the bioreplicate, to both 1) find the total Myo10 in that cell, and 2) find the total amount of that Myo10 in any given location within that cell.

      To further clarify, each bioreplicate has a known total number of Myo10 molecules associated with the number of cells loaded onto the SDS gel. From the SDS gel, we have an average number of Myo10 molecules per positively transfected cell. If 50 cell images are analyzed, then there is a Myo10 ‘total pool’ of (50 cells) * (average Myo10 molecules/cell). The fluorescence signal intensities in microscopy were summed for all cells within the bioreplicate (50 cells in this example). However, due to variation in expression, not every cell has the same signal intensity when imaged under the same conditions. It would be inaccurate to assume each cell contains the average Myo10 molecules/cell. Therefore, to get the number of molecules within a given Myo10 cell (or punctum), the summed cell (punctum) intensity was divided by the bioreplicate fluorescence signal intensity sum and multiplied by ‘total pool.’
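
      The conversion described in the two paragraphs above can be sketched as follows. Function and variable names are hypothetical and the numbers are placeholders; only the arithmetic (total pool, summed intensity, per-bioreplicate scale factor) follows the description.

```python
def molecules_scale_factor(cell_intensities, avg_molecules_per_cell):
    """Molecules per unit of fluorescence intensity for one bioreplicate."""
    total_pool = len(cell_intensities) * avg_molecules_per_cell
    return total_pool / sum(cell_intensities)


def to_molecules(intensity, scale):
    """Convert a summed cell (or punctum) intensity to a molecule count."""
    return intensity * scale


# Placeholder bioreplicate: two imaged cells with unequal expression,
# and an SDS-PAGE average of 100 molecules per transfected cell.
intensities = [1.0, 3.0]
scale = molecules_scale_factor(intensities, avg_molecules_per_cell=100.0)
per_cell = [to_molecules(i, scale) for i in intensities]
punctum = to_molecules(0.5, scale)  # a region inside one of the cells
```

      The per-cell counts differ according to each cell's brightness, yet they sum to the total pool, so no cell is assumed to hold the population average.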

      - The authors quantify Myo10 protein amounts by western blotting using Halo tag fluorescence, a method that should provide good accuracy. The results depend on the transfection efficiency and it is rarely the case that it is 100%. The authors state that they use a ‘value correction for positively transfected cells’ (pg 11). It is likely that there was a range of expression levels in the cells, how was a cut-off for classifying a cell as non-expressing determined or set?

      As described in the Methods, “microscopy was used to count the percentage of transfected cells from ~105-190 randomly surveyed cells per bioreplicate.” Cells were labeled and located with DAPI. If no TMR signal could be visually detected by microscopy, then the cell was deemed to be non-Myo10 expressing. We did not set a cutoff fluorescence value, as untransfected cells have no detectable signal. Please see Supplementary Figure 1F for examples.

      - “In-house Python scripts” are used for image analysis. Will these be made publicly available?

      Yes, we will package these up on GitHub.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public Review): 

      This study is convincing because they performed time-resolved X-ray crystallography under different pH conditions using active/inactive metal ions and PpoI mutants, as with the activity measurements in solution in conventional enzymatic studies. Although the reaction mechanism is simple and may be a little predictable, the strength of this study is that they were able to validate that PpoI catalyzes DNA hydrolysis through "a single divalent cation" because time-resolved X-ray study often observes transient metal ions which are important for catalysis but are not predictable in previous studies with static structures such as enzyme-substrate analog-metal ion complexes. The discussion of this study is well supported by their data. This study visualized the catalytic process and mutational effects on catalysis, providing new insight into the catalytic mechanism of I-PpoI through a single divalent cation. The authors found that His98, a candidate of proton acceptor in the previous experiments, also affects the Mg2+ binding for catalysis without the direct interaction between His98 and the Mg2+ ion, suggesting that "Without a proper proton acceptor, the metal ion may be prone for dissociation without the reaction proceeding, and thus stable Mg2+ binding was not observed in crystallo without His98". In future, this interesting feature observed in I-PpoI should be investigated by biochemical, structural, and computational analyses using other metal-ion dependent nucleases. 

      We appreciate the reviewer for the positive assessment as well as all the comments and suggestions.

      Reviewer #2 (Public Review): 

      Summary: 

      Most polymerases and nucleases use two or three divalent metal ions in their catalytic functions. The family of His-Me nucleases, however, uses only one divalent metal ion, along with a conserved histidine, to catalyze DNA hydrolysis. The mechanism has been studied previously but, according to the authors, it remained unclear. By use of time-resolved X-ray crystallography, this work convincingly demonstrated that only one M2+ ion is involved in the catalysis of the His-Me I-PpoI nuclease, and proposed concerted functions of the metal and the histidine. 

      Strengths: 

      This work performs mechanistic studies, including the number and roles of metal ion, pH dependence, and activation mechanism, all by structural analyses, coupled with some kinetics and mutagenesis. Overall, it is a highly rigorous work. This approach was first developed in Science (2016) for a DNA polymerase, in which Yang Cao was the first author. It has subsequently been applied to just 5 to 10 enzymes by different labs, mainly to clarify two versus three metal ion mechanisms. The present study is the first one to demonstrate a single metal ion mechanism by this approach. 

      Furthermore, on the basis of the quantitative correlation between the fraction of metal ion binding and the formation of product, as well as the pH dependence, and the data from site-specific mutants, the authors concluded that the functions of Mg2+ and His are a concerted process. A detailed mechanism is proposed in Figure 6. 

      Even though there are no major surprises in the results and conclusions, the time-resolved structural approach and the overall quality of the results represent a significant step forward for the Me-His family of nucleases. In addition, since the mechanism is unique among different classes of nucleases and polymerases, the work should be of interest to readers in DNA enzymology, or even mechanistic enzymology in general. 

      Thank you very much for your comments and suggestions.

      Weaknesses: 

      Two relatively minor issues are raised here for consideration: 

      p. 4, last para, lines 1-2: "we next visualized the entire reaction process by soaking I-PpoI crystals in buffer....". This is a little over-stated. The structures being observed are not reaction intermediates. They are mixtures of substrates and products in the enzyme-bound state. The progress of the reaction is limited by the progress of the soaking of the metal ion. Crystallography has just been used as a tool to monitor the reaction (and provide structural information about the product). It would be more accurate to say that "we next monitored the reaction progress by soaking....". 

      We appreciate the clarification regarding the description of our experimental approach. We agree that our structures do not represent reaction intermediates but rather mixtures of substrate and product states within the enzyme-bound environment. We will revise the text accordingly to more accurately reflect our methodology.

      p. 5, the beginning of the section. The authors on one hand emphasized the quantitative correlation between Mg ion density and the product density. On the other hand, they raised the uncertainty in the quantitation of Mg2+ density versus Na+ density, thus they repeated the study with Mn2+ which has distinct anomalous signals. This is a very good approach. However, there is still no metal ion density shown in the key Figure 2A. It will be clearer to show the progress of metal ion density in a figure (in addition to just plots), whether it is Mg or Mn. 

      Thank you for your insightful comments. We recognize the importance of visualizing metal ion density alongside product density data. As you commented, distinguishing between Mg2+ and Na+ is challenging, and in Fig 2A, no distinguishable density was observed at 20s. Mn2+, with its higher electron density, is detectable even at low occupancy. To address this, we will include figure panels in Figure 3 or supplementary figures to present Mn2+ and product densities concurrently.

    1. Author response:

      a) that the investigation is very interesting and inventive, and has the potential to reveal some novel insights.

      We thank the reviewers and are excited to improve upon the manuscript through their suggestions.

      b) that the problem of temporal autocorrelation in the fMRI and behavioral data has not been dealt with clearly and convincingly

      We agree that convincingly accounting for fMRI temporal autocorrelation is important to our claims. To reduce its effects, we used field-standard methods: prewhitening and autocorrelation modeling with SPM’s FAST algorithm (shown by Olszowy et al. 2019 to be superior to SPM’s default setting), as well as a high-pass filter with a 128 s cutoff. There is still some first-order autocorrelation structure present across voxels in the left hippocampal beta series: across participants, there is slightly positive autocorrelation between the betas of successive decision trials, which decays to ~0 at subsequent lags. We note that our task is a narrative, and some patterns over time are expected; instead of attempting to fully eliminate all temporal structure in the data, we aim to show that the temporal distance between trials is unlikely to explain our effects.

      In the within versus between social dimension representational similarity analysis, the average temporal distance between trials is the same within and between dimensions. The clustering analysis is a between subject analysis about individual differences–and the same overall temporal structure is experienced by all participants.

      The trajectory analysis does not focus on consecutive trials across characters, but rather on consecutive trials within characters, where the time gap between successive trials is relatively large and highly variable. An average of over a minute elapses between successive decision trials for a given character (versus ~20 seconds across characters), which corresponds on average to almost 11 narrative slides and 3 decision trials. Across characters, the temporal gap between decision trials ranges from 12 seconds to more than 10 minutes, reducing the likelihood that temporal autocorrelation drives character-related estimates. We also highlight the shuffled choices control model, which shares the same temporal autocorrelation structure as the model of interest but had significantly poorer social location decoding, a strong indication that temporal autocorrelation alone can’t explain these results. For each participant, we shuffled their choices and re-computed trajectories that preserved the origin and end locations but produced different locations along the way. Our model decoded location significantly better than this null model, and this difference in performance can't be explained by differences in temporal autocorrelation in the neural or behavioral data.
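
      The logic of the shuffled-choices control can be illustrated with a minimal sketch. It assumes, hypothetically, that each decision moves a character by one 2D step (for example along affiliation and power) and that a trajectory is the cumulative sum of those steps; the function names and the step encoding are illustrative and are not taken from the analysis code.

```python
import numpy as np

def trajectory(steps, origin=(0.0, 0.0)):
    """Return the 2D locations visited, starting at `origin` (one row per location)."""
    steps = np.asarray(steps, dtype=float)
    origin = np.asarray(origin, dtype=float)
    return np.vstack([origin, origin + np.cumsum(steps, axis=0)])

def shuffled_trajectory(steps, rng, origin=(0.0, 0.0)):
    """Null trajectory: the same steps applied in a random order."""
    steps = np.asarray(steps, dtype=float)
    order = rng.permutation(len(steps))
    return trajectory(steps[order], origin)

rng = np.random.default_rng(0)
# Hypothetical choices: each row is a (dimension 1, dimension 2) step.
steps = np.array([[1, 0], [0, 1], [-1, 0], [0, 1], [1, 0], [0, -1]])
real = trajectory(steps)
null = shuffled_trajectory(steps, rng)
```

      Because the sum of the steps does not depend on their order, the shuffled trajectory starts and ends at the same locations as the real one, while the intermediate locations generally differ.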

      In the revision, we will further address this concern. For example, we will report more details on the task structure to aid in interpretation and will more precisely characterize the temporal autocorrelation profile. Where appropriate, we will also improve on and/or add more control analyses that preserve the autocorrelation structure.

      c) that a number of important interesting questions have not been addressed: Are the differences between social partners encoded in the hippocampus? Are the social dimensions encoded in a consistent manner across social partners?

      We believe that we should be able to decode other interesting task- and relationship-related features from the hippocampal patterns, as suggested by the reviewers. In the revision, we will attempt several such analyses, while taking care to control for temporal autocorrelation.

      d) that the cluster analysis in the brain-behavior correlation analysis is not well motivated or validated and should be clarified.

      We agree with the reviewers that this clustering analysis should be better described and validated. We aimed to ask whether less diverse and distinctive cognitive representations of the relationship trajectories relate to smaller real-world social networks. This question of impoverished cognitive maps was first raised by Edward Tolman; we think it is relevant here, as well. In the revision, we will clarify its motivations and implications, and better evaluate it for its robustness. Here, we address a few comments made by the reviewers.

      Reviewer 2 noted that other analyses could be used to ask whether social cognitive map complexity relates to real-world social network complexity. While the proposed alternatives are interesting (e.g., correlating decoding accuracy with social network size), we believe these analyses ask different questions. The current co-clustering analysis was intended to estimate map complexity jointly from the behavioral and neural signatures of the social map across characters. In contrast, the spline location decoding is within character; the accuracy of this decoding does not say much about representations across characters. And although we think character decoding is an interesting possible addition to this manuscript, its accuracy may reflect other aspects of the relationships, beyond just spatial representation. Thus, we will provide a clearer and better validated version of the current analysis to address this question.

      We would also like to clarify that we did not collect the Social Network Index questionnaire in the Initial sample; as such these results are more tentative than the other analyses, due to the inability to confirm them in a separate sample. Reviewer 2 also suggests that a single outlier could drive this effect; but estimating the effect with robust regression also returns a right-tailed p < 0.05, showing that the relationship is robust to outliers.

      References

      Olszowy, W., Aston, J., Rua, C. & Williams, W.B. Accurate autocorrelation modeling substantially improves fMRI reliability. Nature Communications. (2019).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 0: In this paper, the authors develop a comprehensive program to investigate the organization of chromosome structures at 100 kb resolution. It is extremely well executed. The authors have thought through all aspects of the problem. The resulting software will be most useful to the community. Interestingly they capture many experimental observations accurately.

      I have very few complaints.

      We appreciate the reviewer’s strong assessment of the paper’s significance, novelty, and broad interest, and we thank them for the detailed suggestions and comments.

      Comment 1: The number of parameters in the energy function is very large. Is there any justification for this? Could they simplify the functions?

      We extend our gratitude to the reviewer for their insightful remarks. The parameters within our model can be categorized into two groups: those governing chromosome-chromosome interactions and those governing chromosome-nuclear landmark interactions.

      In terms of chromosome-chromosome interactions, the parameter count is relatively modest compared to the vast amount of Hi-C data available. For instance, while the whole-genome Hi-C matrix at the 100 kb resolution encompasses approximately $30321^2$ contacts, our model comprises merely six parameters for interactions among different compartments, along with 1000 parameters for the ideal potential. As outlined in the supporting information, the ideal potential is contingent upon sequence separation, with 1000 chosen to encompass bead separations of up to 100 Mb. While it is theoretically plausible to reduce the number of parameters by assuming interactions cease beyond a certain sequence separation, determining this scale a priori presents a challenge.

      During the parameterization process, we observed that interchromosomal contacts predicted solely based on compartmental interactions inadequately mirrored Hi-C data. Consequently, we introduced 231 additional parameters to more accurately capture interactions between distinct pairs of autosomes. These interactions may stem from factors such as non-coding RNA or proteins not explicable by simple, non-specific compartmental interactions.

      Regarding parameters concerning chromosome-nuclear landmark interactions, we have 30321 parameters for speckles and 30321 for the nuclear lamina. To streamline the model, we opted to assign a unique parameter to each chromatin bead. However, it is conceivable that many chromatin beads share a similar mechanism for interacting with nuclear lamina or speckles, potentially allowing for a common parameter assignment. Nonetheless, implementing such simplification necessitates a deeper mechanistic understanding of chromosome-nuclear landmark interactions, an aspect currently lacking.
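
      As a rough cross-check, the parameter counts quoted in this response can be tallied in a few lines. This is only a sketch: the number of compartment types (three) is an assumption chosen because it yields six pairwise parameters, and the size of the Hi-C matrix assumes one row and one column per 100 kb bead.

```python
from math import comb

N_BEADS = 30321            # per-landmark parameter count quoted in the text
N_IDEAL = 1000             # ideal-potential parameters quoted in the text
BEAD_SIZE_KB = 100         # model resolution

n_compartment_types = 3    # ASSUMPTION: three types give six pairwise parameters
n_autosomes = 22           # human autosomes

# Unordered pairs of compartment types, including self-pairs.
compartment_params = comb(n_compartment_types, 2) + n_compartment_types
# One parameter per distinct pair of autosomes.
interchrom_params = comb(n_autosomes, 2)
# Longest sequence separation covered by the ideal potential, in Mb.
ideal_range_mb = N_IDEAL * BEAD_SIZE_KB / 1000
# One parameter per bead for speckles and one for the nuclear lamina.
landmark_params = 2 * N_BEADS
# ASSUMPTION: the Hi-C matrix has one row and column per bead.
hic_entries = N_BEADS ** 2

print(compartment_params, interchrom_params, ideal_range_mb, landmark_params, hic_entries)
```

      The tally reproduces the six compartment parameters, the 231 autosome-pair parameters, and the 100 Mb range of the ideal potential, all of which are small compared with the number of entries in the contact matrix.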

      As our comprehension of nuclear organization progresses, the interpretability of parameter counts may improve, facilitating their reduction.

      Comment 2: What would the modification be if the resolution is increased?

      To increase the resolution of chromatin, we can in principle keep the same energy function as defined in Eq. S6. In this case, we only need to carry out further parameter optimization.

      However, transitioning to higher resolutions may unveil additional features not readily apparent at 100kb. Notably, chromatin loops with an average size of 200kb or smaller have been identified in high-resolution Hi-C data [1]. To effectively capture these loops, new terms in the energy function must be incorporated. For instance, Qi and Zhang [2] employed additional contact potentials between CTCF sites to account for loop formation. Alternatively, an explicit loop-extrusion process could be introduced to model loop formation more accurately.

      Comment 3: They should state that the extracted physical values are scale-dependent. For example, viscosity.

      We thank the reviewer for the comment and would like to clarify that our model does not predict the viscosity. The nucleoplasmic viscosity was set to 1 Pa·s to produce a diffusion coefficient that reproduces the experimental value. The exact value of the nucleoplasmic viscosity is still rather controversial, and our selected value falls within the range of reported experimental values, from $10^{-1}$ Pa·s to $10^{2}$ Pa·s.

      We have modified the main text to clarify the calculation of the diffusion coefficient.

      “The exponent and the diffusion coefficient $D_\alpha = (27 \pm 11) \times 10^{-4}\ \mu\mathrm{m}^2 \cdot \mathrm{s}^{-\alpha}$ both match well with the experimental values [cite], upon setting the nucleoplasmic viscosity as 1 Pa·s (see Supporting Information Section: Mapping the reduced time unit to real time for more details).”

      Reviewer 2:

      Comment 0: In this work, Lao et al. develop an open-source software (OpenNucleome) for GPU-accelerated molecular dynamics simulation of the human nucleus accounting for chromatin, nucleoli, nuclear speckles, etc. Using this, the authors investigate the steady-state organization and dynamics of many of the nuclear components.

      We thank the reviewer for the summary of our work.

      Comment 1: The authors could introduce a table having every parameter and the optimal parameter value used. This would greatly help the reader.

      We would like to point out that model parameters are indeed provided in Table S1, S2, S3, S4, and Fig. S7. In these tables, we further provided details on how the parameters were determined.

      Given the large number of parameters for the ideal potential (1000), we opted to plot it rather than listing out all the numbers. We added three new figures to plot the interaction parameters between chromosomes, between chromosomes and speckles, and between chromosomes and the nuclear lamina. Numerical values can be found online in the GitHub repository (parameters).

      Comment 2: How many total beads are simulated? Do all beads have the same size?

      The total number of coarse-grained beads is 70542, including 60642 chromatin beads, 300 nucleolus beads, 1600 speckle beads, and 8000 nuclear lamina beads. The radius of the chromatin, nucleolus, and speckle beads is 0.25, while that of the lamina beads is 0.5. More information on the size and number of the beads is provided in the Section: Components of the whole nucleus model.

      Comment 3: In Equation S17, what is the 3rd and 4th powers mean? What necessitates it?

      The potential defined in Equation S17 follows the definition of the class2 bond in the LAMMPS package (LAMMPS docs). Compared to a typical harmonic potential, the higher-order terms produce a sharper increase in the energy at large distances (Author response image 1). This essentially reduces the fluctuation of the bond length in simulations.

      Author response image 1.

      Comparison between the Class2 potential (defined in Eq. S17) and the Harmonic potential ($K(r - r_0)^2$, with $K = 20$ and $r_0 = 0.5$).
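
      The comparison can be reproduced numerically with a short sketch. The functional form is that of the LAMMPS class2 bond style, $E = K_2 (r - r_0)^2 + K_3 (r - r_0)^3 + K_4 (r - r_0)^4$; the harmonic constants are those quoted in the caption, whereas the cubic and quartic coefficients used here are hypothetical placeholders, not the values of Eq. S17.

```python
import numpy as np

def harmonic(r, k=20.0, r0=0.5):
    """Harmonic bond energy K (r - r0)^2 with the constants from the caption."""
    return k * (r - r0) ** 2

def class2(r, k2=20.0, k3=20.0, k4=20.0, r0=0.5):
    """Class2-style bond energy; k3 and k4 are HYPOTHETICAL illustrative values."""
    d = r - r0
    return k2 * d**2 + k3 * d**3 + k4 * d**4

r = np.linspace(0.5, 1.5, 5)          # bond lengths at and beyond the equilibrium value
extra = class2(r) - harmonic(r)       # energy added by the cubic and quartic terms
print(np.column_stack([r, harmonic(r), class2(r)]))
```

      With positive higher-order coefficients, the extra energy is zero at the equilibrium length and grows monotonically for stretched bonds, which is the stiffening effect described in the response. For compressed bonds (r below r0) the cubic term changes sign, so the two curves are not mirror images.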

      Comment 4: What do the X-axis and Y-axis numbers in Figure 5A and 5B mean? What are their units?

      We apologize for the lack of clarity in our original figure. In Fig. 5A, the X and Y axes depict the simulated and experimental radius of gyration (Rg) for individual chromosomes, as indicated in the title of the figure. Similarly, in Fig. 5B, the X and Y axes depict the simulated and experimental radial positions of individual chromosomes.

      We have converted the chromosome Rg values into reduced units and labeled the corresponding axes in the updated figure (Fig. 5). The normalized radial position is unitless and its detailed definition is included in the supporting information Section: Computing simulated normalized chromosome radial positions. We updated the figure caption to provide an explicit reference to the SI text.

      Reviewer 3:

      Comment 0: In this work, the authors present the development of OpenNucleome, a software for simulating the structure and dynamics of the human nucleus. It provides a detailed model of nuclear components such as chromosomes and nuclear bodies, and uses GPU acceleration for better performance based on the OpenMM package. The work also shows the model’s accuracy in comparisons with experimental data and highlights the utility in the understanding of nuclear organization. While I consider this work a good tool for the genome architecture scientific community, I have some comments and questions that could further clarify the usage of this tool and help potential users. I also have a few questions that would help to clarify the technique and results and some suggestions for references.

      We appreciate the reviewer’s strong assessment of the paper’s significance, novelty, and broad interest, and we thank them for the detailed suggestions and comments.

      Comment 1: Could the authors elaborate on what they consider to be ’well-established and easily adoptable modeling tools’?

      By well established, we meant models that have been extensively validated and verified, and are highly regarded by the community.

      By easily adoptable, we meant tools that are well documented and can be learned relatively easily by new groups without help from the developers.

      We have revised the text to clarify our meaning.

      “Despite the progress made in computational modeling, the absence of well-documented software with easy-to-follow tutorials poses a challenge.”

      Comment 2: Recognizing the value of a diverse range of tools in the community, the Open-MiChroM tool is also an open-source platform built on top of OpenMM. The documentation shows various modeling approaches and many tutorials that contain different approaches besides the MiChroM energy function. How does OpenNucleome compare in terms of facilitating crossvalidation and user accessibility? The two tools seem to be complementary, which is a gain to the field. I recommend adding one or two sentences in the matter. Also, while navigating the OpenNucleome GitHub, I have not found the tutorials mentioned in the text. I also consider a barrier in the process of generating necessary input files. I would suggest expanding the tutorials and documentation to help potential users.

      We thank the reviewer for the excellent comments. We agree that, while many of the tutorials were included in the original package, they were not clearly documented. We have revised them extensively and now provide:

      • A tutorial for optimizing chromosome-chromosome interactions.

      • A tutorial for optimizing chromosome-nuclear landmark interactions.

      • A tutorial for building initial configurations.

      • A tutorial for relaxing the initial configurations.

      • A tutorial for selecting the initial configurations.

      • A tutorial for setting up and performing Langevin dynamics simulations.

      • A tutorial for setting up and performing Brownian dynamics simulations.

      • A tutorial for setting up and performing simulations with a deformed nucleus.

      • A tutorial for analyzing simulation trajectories.

      • A tutorial for introducing new features to the model.

      These tutorials and our well-documented, open-source code (https://zhanggroup-mitchemistry.github.io/OpenNucleome) should significantly promote user accessibility. Our inclusion of Python scripts for analyzing simulation trajectories should allow users to compute various quantities for evaluating and comparing model quality.

      We added a new paragraph in the Section: Conclusions and Discussion of the main text to compare OpenNucleome with existing software for genome modeling.

      “Our software enhances the capabilities of existing genome simulation tools [cite]. Specifically, OpenNucleome aligns with the design principles of Open-MiChroM [cite], prioritizing open-source accessibility while expanding simulation capabilities to the entire nucleus. Similar to software from the Alber lab [cite], OpenNucleome offers high-resolution genome organization that faithfully reproduces a diverse range of experimental data. Furthermore, beyond static structures, OpenNucleome facilitates dynamic simulations with explicit representations of various nuclear condensates, akin to the model developed by [citet].”

      Comment 3: Lastly, I would appreciate it if the authors could expand their definition of ’standardized practices’.

      We apologize for any confusion caused. By “standardized practices,” we refer to the fact that different groups often employ unique procedures for structural modeling. These procedures differ in the representation of chromosomes, the nuclear environment, and the algorithms for parameter optimization. This absence of a consensus on the optimal practices for genome modeling can be daunting for newcomers to the field.

      We have revised the text to the following to avoid confusion:

      “Many research groups develop their own independent software, which complicates cross-validation and hinders the establishment of best practices for genome modeling [3–5].”

      Comment 4: On page 7, the authors refer to the SI Section: Components of the whole nucleus model for further details. Could the authors provide more information on the simulated density of nuclear bodies? Is there experimental data available that details the ratio of chromatin to other nuclear components, which was used as a reference in the simulation?

      We thank the reviewer for the comment. Imaging studies have provided quantitative measures of the size and number of various nuclear bodies. For example, there are 2 to 5 nucleoli per nucleus, with a typical size $R_{\mathrm{No}} \approx 0.5\ \mu\mathrm{m}$ [6–10]. In the review by Spector and Lamond [11], the authors showed that there are 20 to 50 speckles, with a typical size $R_{\mathrm{Sp}} \approx 0.3\ \mu\mathrm{m}$. We used these numbers to guide our simulation of nuclear bodies. This information is given in the Section: Chromosomes as beads on the string polymers of the supporting information.

      The chromatin density is fixed by the average size of chromatin bead and the nucleus size. We chose the size of chromatin based on imaging studies as detailed in the Subsection: Mapping chromatin bead size to real unit of the supporting information. Upon fixing the bead size, the chromatin volume is determined.

      Comment 5: In the statement, ’the ideal potential is only applied for beads from the same chromosome to approximate the effect of loop extrusion by Cohesin molecules for chromosome compaction and territory formation,’ it would be helpful if the authors could clarify the scope of this potential. Specifically, the code indicates that the variable ’dend_ideal’ is set at 1000, suggesting an interaction along a 100Mb polymer chain at a resolution of 100Kb per bead. Could the authors elaborate on their motivation for the Cohesin complex’s activity having a significant effect over such long distances within the polymer chain?

      We thank the reviewer for the insightful comment. They are correct that the ideal potential was introduced to capture chromosome folding beyond the interactions between compartments, including loop extrusion. Practically, we parameterized the ideal potential such that the simulated average contact probabilities as a function of sequence separation match the experimental values. The reviewer is also correct that, beyond a specific sequence separation, one would expect the impact of loop extrusion on chromosome folding to be negligible, due to Cohesin dissociation. Correspondingly, the interaction potential should be zero at large sequence separations.

      However, it is important to note that the precise separation scale cannot be known a priori. We chose 100Mb as a conservative estimate. As can be seen from Fig. S7, our parameterization scheme indeed produced interaction parameters that are mainly zero at large sequence separations. Interestingly, the scale at which the potential approaches 0 (∼ 500KB) agrees with the estimated length traveled by Cohesin molecules before dissociation [12].

      Comment 6: On pages 8 and 9, the authors discuss the optimization process. However, in reviewing the code and documentation available on the GitHub page, I could not find specific sections related to the optimization procedure described in the paper. In this context, I have a few questions: Could the authors provide more details or direct me to the parts of the documentation and the text/SI that address the optimization procedure used in their study? Additional clarification on the cost/objective function employed during the optimization process would be highly beneficial, as this was not readily apparent in the text.

      We thank the reviewer for the comment. We revised the SI to include the definition of the cost function for the Adam optimizer.

      “During the optimization process, our aim was to minimize the disparity between experimental findings and simulated data. To achieve this, we defined the cost function as follows:

      where the index i iterates over all the constraints defined in Eq. S28.”

      The detailed optimization procedure was included in the SI as quoted below

      “The details of the algorithm for parameter optimization are as follows

      (1) Starting with a set of values for the model parameters, we performed 50 independent 3-million-step long MD simulations to obtain an ensemble of nuclear configurations. The first 500K steps of each trajectory are discarded as equilibration. We collected the configurations at every 2000 simulation steps from the rest of the simulation trajectories to compute the ensemble averages defined on the left-hand side of Eq. S13.

      (2) Check the convergence of the optimization by calculating the percentage of error

      defined as . The summation over i includes all the average contact probabilities defined in Eq. S28.

      (3) If the error is less than a tolerance value etol, the optimization has converged, and we stop the simulations. Otherwise, we update the parameters, α, using the Adam optimizer [13]. With the new parameter values, we return to step one and restart the iteration.”
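
      The three steps above can be sketched as a short loop. This is a toy illustration, not the OpenNucleome optimization code: the MD ensemble average is replaced by a hypothetical noisy stand-in function, and both the squared-difference cost and the relative-error convergence measure are assumptions, since their exact definitions are not reproduced in this text. Only the Adam update itself follows the standard published algorithm.

```python
import numpy as np

def noisy_ensemble_average(alpha, rng, noise=0.01):
    """HYPOTHETICAL stand-in for step (1): a toy linear response plus sampling noise."""
    return alpha + noise * rng.standard_normal(alpha.shape)

def optimize(f_exp, alpha0, rng, lr=0.05, e_tol=0.02, max_iter=2000,
             beta1=0.9, beta2=0.999, eps=1e-8):
    alpha = np.array(alpha0, dtype=float)
    m = np.zeros_like(alpha)              # first-moment estimate
    v = np.zeros_like(alpha)              # second-moment estimate
    for t in range(1, max_iter + 1):
        f_sim = noisy_ensemble_average(alpha, rng)           # step (1)
        diff = f_sim - f_exp
        error = np.abs(diff).sum() / np.abs(f_exp).sum()     # step (2), ASSUMED definition
        if error < e_tol:                                    # step (3): converged
            return alpha, error, t
        grad = 2.0 * diff      # gradient of the ASSUMED cost sum(diff**2) for the toy model
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad**2
        m_hat = m / (1 - beta1**t)        # bias-corrected moments
        v_hat = v / (1 - beta2**t)
        alpha -= lr * m_hat / (np.sqrt(v_hat) + eps)         # Adam parameter update
    return alpha, error, max_iter

f_exp = np.array([1.0, -0.5, 2.0])        # synthetic "experimental" constraints
alpha, error, n_iter = optimize(f_exp, np.zeros(3), np.random.default_rng(0))
print(alpha, error, n_iter)
```

      In the actual workflow, each evaluation of the ensemble average corresponds to a set of independent MD simulations, so every gradient estimate carries sampling noise; this is the setting in which an optimizer designed for stochastic objectives is a natural choice.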

      Previously, the optimization code was included as part of the analysis folder. To avoid confusion and improve readability, a separate folder named optimization has been created. This folder provides the Adam optimization of chromosome-chromosome interactions (chr-chr optimization) and chromosome-nuclear landmarks interactions (chr-NL optimization).

      Comment 7: What was the motivation for choosing the Adam algorithm for optimization? Adam is designed for training on stochastic objective functions. Could the authors elucidate on the ’stochastic’ aspect of their function to be optimized? Why the Adam algorithm was considered the most appropriate choice for this application?

      We thank the reviewer for the comment. As defined in Eq. R1, the cost function measures the difference between the simulated constraints and the corresponding experimental values. The estimation of simulated values, by averaging over an ensemble of chromosome configurations, is inherently noisy and stochastic. Exact ensemble averages can only be achieved with unlimited samples obtained from infinitely long simulations.

      In the past, we have used Newton’s method for parameterization, and the detailed algorithm can be found in the SI of Ref. 14. However, we found that Adam is more efficient because it is a first-order method. Newton’s method, on the other hand, is a second-order method and requires estimation of the Hessian matrix. When the number of constraints is large, as in our case, the computational cost of estimating the Hessian matrix can be significant. Another advantage of the Adam algorithm lies in its adjustment of the learning rate during the optimization, which further speeds up convergence.

      Comment 8: The authors mention that examples of setting up simulations, parameter optimization, and introducing new features are provided in the GitHub repository. However, I was unable to locate these examples. Could the authors guide me to these specific resources or consider adding them if they are not currently available?

      We thank the reviewer for the comment. We have improved the GitHub repository and all the tutorials can be found using the links provided in Response to Comment 2.

      Comment 9: Furthermore, the paper states that ’a configuration file that provides the position of individual particles in the PDB file format is needed to initialize the simulations.’ It would be beneficial for new users if the authors could elaborate on how this file is generated. And all other input files in general. Detailing the procedures for a new user to run their system using OpenNucleome would be helpful.

      We thank the reviewer for the comment. The procedure for generating initial configurations was explained in the SI Section: Initial configurations for simulations and quoted below.

      “We first created a total of 1000 configurations for the genome by sequentially generating the conformation of each one of the 46 chromosomes as follows. For a given chromosome, we start by placing the first bead at the center (origin) of the nucleus. The positions of the following beads, i, were determined from the (i − 1)-th bead as $\mathbf{r}_i = \mathbf{r}_{i-1} + 0.5\,\mathbf{v}$, where $\mathbf{v}$ is a normalized random vector and 0.5 was selected as the bond length between neighboring beads. To produce globular chromosome conformations, we rejected vectors, $\mathbf{v}$, that led to bead positions with distance from the center larger than 4σ. Upon creating the conformation of a chromosome i, we shift its center of mass to a value $r_i^{\mathrm{com}}$ determined as follows. We first compute a mean radial distance, with the following equation

      where $D_i$ is the average value of Lamin B DamID profile for chromosome i. $D_{\mathrm{hi}}$ and $D_{\mathrm{lo}}$ represent the highest and lowest average DamID values of all chromosomes, and 6σ and 2σ represent the upper and lower bound in radial positions for chromosomes. As shown in Fig. S6, the average Lamin B DamID profiles are highly correlated with normalized chromosome radial positions as reported by DNA MERFISH [cite], supporting their use as a proxy for estimating normalized chromosome radial positions. We then select as a uniformly distributed random variable within the range . Without loss of generality, we randomly chose the directions for shifting all 46 chromosomes.

      We further relaxed the 1000 configurations to build more realistic genome structures. Following an energy minimization process, one-million-step molecular dynamics (MD) simulations were performed starting from each configuration. Simulations were performed with the following energy function

      where $U_{\mathrm{Genome}}$ is defined as in Eq. S7. $U_{\mathrm{G\text{-}La}}$ is the excluded volume potential between chromosomes and lamina, i.e., only the second term in Eq. S24. Parameters in $U_{\mathrm{Genome}}$ were from a preliminary optimization. The end configurations of the MD simulations were collected to build the final configuration ensemble (FCE).”

      The tutorial for preparing initial configurations can be found at this link.
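
      The chain-growth step of the quoted procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tutorial code: lengths are in reduced units with σ = 1, the function names are hypothetical, and the target radius for the center of mass is passed in by the caller rather than derived from DamID data.

```python
import numpy as np

def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere."""
    v = rng.standard_normal(3)
    return v / np.linalg.norm(v)

def confined_walk(n_beads, rng, bond=0.5, r_max=4.0):
    """Random walk from the origin; trial beads farther than r_max from the center are rejected."""
    pos = np.zeros((n_beads, 3))
    for i in range(1, n_beads):
        while True:
            trial = pos[i - 1] + bond * random_unit_vector(rng)
            if np.linalg.norm(trial) <= r_max:
                pos[i] = trial
                break
    return pos

def shift_center_of_mass(pos, radius, rng):
    """Translate the chain so that its center of mass lies at `radius` along a random direction."""
    target = radius * random_unit_vector(rng)
    return pos - pos.mean(axis=0) + target

rng = np.random.default_rng(1)
chain = confined_walk(200, rng)                        # one hypothetical 200-bead chromosome
placed = shift_center_of_mass(chain, radius=3.0, rng=rng)
```

      The rejection step keeps each chain globular, and the final translation places it at the desired radial position without changing its internal conformation. The subsequent energy minimization and MD relaxation described in the quoted text are not part of this sketch.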

      Comment 10: In the section discussing the correlation between simulated and experimental contact maps, as referenced in Figure 4A and Figure S2, the authors mention a high degree of correlation. Could the authors specify the exact value of this correlation and explain the method used for its computation? Considering that comparing two Hi-C matrices involves a large number of data points, it would be helpful to know if all data points were included in this analysis.

      We have updated Fig 4A and S2 to include Pearson correlation coefficients next to the contact maps. The reviewer is correct in that all the non-redundant data points of the contact maps are included in computing the correlation coefficients.

      For improved clarity, we added a new section in the supporting information to detail the calculations. The section is titled Computing Pearson correlation coefficients between experimental and simulated contact maps, and the relevant text is quoted below.

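      “We computed the Pearson correlation coefficients (PCC) between experimental and simulated contact maps in Fig. 4A and Fig. S2 as

      $$\mathrm{PCC} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},$$

      where $x_i$ and $y_i$ represent the experimental and simulated contact probabilities, $\bar{x}$ and $\bar{y}$ are their means, and $n$ is the total number of data points. Only non-redundant data points, i.e., half of the pairwise contacts, are used in the PCC calculation.”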
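
      A Pearson correlation restricted to the non-redundant half of a symmetric contact map can be sketched as follows. The function name is hypothetical, the maps are synthetic, and whether the main diagonal is included is an assumption exposed as a parameter.

```python
import numpy as np

def contact_map_pcc(exp_map, sim_map, include_diagonal=False):
    """Pearson correlation over the upper triangle of two symmetric contact maps."""
    exp_map = np.asarray(exp_map, dtype=float)
    sim_map = np.asarray(sim_map, dtype=float)
    k = 0 if include_diagonal else 1                # ASSUMPTION: diagonal excluded by default
    iu = np.triu_indices(exp_map.shape[0], k=k)     # each bead pair counted once
    x, y = exp_map[iu], sim_map[iu]
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc**2).sum() * (yc**2).sum()))

rng = np.random.default_rng(0)
a = rng.random((50, 50))
exp_map = (a + a.T) / 2            # synthetic symmetric "experimental" map
sim_map = 2.0 * exp_map + 1.0      # synthetic "simulated" map, linearly related to it
pcc = contact_map_pcc(exp_map, sim_map)
```

      Using only the upper triangle avoids counting each symmetric pair twice, which would leave the coefficient unchanged but double the apparent number of data points.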

      Comment 11: In addition, the author said: ”Moreover, the simulated and experimental average contact probabilities between pairs of chromosomes agree well, and the Pearson correlation coefficient between the two datasets reaches 0.89.” How does this correlation behave when not accounting for polymer compaction or scaling? An analysis presenting the correlation as a function of genomic distance would be interesting.

      Author response image 2.

      Pearson correlation coefficient between experimental and simulated contact probabilities as a function of the sequence separation within specific chromosomes. For each chromosome, we first gathered a set of experimental contacts alongside a matching set of simulated ones for genomic pairs within a particular separation range. The Pearson correlation coefficient at the corresponding sequence separation was then determined using Equation R4. We limited the calculations to half of the chromosome length to ensure the availability of sufficient data.

      We thank the reviewer for the comment. The analysis presenting the correlation as a function of genomic distance (sequence separation) for each chromosome is shown in Figure S12 and is also included in the SI. While the correlation coefficients decrease at larger separations, the values around 0.5 are quite reasonable and comparable to results obtained using Open-MiChroM.
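
      The separation-resolved correlation can be sketched by correlating the two maps one diagonal (or band of diagonals) at a time. This is an illustrative sketch on synthetic data: the function name and the `bin_width` parameter are hypothetical, and the cap at half the chromosome length mirrors the description in the caption of Author response image 2.

```python
import numpy as np

def pcc_by_separation(exp_map, sim_map, bin_width=1):
    """Map the first separation of each bin (in beads) to the PCC of the contacts in that bin."""
    exp_map = np.asarray(exp_map, dtype=float)
    sim_map = np.asarray(sim_map, dtype=float)
    max_sep = exp_map.shape[0] // 2              # stop at half the chromosome length
    result = {}
    for start in range(1, max_sep + 1, bin_width):
        stop = min(start + bin_width, max_sep + 1)
        # Contacts at sequence separation s lie on the s-th off-diagonal.
        x = np.concatenate([np.diagonal(exp_map, offset=s) for s in range(start, stop)])
        y = np.concatenate([np.diagonal(sim_map, offset=s) for s in range(start, stop)])
        if x.std() > 0 and y.std() > 0:          # PCC is undefined for constant data
            result[start] = float(np.corrcoef(x, y)[0, 1])
    return result

rng = np.random.default_rng(0)
a = rng.random((40, 40))
exp_map = (a + a.T) / 2                          # synthetic symmetric "experimental" map
sim_map = 3.0 * exp_map + 0.1                    # perfectly correlated synthetic "simulation"
profile = pcc_by_separation(exp_map, sim_map)
```

      Correlating within a fixed separation removes the dominant decay of contact probability with genomic distance, so the resulting profile reflects agreement in the finer structure of the maps rather than in the overall polymer scaling.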

      We also computed the correlation of whole genome contact maps after excluding intra-chromosomal contacts. The PCC decreased from 0.89 to 0.4. Again, the correlation coefficient is quite reasonable considering that these contacts are purely predicted by the compartmental interactions and were not directly optimized.
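      A minimal sketch of this second check, assuming each row and column of the whole-genome map carries a chromosome label: only pairs of bins on different chromosomes are kept before computing the PCC. The function name `interchromosomal_pcc` and the `chrom_ids` argument are illustrative, not taken from the authors' code.

```python
import numpy as np


def interchromosomal_pcc(exp_map, sim_map, chrom_ids):
    """PCC over inter-chromosomal contacts of whole-genome contact maps.

    chrom_ids[k] is the chromosome label of bin k. Pairs of bins on the
    same chromosome are discarded; each remaining pair is counted once.
    """
    exp_map = np.asarray(exp_map, dtype=float)
    sim_map = np.asarray(sim_map, dtype=float)
    chrom_ids = np.asarray(chrom_ids)
    rows, cols = np.triu_indices(chrom_ids.size, k=1)
    inter = chrom_ids[rows] != chrom_ids[cols]
    x = exp_map[rows[inter], cols[inter]]
    y = sim_map[rows[inter], cols[inter]]
    return float(np.corrcoef(x, y)[0, 1])
```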

      Comment 12: I recommend using the web-server that is familiar to the authors to benchmark the OpenNucleome tool/model: ”3DGenBench: A Web-Server to Benchmark Computational Models for 3D Genomics.” Nucleic Acids Research, vol. 50, no. W1, July 2022, pp. W4-12.

      We appreciate the reviewer’s suggestion. Unfortunately, the website was no longer active at the time of the revision. However, as detailed in the Response to Comment 11, we used one of the popular metrics to exclude the polymer compaction effect and evaluate the agreement between simulation and experiments.

      Comment 13: Regarding the comparison of simulation results with microscopy data from reference 34. Given their different resolutions and data point/space groupings, how do the authors align these datasets? Could the authors describe how they performed this comparison? How were the radial positions calculated in both the simulations and experiments? Since the data from reference 34 indicates a non-globular shape of the nucleus; how did this factor into the calculation of radial distributions?

      We thank the reviewer for the comment and apologize for the confusion. First, the average properties we examined, including radial positions and interchromosomal contacts, were averaged over all genomic loci. Therefore, they are independent of data resolution.

      Secondly, instead of calculating the absolute radial positions, which are subject to variations in nucleus shape and size, we defined the normalized radial positions. They measure the ratio between the distance from the nucleus center to the chromosome center and the distance from the nucleus center to the lamina. This definition was frequently used in prior imaging studies to measure chromosome radial positions.

      The calculations of the simulated and experimental normalized radial positions are discussed in Section: Computing simulated normalized chromosome radial positions

      “For a given chromosome $i$, we first determined its center of mass position, denoted as $C_i$. Starting from the center of the nucleus, $O$, we extend the vector $\mathbf{v}_{OC}$ to identify the intersection point with the nuclear lamina as $P_i$. The normalized chromosome radial position $r_i$ is then defined as $r_i = \|\mathbf{v}_{OC}\| / \|\mathbf{v}_{OP}\|$, where $\|\cdot\|$ represents the $L^2$ norm.”

      and Section: Computing experimental normalized chromosome radial positions.

      “We followed the same procedure outlined in Section: Computing simulated normalized chromosome radial positions to compute the experimental values. To determine the center of the nucleus using DNA MERFISH data, we used the minimum volume enclosing ellipsoid (MVEE) algorithm [15] to fit an ellipsoid for each genome structure. The optimal ellipsoid, defined as $\mathcal{E} = \{x \mid (x - c)^{\mathsf{T}} A (x - c) \le 1\}$, is obtained by minimizing $\log\det(A^{-1})$ subject to the constraint that $(x_i - c)^{\mathsf{T}} A (x_i - c) \le 1$ for all $i$. $x_i$ corresponds to the list of chromatin positions determined experimentally.”
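      The two quoted procedures can be sketched together. The snippet below fits an enclosing ellipsoid with the Khachiyan iteration commonly used for the MVEE problem, then evaluates the normalized radial position of a chromosome center relative to that ellipsoid. It assumes the lamina is approximated by the fitted ellipsoid; the function names, tolerance, and iteration cap are hypothetical choices rather than the authors' settings.

```python
import numpy as np


def mvee(points, tol=1e-6, max_iter=10000):
    """Approximate minimum volume enclosing ellipsoid (Khachiyan iteration).

    Returns (A, c) such that the ellipsoid is {x : (x - c)^T A (x - c) <= 1}.
    """
    P = np.asarray(points, dtype=float)        # shape (N, d)
    n_pts, dim = P.shape
    Q = np.vstack([P.T, np.ones(n_pts)])       # lifted points, shape (d + 1, N)
    u = np.full(n_pts, 1.0 / n_pts)            # weights on the points
    for _ in range(max_iter):
        X = (Q * u) @ Q.T
        M = np.sum(Q * np.linalg.solve(X, Q), axis=0)   # q_i^T X^-1 q_i
        j = int(np.argmax(M))
        step = (M[j] - dim - 1.0) / ((dim + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        converged = np.linalg.norm(new_u - u) < tol
        u = new_u
        if converged:
            break
    c = P.T @ u                                # ellipsoid center
    A = np.linalg.inv((P.T * u) @ P - np.outer(c, c)) / dim
    return A, c


def normalized_radial_position(chrom_center, A, c):
    """Ratio |OC| / |OP| for a lamina modelled as the ellipsoid (A, c).

    With v = C - O, the ray O + s * v meets the surface at
    s = 1 / sqrt(v^T A v), so |OC| / |OP| = sqrt(v^T A v).
    """
    v = np.asarray(chrom_center, dtype=float) - c
    return float(np.sqrt(v @ A @ v))
```

      Under this definition a chromosome at the nuclear center has a value of 0 and one touching the fitted surface has a value of 1, regardless of the size or elongation of the nucleus.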

      Comment 14: In the sentence: ”It is evident that telomeres exhibit anomalous subdiffusive motion.” I recommend mentioning the work ”Di Pierro, Michele, et al., ”Anomalous Diffusion, Spatial Coherence, and Viscoelasticity from the Energy Landscape of Human Chromosomes.” Proceedings of the National Academy of Sciences, vol. 115, no. 30, July 2018, pp. 7753-58.”.

      We have revised the sentence to include the citation as follows.

      “In line with previous research [cite], telomeres display anomalous subdiffusive motion. When fitted with the equation $\langle r^2(t) \rangle = D_\alpha t^\alpha$, these trajectories yield a spectrum of $\alpha$ values, with a peak around 0.59.”
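      A minimal sketch of such a fit, assuming the power-law form $\langle r^2(t) \rangle = D_\alpha t^\alpha$ and a time-averaged MSD per trajectory: the exponent is the slope of a straight-line fit in log-log space. The function names are illustrative, and the lag range used by the authors is not given in the response.

```python
import numpy as np


def time_averaged_msd(traj, max_lag):
    """Time-averaged mean squared displacement of one trajectory.

    traj has shape (T, 3); lags run from 1 to max_lag frames.
    """
    traj = np.asarray(traj, dtype=float)
    lags = np.arange(1, max_lag + 1)
    msd = np.array([
        np.mean(np.sum((traj[lag:] - traj[:-lag]) ** 2, axis=1))
        for lag in lags
    ])
    return lags, msd


def fit_power_law(times, msd):
    """Fit msd = D_alpha * times**alpha by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(times), np.log(msd), 1)
    return float(slope), float(np.exp(intercept))   # alpha, D_alpha
```

      An exponent below 1 indicates subdiffusion, 1 corresponds to ordinary diffusion, and 2 to ballistic motion; multiplying the frame lags by the real time per frame before fitting gives $D_\alpha$ in physical units.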

      Comment 15: Regarding the observation that ’chromosomes appear arrested and no significant changes in their radial positions are observed over timescales comparable to the cell cycle,’ could the authors provide more details on the calculations or analyses that led to this conclusion? Specifically, information on the equilibration/relaxation time of chromosome territories relative to rearrangements within a cell cycle would be interesting.

      Our conclusion here was mostly based on the time trace of normalized radial positions shown in Figure 6A of the main text. Over the timescale of an entire cell cycle (24 hours), the radial positions show little to no change, which supports glassy dynamics of chromosomes. We further determined the mean squared displacement (MSD) of the chromosome centers of mass. As shown in the left panel of Fig. S12, the MSDs are much smaller than the average size of chromosomes (see Rg values in Fig. 5A), supporting arrested dynamics.

      We further computed the auto-correlation function of the normalized chromosome radial position as

      $$C(\tau) = \frac{\langle (r(t) - \bar{r})\,(r(t + \tau) - \bar{r}) \rangle_t}{\langle (r(t) - \bar{r})^2 \rangle_t}$$

      where $t$ indexes over the trajectory frames and $\bar{r}$ is the mean position. As shown in Fig. S12, the positions are not completely decorrelated over 10 hours, again supporting slow dynamics. It would be interesting to examine the relaxation timescale more closely in future studies.
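      For illustration, a normalized auto-correlation of a radial-position time series can be sketched as below. The normalization by the variance, so that the value at zero lag equals 1, is an assumption, and the function name is hypothetical.

```python
import numpy as np


def normalized_autocorrelation(r, max_lag):
    """Auto-correlation of a 1D time series, normalized to 1 at zero lag.

    C(lag) = <(r(t) - mean)(r(t + lag) - mean)>_t / <(r(t) - mean)^2>_t,
    where the numerator average runs over the frames available at that lag.
    """
    r = np.asarray(r, dtype=float)
    dr = r - r.mean()
    var = np.mean(dr ** 2)
    n = r.size
    return np.array([
        np.mean(dr[: n - lag] * dr[lag:]) / var
        for lag in range(max_lag + 1)
    ])
```

      A curve that stays well above zero at the longest accessible lags indicates that the radial position has not relaxed within the simulated window.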

      Comment 16: The authors also comment on the SI ”Section: Initial configurations for simulations provides more details on preparing the 1000 initial configurations.” and related to reference 34 mentioning that ”the average Lamin B DamID profiles are highly correlated with chromosome radial positions as reported by DNA MERFISH”. How do the authors account for situations where homologous chromosomes are neighbors or have an interacting interface? Ref. 34 indicates that distinguishing between these scenarios can be challenging, potentially leading to ’invalid distributions’ that are filtered out. Clarification on how such cases were handled in the simulations would be helpful.

      We would like to first clarify that when comparing with experimental data, we averaged over the homologous chromosomes to obtain haploid data. We added the following text in the manuscript to emphasize this point

      “Given that the majority of experimental data were analyzed for the haploid genome, we adopted a similar approach by averaging over paternal and maternal chromosomes to facilitate direct comparison. More details on data analysis can be found in the Supporting Information Section: Details of simulation data analysis.”

      Furthermore, we used the processed DNA MERFISH data from the Zhuang lab, which unambiguously assigns a chromosome ID to each data point. Therefore, the issue mentioned by the reviewer is not present in the processed data. In our simulations, since we keep track of the explicit connection between genomic segments, the trace of individual chromosomes can be determined for any configuration. Therefore, there is no ambiguity in terms of simulation data.

      Comment 17: When discussing the interaction with nuclear lamina and nuclear envelope deformation, I suggest mentioning the following studies: The already cited ref 52 and ”Contessoto, Vinícius G., et al. ”Interphase Chromosomes of the Aedes Aegypti Mosquito Are Liquid Crystalline and Can Sense Mechanical Cues.” Nature Communications, vol. 14, no. 1, Jan. 2023, p. 326.”

      We updated the text to include the suggested reference.

      “Numerous studies have highlighted the remarkable influence of nuclear shape on the positioning of chromosomes and the regulation of gene expression [16, 17].”

      Comment 18: The authors state that ’Tutorials in the format of Python Scripts with extensive documentation are provided to facilitate the adoption of the model by the community.’ However, as I mentioned, the documentation appears to be limited, and the available tutorials could benefit from further expansion. I suggest that the authors consider enhancing these resources to better assist users in adopting and understanding the model.

      As detailed in the Response to Comment 2, we have updated the GitHub repository to better document the included Jupyter notebooks and tutorials.

      Comment 19: In the Methods section, the authors discuss using Langevin dynamics for certain simulations and Brownian dynamics for others. Could the authors provide more detailed reasoning behind the choice of these different dynamics for different aspects of the simulation? Furthermore, it would be insightful to know how the results might vary if only one of these dynamics was utilized throughout the study. Such clarification would help in understanding the implications of these methodological choices on the outcomes of the simulations.

      We thank the reviewer for the comment. As detailed in the supporting information Section: Mapping the Reduced Time Unit to Real Time, the Brownian dynamics simulations provide a rigorous mapping to the biological timescale. By choosing a specific value for the nucleoplasmic viscosity, we determined the time unit in simulations as τ = 0.65 s. With this time conversion, the simulated diffusion coefficients of telomeres match well with experimental values. Therefore, Brownian dynamics simulations are recommended for computing time-dependent quantities, and the large damping coefficient mimics the complex nuclear environment well.

      On the other hand, the large damping coefficient slows down the configuration relaxation of the system significantly. For computing equilibrium statistical properties, it is useful to use a small coefficient and the Langevin integrator with large time steps to facilitate conformational relaxation.
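      The trade-off described above can be illustrated with a toy example that is not the authors' model: a single particle in a harmonic well, integrated with overdamped (Brownian) dynamics using the Euler-Maruyama scheme. The relaxation time of the mean position is gamma / k_spring, so a larger damping coefficient slows relaxation proportionally. All parameter names and values are hypothetical except the reported time unit of 0.65 s.

```python
import numpy as np

TAU_SECONDS = 0.65   # reduced time unit reported in the response


def brownian_relaxation(x0, k_spring, gamma, dt, n_steps, kT=0.0, seed=0):
    """Euler-Maruyama overdamped dynamics in U(x) = 0.5 * k_spring * x**2.

    Update: x += (F / gamma) * dt + sqrt(2 * kT * dt / gamma) * N(0, 1).
    With kT = 0 the trajectory is the deterministic relaxation toward 0.
    """
    rng = np.random.default_rng(seed)
    x = float(x0)
    traj = np.empty(n_steps + 1)
    traj[0] = x
    noise_amp = np.sqrt(2.0 * kT * dt / gamma)
    for step in range(1, n_steps + 1):
        force = -k_spring * x
        x += force / gamma * dt + noise_amp * rng.standard_normal()
        traj[step] = x
    return traj


def reduced_to_hours(n_steps, dt):
    """Convert a run of n_steps with reduced time step dt into hours."""
    return n_steps * dt * TAU_SECONDS / 3600.0
```

      In this toy setting, raising gamma tenfold leaves the particle much farther from equilibrium after the same number of steps, which mirrors the reason a small damping coefficient is preferable when only equilibrium statistics are needed.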

      References

      [1] Rao, S. S.; Huntley, M. H.; Durand, N. C.; Stamenova, E. K.; Bochkov, I. D.; Robinson, J. T.; Sanborn, A. L.; Machol, I.; Omer, A. D.; Lander, E. S.; et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 2014, 159, 1665–1680.

      [2] Qi, Y.; Zhang, B. Predicting three-dimensional genome organization with chromatin states. PLoS computational biology 2019, 15, e1007024.

      [3] Yildirim, A.; Hua, N.; Boninsegna, L.; Zhan, Y.; Polles, G.; Gong, K.; Hao, S.; Li, W.; Zhou, X. J.; Alber, F. Evaluating the role of the nuclear microenvironment in gene function by population-based modeling. Nature Structural & Molecular Biology 2023, 1–14.

      [4] Junior, A. B. O.; Contessoto, V. G.; Mello, M. F.; Onuchic, J. N. A scalable computational approach for simulating complexes of multiple chromosomes. Journal of molecular biology 2021, 433, 166700.

      [5] Fujishiro, S.; Sasai, M. Generation of dynamic three-dimensional genome structure through phase separation of chromatin. Proceedings of the National Academy of Sciences 2022, 119, e2109838119.

      [6] Caragine, C. M.; Haley, S. C.; Zidovska, A. Nucleolar dynamics and interactions with nucleoplasm in living cells. Elife 2019, 8, e47533.

      [7] Brangwynne, C. P.; Mitchison, T. J.; Hyman, A. A. Active liquid-like behavior of nucleoli determines their size and shape in Xenopus laevis oocytes. Proceedings of the National Academy of Sciences 2011, 108, 4334–4339.

      [8] Farley, K. I.; Surovtseva, Y.; Merkel, J.; Baserga, S. J. Determinants of mammalian nucleolar architecture. Chromosoma 2015, 124, 323–331.

      [9] Qi, Y.; Zhang, B. Chromatin network retards nucleoli coalescence. Nature Communications 2021, 12, 6824.

      [10] Caragine, C. M.; Haley, S. C.; Zidovska, A. Surface fluctuations and coalescence of nucleolar droplets in the human cell nucleus. Physical review letters 2018, 121, 148101.

      [11] Spector, D. L.; Lamond, A. I. Nuclear speckles. Cold Spring Harbor perspectives in biology 2011, 3, a000646.

      [12] Banigan, E. J.; Mirny, L. A. Loop extrusion: theory meets single-molecule experiments. Current opinion in cell biology 2020, 64, 124–138.

      [13] Kingma, D. P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 2014.

      [14] Zhang, B.; Wolynes, P. G. Topology, structures, and energy landscapes of human chromosomes. Proceedings of the National Academy of Sciences 2015, 112, 6062–6067.

      [15] Moshtagh, N.; et al. Minimum volume enclosing ellipsoid. Convex optimization 2005, 111, 1–9.

      [16] Brahmachari, S.; Contessoto, V. G.; Di Pierro, M.; Onuchic, J. N. Shaping the genome via lengthwise compaction, phase separation, and lamina adhesion. Nucleic Acids Res. 2022, 50, 1–14.

      [17] Contessoto, V. G.; Dudchenko, O.; Aiden, E. L.; Wolynes, P. G.; Onuchic, J. N.; Di Pierro, M. Interphase chromosomes of the Aedes aegypti mosquito are liquid crystalline and can sense mechanical cues. Nature Communications 2023, 14, 326.

    1. Author response:

      eLife assessment:

      This important work provides another layer of regulatory mechanism for TGF-beta signaling activity. The evidence supports the involvement of microtubules as a reservoir of Smad2/3, however, additional evidence to convincingly demonstrate the functional involvement of Rudhira in this process is highly appreciated. The work will be of broad interest to developmental biologists in general and molecular biologists in the field of growth factor signaling.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      This manuscript aimed to study the role of Rudhira (also known as Breast Carcinoma Amplified Sequence 3), an endothelium-restricted microtubule-associated protein, in regulating TGFβ signaling. The authors demonstrate that Rudhira is a critical modulator of TGFβ signaling that acts by releasing Smad2/3 from cytoskeletal microtubules, and that Rudhira is itself a Smad2/3 target gene. Taken together, the authors provide a model of how Rudhira contributes to TGFβ signaling activity to stabilize the microtubules, which is essential for vascular development.

      Strengths

      The study used different methods and techniques to achieve aims and support conclusions, such as Gene Ontology analysis, functional analysis in culture, immunostaining analysis, and proximity ligation assay. This study provides an unappreciated additional layer of TGFβ signaling activity regulation after ligand-receptor interaction.

      We thank the reviewer for acknowledging the importance of our study and providing a clear summary of our findings.

      Weaknesses

      (1) It is unclear how current findings provide a better understanding of Rudhira KO mice, which the authors published some years ago.

      Our previous study demonstrated that Rudhira KO mice have a predominantly developmental cardiovascular phenotype that phenocopies TGFβ loss of function (Shetty, Joshi et al., 2018). Additionally, we found that at the molecular level, Rudhira regulates cytoskeletal organization (Jain et al., 2012; Joshi and Inamdar, 2019). Our current study builds upon these previous findings, showing an essential role of Rudhira in maintaining TGFβ signaling and controlling the microtubule cytoskeleton during vascular development. On one hand Rudhira regulates TGFβ signaling by promoting the release of Smads from microtubules, while on the other, Rudhira is a TGFβ target essential for stabilizing microtubules. Thus, our current study provides a molecular basis for Rudhira function in cardiovascular development.

      (2) Why do they use HEK cells instead of SVEC cells in Figure 2 and 4 experiments?

      Our earlier studies have characterized the role of Rudhira in detail using both loss and gain of function methods in multiple cell types (Jain et al., 2012; Shetty, Joshi et al., 2018; Joshi and Inamdar, 2019). As endothelial cells are particularly difficult to transfect, and because the function of Rudhira in promoting cell migration is conserved in HEK cells, it was practical and relevant to perform these experiments in HEK cells (Figures 2 and 4E).

      (3) A model shown in Figure 5E needs improvement to grasp their findings easily.

      We have modified Figure 5E for clarity.

      Reviewer #2 (Public Review):

      Summary

      It was first reported in 2000 that Smad2/3/4 are sequestered to microtubules in resting cells and TGF-β stimulation releases Smad2/3/4 from microtubules, allowing activation of the Smad signaling pathway. Although the finding was subsequently confirmed in a few papers, the underlying mechanism has not been explored. In the present study, the authors found that Rudhira/breast carcinoma amplified sequence 3 is involved in the release of Smad2/3 from microtubules in response to TGF-β stimulation. Rudhira is also induced by TGF-β and is probably involved in the stabilization of microtubules in the delayed phase after TGF-β stimulation. Therefore, Rudhira has two important functions downstream of TGF-β in the early as well as delayed phase.

      Strengths:

      This work aimed to address an unsolved question on one of the earliest events after TGF-β stimulation. Based on loss-of-function experiments, the authors identified a novel and potentially important player, Rudhira, in the signal transmission of TGF-β.

      We thank the reviewer for the critical evaluation and appreciation of our findings.

      Weaknesses:

      The authors have identified a key player that triggers Smad2/3 released from microtubules after TGF-β stimulation probably via its association with microtubules. This is an important first step for understanding the regulation of Smad signaling, but underlying mechanisms as well as upstream and downstream events largely remain to be elucidated.

      We acknowledge that the mechanisms regulating cytoskeletal control of Smad signaling are far from clear, but these are out of scope of this manuscript. This manuscript rather focuses on Rudhira/Bcas3 as a pivot to understand vascular TGFβ signaling and microtubule connections.

      (1) The process of how Rudhira causes the release of Smad proteins from microtubules remains unclear. The statement that "Rudhira-MT association is essential for the activation and release of Smad2/3 from MTs" (lines 33-34) is not directly supported by experimental data.

      We agree with the reviewer’s comment. Although we provide evidence that the loss of Rudhira (and thereby deduced loss of Rudhira-MT association) prevents release of Smad2/3 from MTs (Fig 3C), it does not confirm the requirement of Rudhira-MT association for this. In light of this, we have modified the statement to “Rudhira associates with MTs and is essential for the activation and release of Smad2/3 from MTs”.

      (2) The process of how Rudhira is mobilized to microtubules in response to TGF-β remains unclear.

      Our previous study showed that Rudhira associates with microtubules, and preferentially binds to stable microtubules (Jain et al., 2012; Joshi and Inamdar, 2019). Since TGFβ stimulation is known to stabilize microtubules, we hypothesize that TGFβ stimulation increases Rudhira binding to stable microtubules. We have mentioned this in our revised manuscript.

      (3) After Rudhira releases Smad proteins from microtubules, Rudhira stabilizes microtubules. The process of how cells return to a resting state and recover their responsiveness to TGF-β remains unclear.

      We show that dissociation of Smads from microtubules is an early response and stabilization of microtubules is a late TGFβ response. However, we agree that the sequence of these molecular events has not been characterized in depth in this or any other study, making it difficult to assign causal roles (e.g., whether release of Smads from MTs is a prerequisite for MT stabilization by Rudhira) or reversibility. Nevertheless, the TGFβ pathway is autoregulatory, leading to increased turnover of receptors and Smads and increased expression of inhibitory Smads, which may restore responsiveness to TGFβ. Additionally, the still short turnover time of stable microtubules (several minutes to hours) may also promote a quick return to the resting state.

      We have discussed this in our revised manuscript.

    1. Author response:

      eLife assessment

      This important study provides new insight into the dynamics that underlie the development of therapy resistance in prostate cancer by revealing that divergent tumor evolutionary paths occur in response to different treatment timing and that these converge on common resistance mechanisms. The use of barcoded lineage tracing and characterization of isolated tumor clonal populations provides compelling evidence supporting the importance of clonal dynamics in a tumor ecosystem for treatment resistance. Several open questions remain, however, raising the possibility of alternative interpretations of the data set in its current form. Overall, the findings deepen our understanding of prostate cancer evolution and hold promising implications for how drug resistance can be addressed or prevented.

      We are pleased the reviewers found our work reporting distinct evolutionary paths to resistance based on timing of treatment to be important and supported by compelling evidence.  We also acknowledge the need for additional work to clarify some details, particularly regarding the mechanism of clonal cooperativity as a catalyst of resistance.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Lee, Eugine et al. use in vivo barcoded lineage tracing to investigate the evolutionary paths to androgen receptor signaling inhibition (ARSI) resistance in two different prostate cancer clinical scenario models: measurable disease and minimal residual disease. Using two prostate cancer cell lines, LNCaP/AR and CWR22Pc, the authors find that in their minimal residual disease models, the outgrowth of pre-existing resistant clones gives rise to ARSI-resistant tumors. Interestingly, in their measurable disease model or post-engraftment ARSI setting, these pre-existing resistant clones are depleted; instead, a subset of the clones that give rise to treatment-naïve tumors adapts to ARSI treatment and is enriched in resistant tumors. For the LNCaP/AR cell line, characterization of pre-existing resistant clones in treatment-naïve and ARSI treatment settings reveals increased baseline androgen receptor transcriptional output as well as baseline upregulation of glucocorticoid receptor (GR) as the primary driver of pre-existing resistance. Similarly, the authors found induction of high GR expression over long-term ARSI treatment in ARSI-sensitive clones for adaptive resistance to ARSI. For CWR22Pc cells, HER3/NRG1 signaling was the primary driver for ARSI resistance in both measurable disease and minimal residual disease models. These findings are not only consistent with the authors' previous reports of GR and NRG1/HER3 as the molecular drivers of ARSI resistance in LNCaP/AR and CWR22Pc, respectively, but also demonstrate conserved resistance mechanisms despite pre-existing or adaptive evolutionary paths to resistance. Lastly, the authors show adaptive ARSI resistance is dependent on interclonal cooperation, where the presence of pre-existing resistant clones or "helper" clones is required to promote adaptive resistance in ARSI-sensitive clones.

      Strengths:

      The authors employ DNA barcoding, a powerful tool already demonstrated by others to track the clonal evolution of tumor populations during resistance development, to study the effects of the timing of therapy as a variable on resistance evolution. The authors use barcoding in two cell line models of prostate cancer in two clinical disease scenarios to demonstrate divergent evolutionary paths converging on common resistance mechanisms. By painstakingly isolating clones with barcodes of interest to generate clonal cell lines from treatment-naïve cell populations, the authors are able to not only characterize pre-existing resistance but also show cooperativity between resistant and drug-sensitive populations for adaptive resistance.

      Weaknesses:

      While the finding that different evolutionary paths result in common molecular drivers of ARSI resistance is novel and unexpected, this work primarily confirms the authors' previous published work identifying the resistance mechanisms in these cell lines. The impact of the work would be greater with additional studies understanding the specific molecular/genetic mechanisms by which cells become resistant or cooperate within a population to give rise to resistant population subclones.

      We agree that additional insights into the mechanism of adaptive resistance and the role of cell-cell cooperativity are clear next steps for this work. We propose to do so through single cell characterization (RNA-seq, ATAC-seq) of tumor evolution in a time course experiment where we can track each clone using expressed barcodes. This will allow us to explore the dynamics of interaction between the "adaptable" and "helper" clones. Unfortunately, the barcode methodology used in this initial report is DNA-based; therefore, a follow-up study using a transcribable barcode library is needed to address these fascinating questions.

      This study would also benefit from additional explanation or exploration of why the two resistance driver pathways described (GR and NRG1/Her3) are cell line specific and if there are genetic or molecular backgrounds in which specific resistance signaling is more likely to be the predominant driver of resistance.

      In the case of NRG1/HER3 pathway mediated resistance, we know that this mechanism requires that the PTEN/PIK3CA pathway be wildtype.  This is the case for the CWR22Pc model described in the manuscript. Furthermore, we have data showing that PTEN deletion in these cells rescues the phenotype, meaning that CWR22Pc cells with PTEN deletion are no longer dependent on NRG1/HER3 signaling for ARSI resistance.

      In contrast, LNCaP/AR cells are PTEN null at baseline and therefore must evolve alternative mechanisms of ARSI resistance. Since our initial identification of the GR mechanism, we and others have extended the finding to additional models (VCaP, LAPC4) (PMID: 24315100; PMID: 28191869). Another recent insight is the importance of RB1 and TP53 status in maintenance of luminal lineage identity during ARSI therapy, and the recognition of lineage plasticity as a resistance mechanism in cell lines/tumor models that lack these two tumor suppressors. In summary, baseline genetics clearly plays a role in which ARSI resistance pathway is  likely to emerge. We will clarify this point in the revision with additional discussion.

      Reviewer #2 (Public Review):

      Summary

      The authors aimed to characterise the evolutionary dynamics that occur during the resistance to androgen receptor signalling inhibition, and how this differs in established tumours vs. residual disease, in prostate cancer. By using a barcoding method, they aimed to both characterise the distribution of clones that support therapy resistance in these settings, while also then being able to isolate said clones from the pre-graft population via single-cell cloning to characterise the mechanisms of resistance and dependency on cooperativity.

      While, interestingly, the timing of combination therapies has been shown to be critical to avoid cross-resistance, the timing of therapy has not been specifically considered as a factor dictating resistance pathways. Additionally, the role of residual disease and dormant populations in driving relapse is of increasing interest, yet a lot remains to be understood of these populations. The question of whether different clinical manifestations of therapy resistance follow similar evolutionary pathways to resistance is therefore interesting and relevant for the field.

      The methods applied are elegant and the body of work is substantial. The proposed divergent evolutionary pathways pose interesting questions, and the findings on cooperativity provide insight. However, whether the model truly reflects minimal residual disease to the extent that the authors suggest may limit the relevance of the findings at this stage. Certain patterns in the DNA barcoding results also call into question whether the results fully support the strong claims of the authors, or whether alternative explanations could exist. While the potential to isolate individual clones in the pre-graft setting is a great strength of the method applied and the isolation of these clones is a huge body of work in itself, the limited number of clones that could be isolated also somewhat limits the validation of the findings.

      Strengths

      Very relevant and interesting question, clear clinical relevance, applying elegant methods that hold the potential to provide a novel understanding of multiple aspects of therapy resistance, through from evolutionary patterns to intracellular and cooperative mechanisms of resistance.

      The text is clearly written, logical, and the structure is easy to follow.

      Weaknesses

      (1) The extent to which the model used truly mimics residual disease

      The main conclusions of the paper are built upon results using a model for minimal residual disease. However, the extent to which this truly recapitulates minimal residual disease, particularly with regard to their focus on the timings of therapy, could be discussed further. If in the clinical setting residual disease occurs following the existence of a tumour and its microenvironment, there might be many aspects of the process that are missed when coinciding treatment with engraftment of a xenograft tumour with pre-castration. If any characterisation of the minimal residual disease was possible (such as histologically or through RNA sequencing), this may help demonstrate in what ways this model recapitulates minimal residual disease.

      We appreciate the reviewer's feedback on this point and acknowledge that the pre-ARSI setting used in our studies is not precisely identical to minimal residual disease (MRD) seen clinically, where a patient typically undergoes primary treatment (radical prostatectomy surgery or local radiotherapy) then relapses with distant disease from micrometastases that were not initially detectable. Having uncovered a key difference in the path to resistance using our pre-ARSI model, we believe our data provide a strong rationale to invest additional effort in designing newer MRD models that more closely mimic the clinical scenario, perhaps through surgical resection of a primary tumor that could “seed” micrometastases prior to therapy. We will highlight this aspect in our revised manuscript and provide clarity on the limitations and scope of our study.

      (2) Whether the observed enrichment of pre-resistant clones is truly that

      The authors strongly make the case that their barcoding experiments provide evidence for pre-existing resistance in the context of minimal residual disease. However, it seems that the clones enriched in the ARSIR tumours are consistently the most enriched clones in the pregraft. Is it possible that the high selective pressure in the pre-engraftment ARSI condition simply leads to an enrichment of the most populous clones from the pregraft? Whereas in the control setting, the reduced selective pressure at the point of engraftment allows for a wider variety of clones to establish in the tumour?

      The reviewer raises an important point about enrichment of ARSI-resistant clones in the pregraft, but we do not believe that this explains the subsequent in vivo data, for the following reasons:

      (1) The two most enriched clones in the Pre-ARSIR tumors are the second and third most enriched clones in the pregraft, not the first (Supplementary figure 1E). If clones were enriched in resistant tumors based on their abundance in the starting population, we would expect the most abundant pregraft clone to be the most enriched clone in the tumor.

      (2) By varying the androgen concentration in the pregraft culture media, we could selectively deplete or enrich the same clones enriched in the Pre-ARSIR tumors in vivo, indicating the enrichment of these clones in the resistant tumors is unlikely to be solely based on their relative frequency in the pregraft (Supplementary figure 2).

      We will clarify these points in the revised manuscript.

      Additionally, is there the possibility that the clones highly enriched in the pregraft are in fact a heterogeneous group of cells bearing the same barcode due to stochastic events in the process of viral transduction? Addressing these questions would greatly improve the study.

      The barcode library was deep sequenced to confirm even distribution of the barcodes before it was transferred from Novartis (PMID: 258491301) and we intentionally used a low multiplicity of infection (MOI) to generate barcode lines to ensure single copy insertion. That said, we cannot entirely rule out the possibility that the second and third most enriched clones in the pregraft originated from the same ancestral clone and subsequently acquired two different barcodes.  We will clarify this point in the revised manuscript.

      (3) The robustness of the subsequent work based on 1-2 pre-resistant clones

      While appreciating the volume of work involved in isolating and culturing individual pre-resistant clones, given the previous point, the conclusions would benefit from very robust validations with these single-cell clones. There are only two clones, and the results seem to focus more on one than the other, for which the data is less convincing. For instance, the Enz IC50 data, which in the case for pre-ARSI R2 is restricted to the supplementary, compares the clones A-D. In Figure S8 B, pre-ARSI R2 is compared to clone B, which is, of the four clones shown in the main figure when compared to R1, the one with the lowest Enz IC50. Therefore, while the resistant clones seem to have a significantly higher Enz IC50, comparing both clones to clones A-D may not have achieved this significance. It would also be useful to know how abundant the resistant clones were in the original barcode experiments.

      We acknowledge that studies relying on 1-2 biological samples have limitations. Given our extensive prior work (and that of other labs) on the role of GR in the development of ARSI resistance, we focused on demonstrating that both pre-ARSIR1 and pre-ARSIR2 clones exhibit pre-existing GR expression and are primed to further upregulate GR levels under ARSI conditions, thereby relying on GR function to sustain resistance. Given the redundancy of the resistance mechanisms of the two clones, we made efforts to isolate additional clones enriched in Pre-ARSIR tumors. However, despite our attempts, we were unable to identify further clones. Pre-ARSIR1 and pre-ARSIR2 are the second and third most enriched clones in the pregraft (2.1% and 1.7%, respectively).

      (4) The logic used in the final section requires further explanation

      In the final section, the authors suggest that a pre-ARSIR clone is able to cooperate with a pre-Intact clone to aid adaptive ARSI resistance. If this is true, then could it not be that rare, pre-resistant clones support adaptive resistance in established tumours? And, therefore, the mechanism underlying resistance could be through pre-existing resistant clones in both settings. The work would benefit from a discussion to clarify this discrepancy in the interpretation of the findings. This is particularly necessary given the strong wording the authors use regarding their findings, such as that they have provided 'conclusive evidence' for acquired resistance.

      We agree that rare, pre-resistant clones could support adaptive resistance (and therefore resistance in this adaptive setting could, technically, be called “pre-existing”), but it is critical to recognize that these rare, pre-resistant “helper” clones are vastly outnumbered by pre-Intact clones that “acquire” resistance through their “help.” We find this to be fascinating biology, and in the resubmission we will clarify this logic and outline future experimental approaches to unravel the mechanism.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Chowdhury and co-workers provide interesting data to support the role of G4-structures in promoting chromatin looping and long-range DNA interactions. The authors achieve this by artificially inserting a G4-containing sequence in an isolated region of the genome using CRISPR-Cas9 and comparing it to a control sequence that does not contain G4 structures. Based on the data provided, the authors can conclude that G4-insertion promotes long-range interactions (measured by Hi-C) and affects gene expression (measured by qPCR) as well as chromatin remodelling (measured by ChIP of specific histone markers).

      Whilst the data presented is promising and partially supports the authors' conclusion, this reviewer feels that some key controls are missing to fully support the narrative. Specifically, validation of actual G4-formation in chromatin by ChIP-qPCR (at least) is essential to support the association between G4-formation and looping. Moreover, this study is limited to a genomic location and an individual G4-sequence used, so the findings reported cannot yet be considered to reflect a general mechanism/effect of G4-formation in chromatin looping.

      Strengths:

      This is the first attempt to connect genomics datasets of G4s and HiC with gene expression. The use of Cas9 to artificially insert a G4 is also very elegant.

      Weaknesses:

      Lack of controls, especially to validate G4-formation after insertion with Cas9. The work is limited to a single G4-sequence and a single G4-site, which limits the generalisation of the findings.

      In the revised version we validated G4 formation inside cells at the insertion site using the reported G4-selective antibody BG4. Significant BG4 binding (by ChIP-qPCR) was clear in the G4-array insert, and not in the G4-mutated insert, supporting formation of G4s by the inserted G4-array (included as Figure S4).

      To directly address the second point, we inserted the G4-sequence, or the mutated control, at a second relatively isolated locus (at the 10 millionth position on Chr12, denoted as 10M site in text). First, BG4 ChIP was done to confirm intracellular G4 formation by the inserted array. BG4 ChIP-qPCR binding was significant within the inserted region, and not in the negative control region (Figure S8), consistent with the 79M locus. Together these demonstrate intracellular G4 formation by inserted sequences at two different loci.

      We next checked the chromatin state of the G4-array inserted at the 10M locus, or its negative control. Histone marks H3K4Me1, H3K27Ac, H3K27Me3, H3K9me3 and H3K4Me3 were tested at the G4-array, or the negative control locus. An increase in the enhancer histone marks was evident relative to the control sequence. This was largely similar to the 79M locus, supporting an enhancer-like state. Interestingly, here we further noted the presence of the H3K27Me3 histone mark. The presence of the H3K27Me3 repressor histone mark, along with the H3K4Me1/H3K27Ac enhancer histone marks, supports a poised enhancer-like status of the inserted G4 region, as has been observed earlier in other studies. Together, although data from the two distinct G4 insertion sites support the enhancer-like state, there are contextual differences, likely due to the sequence/chromatin of the sites adjacent to the inserted sequence.

      Activation of surrounding genes (10 Mb window) was evident for most genes upon the 10M G4-array insertion, and not with the G4-mutant insert. This is consistent with the enhancer-like state of the G4-array insert and in line with the 79M G4-array insert.

      These results have been added as the final section in the revised version; the data are shown in Figure 7.

      Reviewer #2 (Public Review):

      Summary:

      Roy et al. investigated the role of non-canonical DNA structures called G-quadruplexes (G4s) in long-range chromatin interactions and gene regulation. Introducing a G4 array into chromatin significantly increased the number of long-range interactions, both within the same chromosome (cis) and between different chromosomes (trans). G4s functioned as enhancer elements, recruiting p300 and boosting gene expression even 5 megabases away. The study proposes a mechanism where G4s directly influence 3D chromatin organization, facilitating communication between regulatory elements and genes.

      Strength:

      The findings are valuable for understanding the role of G4-DNA in 3D genome organization and gene transcription.

      Weaknesses:

      The study would benefit from more robust and comprehensive data, which would add depth and clarity.

      (1) Lack of G4 Structure Confirmation: The absence of direct evidence for G4 formation within cells undermines the study's foundation. Relying solely on in vitro data and successful gene insertion is insufficient.

      Using the reported G4-specific antibody, BG4, we performed BG4 ChIP-qPCR at the 79M locus. In addition, a second G4-insertion site was created and BG4 ChIP-qPCR was used to validate intracellular G4 formation. This is summarized briefly below; more details are in the response above.

      In the revised version we validated G4 formation inside cells at the insertion site using the reported G4-selective antibody BG4. Significant BG4 binding (by ChIP-qPCR) was clear in the G4-array insert, and not in the G4-mutated insert, supporting formation of G4s by the inserted G4-array (included as Figure S4).

      Further, we inserted the G4-sequence, or the mutated control, at a second relatively isolated locus (at the 10 millionth position on Chr12, denoted as 10M site in text). First, BG4 ChIP was done to confirm intracellular G4 formation by the inserted array. BG4 ChIP-qPCR binding was significant within the G4-array inserted region, and not in the negative control region (Figure S8), consistent with the 79M locus. Together these demonstrate intracellular G4 formation by inserted sequences at two different loci. These results are added in the revised text in the second and the final sections of Results, with data shown in Figures 7, S4 and S8.

      (2) Alternative Explanations: The study does not sufficiently address alternative explanations for the observed results. The inserted sequences may not form G4s or other factors like G4-RNA hybrids may be involved.

      As mentioned in response to the previous comment, we confirmed that the inserted sequence indeed forms G4s inside the cells. RNA-DNA hybrid G4s can form within R-loops with two or more tandem G-tracks (G-rich sequences) on the nascent RNA transcript as well as the non-template DNA strand (Fay et al., 2017, 28554731). A recent study has observed that R-loop-associated G4 formation can enhance chromatin looping by strengthening CTCF binding (Wulfridge et al., 2023, 37552993). As pointed out by the reviewer, the possibility of G4-RNA hybrids remains; we have mentioned this possibility for readers in the second last paragraph of the Discussion.

      (3) Limited Data Depth and Clarity: ChIP-qPCR offers limited scope and considerable variation in some data makes conclusions difficult.

      We noted variation with one of the primers in a few ChIP-qPCR experiments (in Figures 2 and 3D). The changes however were statistically significant across replicates, and consistent with the overall trend of the experiments (Figures 2, 3 and 4). Enhancer function, in addition to ChIP, was also confirmed using complementary assays like 3C and RNA expression.

      (4) Statistical Significance and Interpretation: The study could be more careful in evaluating the statistical significance and magnitude of the effects to avoid overinterpreting the results.

      We reconfirmed our statistical calculations from biological replicate experiments. We carefully looked at potential overinterpretations, and made appropriate changes in the manuscript (details of the changes given below in response to comment to authors).

      Reviewer #3 (Public Review):

      Summary:

      This paper aims to demonstrate the role of G-quadruplex DNA structures in the establishment of chromosome loops. The authors introduced an array of G4s spanning 275 bp, naturally found within a very well-characterized promoter region of the hTERT promoter, in an ectopic region devoid of G-quadruplex and annotated gene. As a negative control, they used a mutant version of the same sequence in which G4 folding is impaired. Due to the complexity of the region, 3 G4s on the same strand and one on the opposite strand, 12 point mutations were made simultaneously (G to T and C to A). Analysis of the 3D genome organization shows that the WT array establishes more contact within the TAD and throughout the genome than the control array. Additionally, a slight enrichment of H3K4me1 and p300, both enhancer markers, was observed locally near the insertion site. The authors tested whether the expression of genes located either nearby or up to 5 Mb away was up-regulated based on this observation. They found that four genes were up-regulated from 1.5 to 3-fold. An increased interaction between the G4 array compared to the mutant was confirmed by the 3C assay. For in-depth analysis of the long-range changes, they also performed Hi-C experiments and showed a genome-wide increase in interactions of the WT array versus the mutated form.

      Strengths:

      The experiments were well-executed and the results indicate a statistical difference between the G4 array inserted cell line and the mutated modified cell line.

      Weaknesses:

      The control non-G4 sequence contains 12 point mutations, making it difficult to draw clear conclusions. These mutations not only alter the formation of G4, but also affect at least three Sp1 binding sites that have been shown to be essential for the function of the hTERT promoter, from which the sequence is derived. The strong intermingling of G4 and Sp1 binding sites makes it impossible to determine whether all the observations made are dependent on G4 or Sp1 binding. As a control, the authors used Locked Nucleic Acid probes to prevent the formation of G4. As for mutations, these probes also interfere with two Sp1 binding sites. Therefore, using this alternative method has the same drawback as point mutations. This major issue should be discussed in the paper. It is also possible that other unidentified transcription factor binding sites are affected in the presented point mutants.

      Since the sequence we used to test the effects of G4 structure formation is highly G-rich, we had to introduce at least 12 mutations to be sure that a stable G4 structure would not form in the mutated control sequence. Sp1 has been reported to bind to G4 structures (Raiber et al., 2012). Therefore, Sp1 binding is likely to be associated with the G4-dependent enhancer functions observed here. We also appreciate that apart from Sp1, other unidentified transcription factor binding sites might be affected by the mutations we introduced. We have discussed these possibilities in the fourth paragraph of the Discussion section in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      Whilst the data presented is promising and partially supports the authors' conclusion, this reviewer feels that some key controls are missing to fully support the narrative used. Below are my main concerns:

      (1) The main thing missing in the current manuscript is to validate the actual formation of G4 in chromatin context for the repeat inserted by CRISPR-Cas. Whilst I appreciate this will form promptly a G4 in vitro, to fully support the conclusions proposed the authors would need to demonstrate actual G4-formation in cells after insertion. This could be done by ChIP-qPCR using the G4-selective antibody BG4 for example. This is an essential piece of evidence to be added to link with confidence G4-formation to chromatin looping.

      To address the concern regarding whether the inserted G4 sequence forms G4s in cells, as suggested, we used the G4-selective antibody BG4. PCR primers in the study were designed with multiple points in mind: primers should not bind to any site of G/C alteration in the mutated control insert; either the forward or the reverse primer should come from the adjacent region, for specificity; the amplicons should cover adjacent regions, for studying any effects on chromatin; and the PCRs should be optimized for the repeats within the inserted sequence. Given these, primer pairs R1-R4 were chosen for further work following optimizations (Figure 2, top panel). For BG4 ChIP-qPCR we used primer pair R2, which covered >100 bases of the inserted G4-array, or the G4-mutated control. Significant BG4 binding was clear in the G4-array insert, and not in the G4-mutated insert, demonstrating formation of G4s by the inserted G4-array (Figure S4).

      In response to comment #3 below, we inserted the G4-forming sequence (or its mutated control) at a second locus. This insertion was near the 10 millionth position of chromosome 12 (10M insertion locus in text). Here also, BG4 binding was significant within the G4-array inserted region, and not in the negative control region (Figure S8). Together these demonstrate G4 formation by the inserted sequence at two different loci.

      (2) I found the LNA experiment very elegant. However, what would be the effect of LNA treatment on the control sequence that does not form G4s? This control is essential to disentangle the effect of LNA pairing to the sequence itself vs disrupting the G4-structure.

      As per the reviewer’s suggestion, we performed a control experiment where we treated the G4-mutated insert (control) cells with the G4-disrupting LNA probes. The changes in the expression of the surrounding genes in this case were not significant, indicating that the effects observed in the G4-array insert cells were possibly due to disruption of the inserted G4 structures. This data is presented in Figure S5.

      (3) The authors describe their work and present its conclusion as if this were a genome-wide study, whilst the work is focused on a specific genomic location, and the looping, along with the effect on histone acetylation and gene expression, is limited to this. The authors cannot conclude, therefore, that this is a generic effect and the discussion should be more focused on the specific G4s used and the genomic location investigated. Ideally, insertion of a different G4-forming sequence or of the same in a different genomic location is recommended to really claim a generic effect.

      To address this, we inserted the G4-array sequence, or the G4-mutated control sequence, at another relatively isolated locus, the 10 millionth position of chromosome 12 (denoted as 10M). Intracellular G4 formation was confirmed using BG4 ChIP-qPCR. We observed that the enhancer-like features, in terms of enhancer histone marks and increased expression of surrounding genes, were largely reproduced at the 10M locus on G4 insertion (Figure 7). These results are added as the final section under Results.

      Reviewer #2 (Recommendations For The Authors):

      The study proposes a mechanism where G4s directly influence 3D chromatin organization, facilitating communication between regulatory elements and genes.

      While the present manuscript presents an interesting hypothesis, it would benefit from enhanced novelty and more robust data. The study complements existing G4 research (e.g., PMID: 31177910). While the conclusions hold biological relevance, they largely reiterate established knowledge. Furthermore, the presented data appear preliminary and still lack depth and clarity.

      Hou et al., 2019 (PMID: 31177910) showed that the presence of potential G4-forming sequences correlated with TAD boundaries, along with enrichment of architectural proteins and transcription factor binding sites. Other studies also noted enrichment of potential G4-forming sequences at enhancers, along with nucleosome depletion and higher transcription factor binding (Hou et al., 2021; Williams et al., 2020). These studies proposed a role of G4s in chromatin/TAD states based on correlative bioinformatics analyses of potential G4-forming sequences. Here we sought to directly test causality. Insertion of a G4 sequence, and formation of intracellular G4s, in an isolated, G4-depleted region resulted in altered characteristics of chromatin; this was not seen with the negative control insertion that does not form G4s. These results, in contrast to earlier studies, directly demonstrate the causal role of G4s as functional elements that impact local and distant chromatin.

      Major concerns:

      (1) Lack of G4 Structure Confirmation: Implement G4-specific antibodies or fluorescent probes to verify G4 structures inside the cells.

      Detailed response given above. Briefly, in the revised version we validated G4 formation inside cells at the insertion site using the reported G4-selective antibody BG4. Significant BG4 binding (by ChIP-qPCR) was clear in the G4-array insert, and not in the G4-mutated insert, supporting formation of G4s by the inserted G4-array (included as Figure S4).

      Further, we inserted the G4-sequence, or the mutated control, at a second relatively isolated locus (at the 10 millionth position on Chr12, denoted as 10M site in text). First, BG4 ChIP was done to confirm intracellular G4 formation by the inserted array. BG4 ChIP-qPCR binding was significant within the G4-array inserted region, and not in the negative control region (Figure S8), consistent with the 79M locus. Together these demonstrate intracellular G4 formation by inserted sequences at two different loci. These results are added in the revised text in the second and the final sections of Results, with data shown in Figures 7, S4 and S8.

      (2) Alternative Explanations: Explore the possibility that the sequences may not form G4s or that other factors like G4-RNA hybrids are involved.

      Response provided in the public reviews section.

      (3) Limited Data Depth and Clarity: ChIP-qPCR offers limited scope. Consider employing G4 ChIP-seq for genome-wide analysis of G4 association with histone modifications. Address inconsistencies in data like H3K27me3 variation and incomplete H3K9me3 data sets.

      A recent study performed G4 CUT&Tag (Lyu et al., 2022, 34792172) and observed G4 formation at active promoters and at both active and poised enhancers. We have discussed this in the sixth paragraph of the Discussion. H3K27Me3 occupancy at the 79M locus insertions did not show any significant G4-dependent changes. However, at the second insertion site at the 10M locus (introduced in the revised manuscript, Figure 7), there was a significant G4-dependent increase in H3K27Me3 occupancy along with the H3K4Me1 and H3K27Ac enhancer histone marks, indicating formation of a poised enhancer-like element.

      We completed the H3K9me3 data sets for both insertion sites.

      (4) Statistical Significance and Interpretation: Re-evaluate the statistical significance of results and interpret them in the context of relevant biological knowledge. Avoid overstating the impact of minor changes.     

      We revised several lines to avoid overstating results. Some of the changes are as below (changes underline/strikethrough)

      - There was a relatively modest increase in the recruitment of p300 and a substantial increase in the recruitment of the more functionally active acetylated p300/CBP to the G4-array when compared against the mutated control.

      - As expected, although modest, a decrease in the H3K4Me1 and H3K27Ac enhancer histone modifications was evident within the insert upon the LNAs treatment.

      - Moreover, the enhancer marks were relatively reduced, although not markedly, when the inserted G4s were specifically disrupted.

      (5) Unexplored Aspects: Investigate the relationship between G4 DNA and R-loops, and consider the role of CTCF and cohesin proteins in mediating long-range interactions. Integrate existing research to build a more comprehensive framework and draw more robust conclusions.

      As mentioned in response to one of the earlier comments, a recent publication extensively studied the association between G4s, R-loops, and CTCF binding (Wulfridge et al., 2023). While here we focused on the primary features of a potential enhancer, further work will be necessary to establish how G4s influence the coordinated action between cohesin and CTCF and the consequent chromatin looping. We have described this for readers in the second last paragraph of the Discussion in the revised version.

      Minor Concern:

      (1) Enhancer Definition: The term "enhancer" requires specific criteria. Modify the section heading or provide evidence demonstrating the G4 sequence fulfills all conditions for being an enhancer, such as position independence and long-range effects.

      Although we checked some of the primary features of a potential enhancer (expression of surrounding genes, enhancer histone marks, chromosomal looping interactions, and recruitment of transcriptional coactivators), further aspects may need to be validated. As suggested, in the revised manuscript the section heading has been modified to ‘Enhancer-like features emerged upon insertion of G4s.’

      Reviewer #3 (Recommendations For The Authors):

      In addition to the points in my public review, I would like to mention some less significant points.

      The authors mention that "the array of G4-forming sequences used for insertion was previously reported to form stable G4s in human cells" (Lim et al., 2010; Monsen et al., 2020; Palumbo et al., 2009). However, upon reading the publications, I found that these observations were made in vitro. I may have missed something, but there are now several mappings of folded-G4 in human cells based on different approaches. It would be beneficial to investigate whether the hTERT promoter is a site of G-quadruplex formation in vivo. If confirmed, a similar analysis should be conducted on the 275 bp region inserted into the ectopic region to determine if it also has the ability to form a structured G4.

      We performed BG4 ChIP to confirm in vivo G4 formation by the inserted G4-array as suggested (Figures S4, S8). A detailed response is given above. Briefly, in the revised version we validated G4 formation inside cells at the insertion site using the reported G4-selective antibody BG4. Significant BG4 binding (by ChIP-qPCR) was clear in the G4-array insert, and not in the G4-mutated insert, supporting formation of G4s by the inserted G4-array (included as Figure S4).

      Further, we inserted the G4-sequence, or the mutated control, at a second relatively isolated locus (at the 10 millionth position on Chr12, denoted as 10M site in text). First, BG4 ChIP was done to confirm intracellular G4 formation by the inserted array. BG4 ChIP-qPCR binding was significant within the inserted region, and not in the negative control region (Figure S8), consistent with the 79M locus. Together these demonstrate intracellular G4 formation by inserted sequences at two different loci. These results are added in the revised text in the second and the final sections of Results, with data shown in Figures 7, S4 and S8.

      The inserted sequence originates from a well-characterized promoter. The authors suggest that placing it in an ectopic position creates an enhancer-like region, based on the observation of increased levels of H3K27Ac and H3K4me1 on the WT array. To provide a control that it is not a promoter, it would be useful to also analyze a specific mark of promoter activity, such as H3K4me3.

      As suggested by the reviewer, we also analysed the H3K4Me3 promoter activation mark at both the 79M and 10M (introduced in the revised manuscript, Figure 7) insertion loci. We did not observe any significant G4-dependent changes in the recruitment of H3K4Me3 (Figures 2, 7).

      In the discussion, the authors mention "it was proposed that inter-molecular G4 formation between distant stretches of Gs may lead to DNA looping". To investigate this further, it would be worthwhile to examine whether the promoter regions of activated genes (PAWR, PPP1R12A, NAV3, and SLC6A15) contain potentially forming G-quadruplexes (pG4). Additionally, sites that establish more contact with the G4 array described in Figure 6F could be analyzed for enrichment in pG4.

      Thank you for pointing this out. We found promoters of the four genes (PAWR, PPP1R12A, NAV3, and SLC6A15) harbour potential G4-forming sequences (pG4s). Also as suggested, we analysed the contact regions in Fig 6F, along with the whole locus, for pG4s. Relative enrichment in pG4 was seen, particularly within the significantly enhanced interacting regions, which at times spreads beyond the interacting regions also. This is shown in the lower panel of Figure 6F in the revised version. We have described this in Discussion for readers.
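      The response does not state which tool was used to call pG4s. As an illustration only, a minimal sketch of one common approach is given here, assuming the widely used consensus motif of four runs of at least three guanines separated by loops of 1-7 nucleotides. The function name `find_pg4` and the example sequence are hypothetical and are not taken from the manuscript.

```python
import re

# Assumed consensus pG4 motif: four G-runs (>=3 G) separated by 1-7 nt loops.
# The C-rich pattern catches the same motif on the opposite strand.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")
C4_PATTERN = re.compile(r"C{3,}(?:[ACGT]{1,7}C{3,}){3}")


def find_pg4(seq):
    """Return sorted (start, end, strand) tuples for non-overlapping pG4 motifs.

    Coordinates are 0-based, end-exclusive, on the given (plus) strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", G4_PATTERN), ("-", C4_PATTERN)):
        for match in pattern.finditer(seq):
            hits.append((match.start(), match.end(), strand))
    return sorted(hits)


# Hypothetical toy sequence with one plus-strand motif.
print(find_pg4("TTGGGAGGGTGGGAGGGTT"))  # [(2, 17, '+')]
```

      A motif scan of this kind only flags sequences with G4-forming potential; it does not show that a G4 actually folds in cells, which is what the BG4 ChIP-qPCR experiments address.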

    1. Author response:

      eLife assessment

      This important study addresses the idea that defective lysosomal clearance might be causal to renal dysfunction in cystinosis. They observe that restoring expression of vATPase subunits and treatment with Astaxanthin ameliorate mitochondrial function in a model of renal epithelial cells, opening opportunities for translational application to humans. The data are convincing, but the description of methodologies is incomplete.

      Public Reviews:

      Reviewer #1 (Public Review):

      Cystinosis is a rare hereditary disease caused by biallelic loss of the CTNS gene, encoding two cystinosin protein isoforms; the main isoform is expressed in lysosomal membranes where it mediates cystine efflux whereas the minor isoform is expressed at the plasma membrane and in other subcellular organelles. Sur et al proceed from the assumption that the pathways driving the cystinosis phenotype in the kidney might be identified by comparing the transcriptome profiles of normal vs CTNS-mutant proximal tubular cell lines. They argue that key transcriptional disturbances in mutant kidney cells might not be present in non-renal cells such as CTNS-mutant fibroblasts.

      Using cluster analysis of the transcriptomes, the authors selected a single vacuolar H+ATPase (ATP6VOA1) for further study, asserting that it was the "most significantly downregulated" vacuolar H+ATPase (about 58% of control) among a group of similarly downregulated H+ATPases. They then showed that exogenous ATP6VOA1 improved CTNS(-/-) RPTEC mitochondrial respiratory chain function and decreased autophagosome LC3-II accumulation, characteristic of cystinosis. The authors then treated mutant RPTECs with 3 "antioxidant" drugs, cysteamine, vitamin E, and astaxanthin (ATX). ATX (but not the other two antioxidant drugs) appeared to improve ATP6VOA1 expression, LC3-II accumulation, and mitochondrial membrane potential. Respiratory chain function was not studied. RTPC cystine accumulation was not studied.

      In this manuscript, as an initial step, we have studied the first step in respiratory chain function by performing the Seahorse Mito Stress Test to demonstrate that the genetic manipulation (knocking out the CTNS gene and plasmid-mediated expression correction of ATP6V0A1) impacts mitochondrial energetics. We did not investigate the respirometry-based assays that can identify locations of electron transport deficiency, which we plan to address in a follow-up paper.

      We would like to draw attention to Figure 3D, where cystine accumulation has been studied. This figure demonstrates an increased intracellular accumulation of cystine.

      The major strengths of this manuscript reside in its two primary findings.

      (1) Plasmid expression of exogenous ATP6VOA1 improves mitochondrial integrity and reduces aberrant autophagosome accumulation.

      (2) Astaxanthin partially restores suboptimal endogenous ATP6VOA1 expression.

      Taken together, these observations suggest that astaxanthin might constitute a novel therapeutic strategy to ameliorate defective mitochondrial function and lysosomal clearance of autophagosomes in the cystinotic kidney. This might act synergistically with the current therapy (oral cysteamine) which facilitates defective cystine efflux from the lysosome.

      There are, however, several weaknesses in the manuscript.

      (1) The reductive approach that led from transcriptional profiling to focus on ATP6V0A1 is not transparent and weakens the argument that potential therapies should focus on correction of this one molecule vs the other H+ ATPase transcripts that were equally reduced - or transcripts among the 1925 belonging to at least 11 pathways disturbed in mutant RPTECs.

      The transcriptional profiling studies on ATP6V0A1 have been fully discussed and publicly shared. Table 2 lists the v-ATPase transcripts that are significantly downregulated in cystinosis RPTECs. We have also clarified and justified the choice of further studies on ATP6V0A1, where we state the following: "The most significantly perturbed member of the V-ATPase gene family found to be downregulated in cystinosis RPTECs is ATP6V0A1 (Table 2). Therefore, further attention was focused on characterizing the role of this particular gene in a human in vitro model of cystinosis."

      (2) A precise description of primary results is missing -- the Results section is preceded by or mixed with extensive speculation. This makes it difficult to dissect valid conclusions from those derived from less informative experiments (eg data on CDME loading, data on whole-cell pH instead of lysosomal pH, etc).

      We appreciate the reviewer highlighting areas for further improving the manuscript's readability. In our resubmission, we have revised the Results section to provide a more precise description of the primary findings and have restricted the inferences to the Discussion section.

      (3) Data on experimental approaches that turned out to be uninformative (eg CDME loading, or data on whole-cell pH assessment with BCECF).

      We have provided all data, whether informative or uninformative. Although a lysosome-specific pH measurement would be important, it was not possible in our cells: they were very sick and the assay did not work. Hence we provide pH data obtained with BCECF, which reports whole-cell pH, that is, the overall pH of the cytoplasm and organelles combined.

      (4) The rationale for the study of ATX is unclear and the mechanism by which it improves mitochondrial integrity and autophagosome accumulation is not explored (but does not appear to depend on its anti-oxidant properties).

      We have provided the rationale for studying ATX in the Introduction and Results sections, where we state the following: “correction of ATP6V0A1 in CTNS-/- RPTECs and treatment with antioxidants specifically, astaxanthin (ATX) increased the production of cellular ATP6V0A1, identified from a custom FDA-drug database generated by our group, partially rescued the nephropathic RPTEC phenotype. ATX is a xanthophyll carotenoid occurring in a wide variety of organisms. ATX is reported to have the highest known antioxidant activity and has proven to have various anti-inflammatory, anti-tumoral, immunomodulatory, anti-cancer, and cytoprotective activities both in vivo and in vitro”.

      We are still investigating the mechanism by which ATX improves mitochondrial integrity and this will be the focus of a follow-on manuscript.

      (5) Thoughtful discussion on the lack of effect of ATP6V0A1 correction on cystine efflux from the lysosome is warranted, since this is presumably sensitive to intralysosomal pH.

      We have provided a thoughtful discussion in the revised manuscript on some possible mechanisms that may result in an effect of ATP6V0A1 correction on cystine efflux from the lysosome.

      (6) Comparisons between RPTECs and fibroblasts cannot take into account the effects of immortalization on cell phenotype (not performed in fibroblasts).

      The purpose of examining different tissue sources of primary cells in nephropathic cystinosis was to assess if any of the changes in these cells were tissue source specific. We used primary cells isolated from patients with nephropathic cystinosis—RPTECs from patients' urine and fibroblasts from patients' skin—these cells are not immortalized and can therefore be compared. This is noted in the results section - “Specific transcriptional signatures are observed in cystinotic skin-fibroblasts and RPTECs obtained from the same individual with cystinosis versus their healthy counterparts”.

      We next utilized the immortalized RPTEC cell line to create CRISPR-mediated CTNS knockout RPTECs as a resource for studying the pathophysiology of cystinosis. These cells were not compared to the primary fibroblasts.

      (7) This work will be of interest to the research community but is self-described as a pilot study. It remains to be clarified whether transient transfection of RPTECs with other H+ATPases could achieve results comparable to ATP6V0A1. Some insight into the mechanism by which ATX exerts its effects on RPTECs is needed to understand its potential for the treatment of cystinosis.

      In future studies we will further investigate the effect of ATX on RPTECs for the treatment of cystinosis. This will require Phase 1 and Phase 2 clinical studies, which are beyond the scope of the current manuscript.

      Reviewer #2 (Public Review):

      Sur and colleagues investigate the role of ATP6V0A1 in mitochondrial function in cystinotic proximal tubule cells. They propose that loss of cystinosin downregulates ATP6V0A1 resulting in acidic lysosomal pH loss, and adversely modulates mitochondrial function and lifespan in cystinotic RPTECs. They further investigate the use of a novel therapeutic Astaxanthin (ATX) to upregulate ATP6V0A1 that may improve mitochondrial function in cystinotic proximal tubules.

      The new information regarding the specific proximal tubular injuries in cystinosis identifies potential molecular targets for treatment. As such, the authors are advancing the field in an experimental model for potential translational application to humans.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      The authors fail to truly define codon optimality, rare codons, and stalling sequences in their work, all of which are distinct terminologies. They use reporters with rare codon usage but do not mention what metrics they use to determine this, such as cAI, codon usage bias, or tAI. The distinction between the type of codon sequences that DDX6 affects is very important to differentiate and should be done here as certain stretches of codons are known to lead to different quality control RNA decay pathways that are not reliant on canonical mRNA decay factors.

      Thank you for the reviewer’s feedback on our work. Clearly defining codon optimality, rare codons, and stalling sequences is indeed crucial. We will emphasize this distinction more in our revisions to help readers better understand our analysis and findings.

      Likewise, the authors sort their Ribo-seq data to determine genes that might exhibit a DDX6 specific mRNA decay effect but fail to go into great depth about common features shared among these genes other than GO term analysis, GC content, and coding sequence (CDS) length. The authors then sort out 35 genes that are both upregulated at the mRNA level and have increased local ribosome footprint along the ORF. They are then able to show that 6 out of 9 of those genes had a DDX6-dependent mRNA decay effect. There was no comment or effort as to why 2 out of those 6 genes tested did not show as strong of a DDX6-dependent decay effect relative to the other targets tested. Thus, the efforts to identify mRNA features at a global level that exhibited DDX6-dependent mRNA decay effects are lacking in this analysis.

      We appreciate the reviewer's insightful comments regarding the need to further characterize the genes influenced by DDX6-mediated mRNA decay. To address this, we carried out additional analyses to identify potential traits of these genes. Our findings revealed that DDX6-regulated coding sequences tend to be longer and exhibit lower predicted mRNA stability scores compared to the average across the transcriptome. This observation indicates a possible connection to codon optimality. It suggests that DDX6 could play a role in regulating a specific subset of mRNAs with inherently lower stability, potentially shedding light on why some genes may exhibit varied decay patterns when DDX6 is depleted.

      Overall, the work done by Weber et al. is sound, with the proper controls. The authors expand significantly on the knowledge of what we know about DDX6 in the process of mRNA decay in humans, confirming the evolutionary conservation of the role of this factor across eukaryotes. The analysis of the RNA-seq and Ribo-seq data could be more in-depth, however, the authors were able to show with certainty that some transcripts containing known repetitive sequences or polybasic sequences exhibited a DDX6-mRNA decay effect.

      We appreciate the reviewer’s acknowledgment of the soundness of our work and the inclusion of proper controls. We are committed to refining our manuscript to meet your expectations and ensure the accuracy and depth of our findings.

      Reviewer #2 (Public Review):

      The experiments were well-performed, and the results clearly demonstrated the requirement of DDX6 in mRNA degradation induced by slowed ribosomes. However, in some cases, the authors interpreted their data in a biased way, possibly influenced by the yeast study, and drew too strong conclusions. In addition, the authors should have cited important studies about codon optimality in mammalian cells. This lack of information hinders placing their important discoveries in a correct context.

      (1) Although the authors concluded that DDX6 acts as a sensor of the slowed ribosome, it is not clear if DDX6 indeed senses the ribosome speed. What the authors showed is a requirement of DDX6 for mRNA decay induced by rare codons, and DDX6 binds to the ribosome to exert this role. For example, DDX6 may bridge the sensor and decay machinery on the ribosome. Without structural or biochemical data on the recognition of the slowed ribosome by DDX6, the role of DDX6 as a sensor remains one of the possible models. It should be described in the discussion section.

      We greatly appreciate the reviewer’s comments and suggestions. We agree that our study does not directly establish that DDX6 senses ribosome speed. We also agree that without structural or biochemical data demonstrating recognition of the slowed ribosome by DDX6, the role of DDX6 as a sensor remains one of the possible models. We will incorporate this point into the discussion section and acknowledge it as an important direction for future research.

      (2) It is not clear if DDX6 directly binds the ribosome. The authors used ribosomes purified by sucrose cushion, but ribosome-associating and FDF motif-interacting factors might remain on ribosomes, even after RNaseI treatment. Without structural or biochemical data of the direct interaction between the ribosome and DDX6, the authors should avoid description as if DDX6 directly binds to the ribosome.

      We agree with the reviewer’s perspective that, even after RNase I treatment, factors associated with the ribosome and interacting with the FDF motif might still remain on the ribosomes that were purified via a sucrose cushion. In the revised manuscript, we will describe the relationship between DDX6 and the ribosome more cautiously, avoiding the depiction of DDX6 directly binding to the ribosome.

      (3) Although the authors performed rigorous reporter assays recapitulating the effect of ribosome-retardation sequences on mRNA stability, this is not the first report showing that codon optimality determines mRNA stability in human cells. The authors did not cite important previous studies, such as Wu et al., 2019 (PMID: 31012849), Hia et al., 2019 (PMID: 31482640), Narula et al., 2019 (PMID: 31527111), and Forrest et al., 2020 (PMID: 32053646). These milestone papers should be cited in the Introduction, Results, and Discussion.

      Thank you for the reviewer’s correction. We apologize for the oversight in our references. In the revised manuscript, we will ensure these key studies are appropriately cited.

      (4) While both DDX6 and deadenylation by the CCR4-NOT were required for mRNA decay by the slowed ribosome, whether DDX6 is required for deadenylation was not investigated. Given that the CCR4-NOT deadenylate complex directly interacts with the empty ribosome E-site in yeast and humans (Buschauer et al., 2020 PMID: 32299921 and Absmeier et al., 2023 PMID: 37653243), whether the loss of DDX6 also affected the action of the CCR4-NOT complex is an important point to investigate, or at least should be discussed in this paper.

      We sincerely appreciate the reviewer's valuable suggestions. This point is indeed crucial, and we have addressed it in the revised version of our manuscript. We have included experimental results confirming that the knockout of DDX6 does not impact the CCR4-NOT complex’s deadenylation function. This addition will contribute to a more comprehensive discussion of the relevant issues and refine our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors should explain what they use to determine rare codons in their system and distinguish this feature from codon optimality. Codon optimality is a distinct feature from rare codon usage, and both should be defined better in the context of the paper. The authors interchange between the use of codon optimality, rare codon usage, and translation stalling sequences frequently and should explain and clarify these terms or consider only referring to translation stalling sequences for their discussion.

      We appreciate the reviewer's valuable feedback, which has helped us improve the clarity and rigor of the relevant statements in the manuscript. In the revised manuscript, we have provided more explicit and detailed explanations of how rare codons are defined and used, and have differentiated this from codon optimality, to help readers better understand the basis of our analysis and findings. Furthermore, we now refer exclusively to 'translation stalling sequences' in our discussion to provide greater clarity.

      Reviewer #2 (Recommendations For The Authors):

      Interestingly, the translation efficiency of zinc-finger domain mRNAs was increased in DDX6 KO cells. This finding is consistent with the previous study reporting that mRNAs encoding zinc-finger domains are enriched with non-optimal codons and unstable. (Diez et al., 2022 PMID: 35840631). The authors might want to cite this paper and mention the consistency of the two studies.

      Thank you for noting the relevance of the increased translation efficiency of zinc-finger domain mRNAs in DDX6 KO cells. We will reference the study by Diez et al. (2022) and emphasize the consistency between their findings and ours, which supports the idea that DDX6 is involved in regulating the translation of mRNAs with these characteristics.

      A mutagenesis analysis of the poly-basic residues of BMP2 would further strengthen the authors' claim that this sequence is a primal cause of ribosome slowdown and mRNA decay.

      We greatly appreciate the reviewer’s suggestion to conduct a mutagenesis analysis of the poly-basic residues of BMP2. We agree that such an analysis could potentially strengthen our claim. However, given the constraints we currently face, and because our study already provides substantial evidence to support our findings, we believe that this analysis is not the most immediate priority at this stage of our research. We will consider undertaking a mutagenesis analysis in future studies to further validate our conclusions.

      In the Introduction, RQC is not commonly referred to as "ribosome-based quality control." Please consider the use of "ribosome-associated quality control."

      We appreciate the reviewer providing this suggestion. During the revision process, we corrected the relevant terminology to ensure more precise and appropriate usage.

      In the Introduction, the authors should avoid introducing NMD as a part of RQC. NMD was discovered and defined independently of RQC.

      Thank you for pointing out this important distinction. We recognize that NMD was discovered and defined independently from RQC, and should not be presented as an integral part of the RQC process. In the revised manuscript, we have made sure to avoid introducing nonsense-mediated decay (NMD) as a component of ribosome-associated quality control (RQC).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1:

      Detection of early-stage colorectal cancer is of great importance. Laboratory scientists and clinicians have reported different exosomal biomarkers to identify colorectal cancer patients. This is a proof-of-principle study of whether exosomal RNAs, and particularly predicted lncRNAs, are potential biomarkers of early-stage colorectal cancer and its precancerous lesions.

      Strengths:

      The study provides a valuable dataset of the whole-transcriptomic profile of circulating sEVs, including miRNA, mRNA, and lncRNA. This approach adds to the understanding of sEV-RNAs' role in CRC carcinogenesis and facilitates the discovery of potential biomarkers.

      The developed 60-gene t-SNE model successfully differentiated T1a stage CRC/AA from normal controls with high specificity and sensitivity, indicating the potential of sEV-RNAs as diagnostic markers for early-stage colorectal lesions.

      The study combines RNA-seq, RT-qPCR, and modelling algorithms to select and validate candidate sEV-RNAs, maximising the performance of the developed RNA signature. The comparison of different algorithms and consideration of other factors enhance the robustness of the findings.

      Weaknesses:

      Validation in larger cohorts would be required to establish these as biomarkers, and to demonstrate whether the predicted lncRNAs implicated in these biomarkers are indeed present, and whether they are robustly predictive/prognostic.

      Thank you for your careful evaluation and valuable suggestions, which have provided valuable guidance for the improvement of our paper. In response to your feedback, we have implemented the following improvements.

      (1) More detail about how lncRNA and miRNA candidates were defined, and how this compares to previously published miRNA and lncRNA predictions. The Suppl Methods section for lncRNAs does not describe in detail how the "CPC/CNCI/Pfam" "methods" were combined to define lncRNAs here.

      Author response and action taken: Thanks for your comments. In the Supplementary Methods section titled "Selection of Predictive Biomarkers", we have provided a more detailed description of the screening process for candidate RNA biomarkers. The revised section is as follows: To ensure the predictive performance of the sEV-RNA signature, candidate sEV-RNAs were ultimately selected based on their fold change in colorectal cancer/precancerous advanced adenoma, absolute abundance, and module attribution. In detail, we initially selected the top 10 RNAs from each category (mRNA, miRNA, and lncRNA) with a fold change greater than 4. In cases where fewer than 10 RNAs met this criterion, all RNAs with a fold change greater than 4 were included. Subsequently, we filtered out RNAs with low abundance, and we selected the top-ranked RNAs from each module based on the fold change ranking for inclusion in the final model.
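
      The three selection steps described above can be read as a simple filter chain. The following Python sketch is an illustration only and is not the authors' code: the field names, the abundance threshold (`min_abundance`), and the number of RNAs kept per module (`per_module`) are assumptions, since the response does not state them. Only the fold-change cutoff of 4 and the top-10 limit per category come from the text.

```python
from collections import defaultdict

def select_candidates(rnas, fc_cutoff=4.0, top_n=10, min_abundance=1.0, per_module=1):
    """Return names of candidate RNAs after the three filters described in the text.

    rnas: list of dicts with keys name, category, fold_change, abundance, module
    (hypothetical field names chosen for this sketch).
    """
    # Step 1: per category, keep at most top_n RNAs with fold change above the cutoff.
    by_category = defaultdict(list)
    for rna in rnas:
        if rna["fold_change"] > fc_cutoff:
            by_category[rna["category"]].append(rna)
    shortlisted = []
    for members in by_category.values():
        members.sort(key=lambda r: r["fold_change"], reverse=True)
        shortlisted.extend(members[:top_n])

    # Step 2: drop low-abundance RNAs (the threshold value is a placeholder).
    abundant = [r for r in shortlisted if r["abundance"] >= min_abundance]

    # Step 3: within each module, keep the top-ranked RNAs by fold change.
    by_module = defaultdict(list)
    for rna in abundant:
        by_module[rna["module"]].append(rna)
    selected = []
    for members in by_module.values():
        members.sort(key=lambda r: r["fold_change"], reverse=True)
        selected.extend(members[:per_module])
    return sorted(r["name"] for r in selected)

# Made-up records, for illustration only.
demo = [
    {"name": "miR-a", "category": "miRNA", "fold_change": 8.0, "abundance": 50.0, "module": "blue"},
    {"name": "miR-b", "category": "miRNA", "fold_change": 5.0, "abundance": 0.2, "module": "blue"},
    {"name": "mRNA-a", "category": "mRNA", "fold_change": 6.0, "abundance": 30.0, "module": "blue"},
    {"name": "lnc-a", "category": "lncRNA", "fold_change": 4.5, "abundance": 12.0, "module": "brown"},
    {"name": "lnc-b", "category": "lncRNA", "fold_change": 3.0, "abundance": 90.0, "module": "brown"},
]
print(select_candidates(demo))
```

      In this toy input, lnc-b fails the fold-change cutoff and miR-b fails the abundance filter, so only the strongest remaining RNA in each module survives. Raising `per_module` would keep more RNAs per module, which is one way to read "top-ranked RNAs" in the plural.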

      Compared with most previous studies on EV biomarkers, the biomarker model we constructed shows considerable overall discriminative performance and holds clinical value for practical application. An additional merit of this study lies in uncovering heterogeneity at the whole-transcriptome level among samples of different categories, providing more comprehensive insight into the dynamic changes of biological states. For instance, we inferred the cell subtypes of EV origin through ssGSEA and correlated them with the tumor microenvironment status. We also delineated the regulatory relationships among different RNA categories and analyzed their impacts on biological signaling pathways, which is difficult to accomplish by sequencing a single RNA category alone.

      In the Supplementary Methods section titled "Identification of mRNAs and lncRNAs", we have provided a more detailed explanation regarding how the "CPC/CNCI/Pfam" methods were combined to define lncRNAs. The revised section is as follows: Three computational approaches including CPC (Coding Potential Calculator)/CNCI (Coding-Non-Coding Index)/Pfam were combined to sort non-protein coding RNA candidates from putative protein-coding RNAs in the unknown transcripts. CPC is a sequence alignment-based tool used to assess protein-coding capacity. By aligning transcripts with known protein databases, CPC evaluates the biological sequence characteristics of each coding frame of the transcript to determine its coding potential and identify non-coding RNAs (ref. 1). CNCI analysis is a method used to distinguish between coding and non-coding transcripts based on adjacent nucleotide triplets. This tool does not rely on known annotation files and can effectively predict incomplete transcripts and antisense transcript pairs (ref. 2). Pfam divides protein domains into different protein families and establishes statistical models for the amino acid sequences of each family through protein sequence alignment (ref. 3). Transcripts that can be aligned are considered to have a certain protein domain, indicating coding potential, while transcripts without alignment results are potential lncRNAs. Putative protein-coding RNAs were filtered out using a minimum length and exon number threshold. Transcripts above 200 nt with more than two exons were selected as lncRNA candidates and further screened by CPC/CNCI/Pfam. We distinguished lncRNAs from protein-coding genes by intersecting the results of the three determination methods mentioned above.
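
      The combination rule described above (a structural pre-filter followed by the intersection of the three tools' non-coding calls) can be sketched as follows. This is a hedged illustration, not the authors' pipeline: it does not run CPC, CNCI, or Pfam, and instead assumes their outputs have already been reduced to sets of transcript IDs. The function name, argument names, and transcript records are hypothetical. The thresholds follow the wording of the response literally (longer than 200 nt, more than two exons).

```python
def call_lncrnas(transcripts, cpc_noncoding, cnci_noncoding, pfam_no_domain):
    """Return IDs of unknown transcripts that all three tools call non-coding.

    transcripts: list of dicts with keys id, length (nt), exons (hypothetical fields).
    cpc_noncoding, cnci_noncoding: IDs each tool classified as non-coding.
    pfam_no_domain: IDs with no Pfam domain alignment.
    """
    # Structural pre-filter, read literally from the text:
    # longer than 200 nt and more than two exons.
    structural = {
        t["id"] for t in transcripts
        if t["length"] > 200 and t["exons"] > 2
    }
    # A transcript is kept only if every method agrees it lacks coding potential.
    return structural & set(cpc_noncoding) & set(cnci_noncoding) & set(pfam_no_domain)

# Made-up transcripts, for illustration only.
transcripts = [
    {"id": "T1", "length": 850, "exons": 3},
    {"id": "T2", "length": 150, "exons": 4},   # too short
    {"id": "T3", "length": 1200, "exons": 3},  # has a Pfam domain hit
    {"id": "T4", "length": 640, "exons": 2},   # too few exons
]
lncrnas = call_lncrnas(
    transcripts,
    cpc_noncoding={"T1", "T2", "T3"},
    cnci_noncoding={"T1", "T3", "T4"},
    pfam_no_domain={"T1", "T2", "T4"},
)
print(sorted(lncrnas))
```

      Taking the intersection rather than the union is the conservative choice: a single tool flagging coding potential is enough to exclude a transcript, which trades sensitivity for fewer false lncRNA calls.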

      (2) The role and function of many lncRNAs are unknown, and some lncRNA species may simply be the product of pervasive transcription. Although this is an exploratory and descriptive study of potential biomarkers, it would benefit from some discussion of potential mechanisms because the proposed prediction models include lncRNAs. Do the authors have a hypothesis as to why lncRNAs were informative and predictive in this study? Are these lncRNAs well-studied and/or known to be functional? Or are they markers for pervasive transcription, for example?

      Author response and action taken: Thanks for your comments. Whole transcriptome sequencing results facilitate the discussion of regulatory mechanisms between different biomarkers, supplying evidence for future investigations. Among the three lncRNAs involved in this study, lnc-MKRN2-42:1 is involved in the occurrence and development of Parkinson's disease (ref. 4). The other two lncRNAs, however, lack relevant reports. Therefore, we cannot confirm that these lncRNAs have specific biological functions. In the Supplementary Methods section titled "Identification of mRNAs and lncRNAs", we acknowledge the limited understanding of sEV-lncRNAs in current research. In contrast, many miRNAs in the model have been proven to participate in the occurrence and development of colorectal cancer, such as miR-3615 (ref. 5), miR-425-5p (ref. 6), and miR-106b-3p (ref. 7). These data provide biological support for the performance of the model, which is particularly valuable for model prediction.

      (3) In the Results section "Cell-specific features of the sEV-RNA profile indicated the different proportion of cells of sEV origin among different groups", the sEV-RNA profiles were correlated with existing transcriptome profiles from specific cell types (ssGSEA) and used to estimate "tumour microenvironment-associated scores". This transcriptomic correlation is a valuable observation, but there is no further evidence provided that the sEV-RNAs profiles truly reflect differential cell types of sEV origin between the sample subgroups.

      Could the authors clarify the strength of evidence for the cells-of-origin estimates, which are based only on sEV-RNA transcriptome profiles? Would sEV-RNA-derived cells-of-origin be expected to correlate with histopath-derived scores (tumour microenvironment; immune infiltrate) for example? Or is this section intended as an exploratory description of sEV-RNAs, perhaps a check on the plausibility of the sEV-RNA profiles, rather than an accurate estimation of cells-of-origin in each subgroup?

      Author response: Thanks for your comments. This section explores the proportional distribution of EVs from different cellular subgroups solely based on transcriptome profiles and algorithms, rather than providing precise estimates of cellular origins within each subgroup.

      (4) Software and R package version numbers should be provided.

      Author response and action taken: Thanks for your comments. We have added version information for relevant R packages at the first mention in the original text (e.g., WGCNA (version 1.61), Rtsne (version 0.15), GSVA (version 1.42.0), ESTIMATE (version 1.0.13), DOSE (version 3.8.0)).

      References

      (1) Kong L, et al. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35, W345-349 (2007).

      (2) Sun L, et al. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Res. 41, e166 (2013).

      (3) Finn RD, et al. Pfam: the protein families database. Nucleic Acids Res. 42, D222-230 (2014).

      (4) Wang Q, et al. Integrated analysis of exosomal lncRNA and mRNA expression profiles reveals the involvement of lnc-MKRN2-42:1 in the pathogenesis of Parkinson's disease. CNS Neurosci Ther. 26, 527-537 (2020).

      (5) Zheng G, et al. Identification and validation of reference genes for qPCR detection of serum microRNAs in colorectal adenocarcinoma patients. PLoS One. 8, e83025 (2013).

      (6) Liu D, Zhang H, Cui M, Chen C, Feng Y. Hsa-miR-425-5p promotes tumor growth and metastasis by activating the CTNND1-mediated β-catenin pathway and EMT in colorectal cancer. Cell Cycle. 19, 1917-1927 (2020).

      (7) Liu H, et al. Colorectal cancer-derived exosomal miR-106b-3p promotes metastasis by down-regulating DLC-1 expression. Clin Sci (Lond). 134, 419-434 (2020).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      In this study, Ger and colleagues present a valuable new technique that uses recurrent neural networks to distinguish between model misspecification and behavioral stochasticity when interpreting cognitive-behavioral model fits. Evidence for the usefulness of this technique, which is currently based primarily on a relatively simple toy problem, is considered incomplete but could be improved via comparisons to existing approaches and/or applications to other problems. This technique addresses a long-standing problem that is likely to be of interest to researchers pushing the limits of cognitive computational modeling.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Ger and colleagues address an issue that often impedes computational modeling: the inherent ambiguity between stochasticity in behavior and structural mismatch between the assumed and true model. They propose a solution to use RNNs to estimate the ceiling on explainable variation within a behavioral dataset. With this information in hand, it is possible to determine the extent to which "worse fits" result from behavioral stochasticity versus failures of the cognitive model to capture nuances in behavior (model misspecification). The authors demonstrate the efficacy of the approach in a synthetic toy problem and then use the method to show that poorer model fits to 2-step data in participants with low IQ are actually due to an increase in inherent stochasticity, rather than systemic mismatch between model and behavior.

      Strengths:

      Overall I found the ideas conveyed in the paper interesting and the paper to be extremely clear and well-written. The method itself is clever and intuitive and I believe it could be useful in certain circumstances, particularly ones where the sources of structure in behavioral data are unknown. In general, the support for the method is clear and compelling. The flexibility of the method also means that it can be applied to different types of behavioral data - without any hypotheses about the exact behavioral features that might be present in a given task.

      Thank you for taking the time to review our work and for the positive remarks regarding the manuscript. Below is a point-by-point response to the concerns raised.

      Weaknesses:

      That said, I have some concerns with the manuscript in its current form, largely related to the applicability of the proposed methods for problems of importance in computational cognitive neuroscience. This concern stems from the fact that the toy problem explored in the manuscript is somewhat simple, and the theoretical problem addressed in it could have been identified through other means (for example through the use of posterior predictive checking for model validation), and the actual behavioral data analyzed were interpreted as a null result (failure to reject the behavioral stochasticity hypothesis), rather than actual identification of model-misspecification. I expand on these primary concerns and raise several smaller points below.

      A primary question I have about this work is whether the method described would actually provide any advantage for real cognitive modeling problems beyond what is typically done to minimize the chance of model misspecification (in particular, post-predictive checking). The toy problem examined in the manuscript is pretty extreme (two of the three synthetic agents are very far from what a human would do on the task, and the models deviate from one another to a degree that detecting the difference should not be difficult for any method). The issue posed in the toy data would easily be identified by following good modeling practices, which include using posterior predictive checking over summary measures to identify model insufficiencies, which in turn would call for the need for a broader set of models (See Wilson & Collins 2019). Thus, I am left wondering whether this method could actually identify model misspecification in real world data, particularly in situations where standard posterior predictive checking would fall short. The conclusions from the main empirical data set rest largely on a null result, and the utility of a method for detecting model misspecification seems like it should depend on its ability to detect its presence, not just its absence, in real data.

      Beyond the question of its advantage above and beyond data- and hypothesis-informed methods for identifying model misspecification, I am also concerned that if the method does identify a model insufficiency, then you still would need to use these other methods in order to understand what aspect of behavior deviated from model predictions in order to design a better model. In general, it seems that the authors should be clear that this is a tool that might be helpful in some situations, but that it will need to be used in combination with other well-described modeling techniques (posterior predictive checking for model validation and guiding cognitive model extensions to capture unexplained features of the data). A general stylistic concern I have with this manuscript is that it presents and characterizes a new tool to help with cognitive computational modeling, but it does not really adhere to best modeling practices (see Collins & Wilson, eLife), which involve looking at data to identify core behavioral features and simulating data from best-fitting models to confirm that these features are reproduced. One could take away from this paper that you would be better off fitting a neural network to your behavioral data rather than carefully comparing the predictions of your cognitive model to your actual data, but I think that would be a highly misleading takeaway since summary measures of behavior would just as easily have diagnosed the model misspecification in the toy problem, and have the added advantage that they provide information about which cognitive processes are missing in such cases.

      As a more minor point, it is also worth noting that this method could not distinguish behavioral stochasticity from the deterministic structure that is not repeated across training/test sets (for example, because a specific sequence is present in the training set but not the test set). This should be included in the discussion of method limitations. It was also not entirely clear to me whether the method could be applied to real behavioral data without extensive pretraining (on >500 participants) which would certainly limit its applicability for standard cases.

      The authors focus on model misspecification, but in reality, all of our models are misspecified to some degree since the true process generating behavior almost certainly deviates from our simple models (i.e., as George Box is frequently quoted, "all models are wrong, but some of them are useful"). It would be useful to have some more nuanced discussion of situations in which misspecification is and is not problematic.

      We thank the reviewer for these comments and have made changes to the manuscript to better describe these limitations. We agree with the reviewer and accept that fitting a neural network is by no means a substitute for careful and dedicated cognitive modeling. Cognitive modeling is aimed at describing the latent processes that are assumed to generate the observed data, and we agree that careful description of the data-generating mechanisms, including posterior predictive checks, is always required. However, even a well-defined cognitive model might still have little predictive accuracy, and it is difficult to know how many resources should be put into testing and developing new cognitive models to describe the data. We argue that RNNs can provide some insight into this question, and we highlight the following limitations that were mentioned by the reviewer:

      First, we accept that it is important to provide positive evidence for the existence of model misspecification. In that sense, a result where the network shows dramatic improvement over the best-fitting theoretical model is easier to interpret than one where the network shows no (or very little) improvement in predictive accuracy. This is because it is always possible that the network, for some reason, was not flexible enough to learn the data-generating model, or that the data-generating mechanism changed from training to test. We have now stated this more clearly in the limitations section. However, when it comes to our empirical results, we would like to emphasize that the network did in fact improve the predictive accuracy for all participants. The result supports a "null" hypothesis in the sense that we find evidence that the change in predictive accuracy between the theoretical model and the RNN is not systematic across levels of IQ. This allows us to quantify evidence (using Bayesian statistics) for no systematic model misspecification as a function of IQ. While it is always possible that a different model might systematically improve the predictive accuracy of low vs high IQ individuals' data, this seems less likely given the flexibility of the current results.

      Second, we agree that our current study only applies to the RL models that we tested. In the context of RL, we have used a well-established and frequently applied paradigm and models. We emphasize in the discussion that simulations are required to further validate other uses for this method with other paradigms.  

      Third, we also accept that posterior predictive checks should always be capitalized on whenever possible, which is now emphasized in the discussion. However, we note that these are not always easy to interpret in a meaningful way and may not always provide details regarding model insufficiencies as described by the reviewer. It is very hard to determine what should be considered a good prediction, and since the generative model is always unknown, sometimes very low predictive accuracy can still be at the peak of possible model performance. This is because the data might be generated from a very noisy process, capping the possible predictive accuracy at a very low point. When strictly using theoretical modeling, however, it is very hard to determine what predictive accuracy to expect. Also, predictive checks are not always easy to interpret visually or otherwise. For example, in two-armed bandit tasks where there are only two actions, the prediction of choices is, in our opinion, easier to understand when described using a confusion matrix that summarizes the model's ability to predict the empirical behavior (which becomes similar to the predictive estimation we describe in Eq. 22).
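
      As a minimal sketch of the confusion-matrix summary mentioned above, assuming binary actions coded 0/1 and a model that outputs the probability of action 1 on each trial, one could tabulate observed against predicted choices as follows. The function names, the 0.5 threshold, and the example values are illustrative assumptions and are not taken from the manuscript or from its Eq. 22.

```python
import numpy as np

def confusion_matrix_2x2(observed, predicted_prob, threshold=0.5):
    """Rows index the observed action (0/1); columns index the predicted action (0/1)."""
    observed = np.asarray(observed, dtype=int)
    # Turn the model's probability of action 1 into a hard prediction (assumed threshold).
    predicted = (np.asarray(predicted_prob, dtype=float) >= threshold).astype(int)
    matrix = np.zeros((2, 2), dtype=int)
    for obs, pred in zip(observed, predicted):
        matrix[obs, pred] += 1
    return matrix

def accuracy(matrix):
    """Proportion of trials on the diagonal (correctly predicted choices)."""
    return np.trace(matrix) / matrix.sum()

# Hypothetical trials: observed choices and model-predicted P(action = 1).
observed = [0, 1, 1, 0, 1, 0]
predicted_prob = [0.2, 0.9, 0.4, 0.6, 0.7, 0.1]
cm = confusion_matrix_2x2(observed, predicted_prob)
print(cm)
print(accuracy(cm))
```

      The diagonal cells count trials where the predicted action matched the observed one, so the trace divided by the total gives a simple predictive accuracy.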

      Finally, this approach indeed requires a large dataset, with at least three sessions for each participant (training, validation, and test). Further studies might shed more light on the use of optimal epochs as a proxy for noise/complexity that can be used with less data (i.e., training and validation, without a test set).

      Please see our changes at the end of this document.  

      Reviewer #2 (Public Review):

      SUMMARY:

      In this manuscript, Ger and colleagues propose two complementary analytical methods aimed at quantifying the model misspecification and irreducible stochasticity in human choice behavior. The first method involves fitting recurrent neural networks (RNNs) and theoretical models to human choices and interpreting the better performance of RNNs as providing evidence of the misspecifications of theoretical models. The second method involves estimating the number of training iterations for which the fitted RNN achieves the best prediction of human choice behavior in a separate, validation data set, following an approach known as "early stopping". This number is then interpreted as a proxy for the amount of explainable variability in behavior, such that fewer iterations (earlier stopping) correspond to a higher amount of irreducible stochasticity in the data. The authors validate the two methods using simulations of choice behavior in a two-stage task, where the simulated behavior is generated by different known models. Finally, the authors use their approach in a real data set of human choices in the two-stage task, concluding that low-IQ subjects exhibit greater levels of stochasticity than high-IQ subjects.
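
      The early-stopping idea summarized above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `patience` rule and the validation-loss curves are hypothetical, and the manuscript may define the optimal epoch differently.

```python
def optimal_epoch(val_losses, patience=3):
    """Return (epoch, loss) for the lowest validation loss found before stopping.

    Epochs are 1-based. Training is treated as stopped once `patience`
    consecutive epochs pass without improving on the best validation loss.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop early
    return best_epoch, best_loss

# Hypothetical validation-loss curves for two simulated agents.
noisy_agent = [0.69, 0.66, 0.65, 0.66, 0.67, 0.68, 0.70]
structured_agent = [0.69, 0.60, 0.52, 0.47, 0.44, 0.43, 0.45, 0.46, 0.47]

print(optimal_epoch(noisy_agent))
print(optimal_epoch(structured_agent))
```

      In this toy case the curve for the noisier agent bottoms out earlier, which is the pattern the second method reads as a higher share of irreducible stochasticity.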

      STRENGTHS:

      The manuscript explores an extremely important topic to scientists interested in characterizing human decision-making. While it is generally acknowledged that any computational model of behavior will be limited in its ability to describe a particular data set, one should hope to understand whether these limitations arise due to model misspecification or due to irreducible stochasticity in the data. Evidence for the former suggests that better models ought to exist; evidence for the latter suggests they might not.

      To address this important topic, the authors elaborate carefully on the rationale of their proposed approach. They describe a variety of simulations - for which the ground truth models and the amount of behavioral stochasticity are known - to validate their approaches. This enables the reader to understand the benefits (and limitations) of these approaches when applied to the two-stage task, a task paradigm commonly used in the field. Through a set of convincing analyses, the authors demonstrate that their approach is capable of identifying situations where an alternative, untested computational model can outperform the set of tested models, before applying these techniques to a realistic data set.

      Thank you for reviewing our work and for the positive tone. Please find below a point-by-point response to the concerns you have raised.

      WEAKNESSES:

      The most significant weakness is that the paper rests on the implicit assumption that the fitted RNNs explain as much variance as possible, an assumption that is likely incorrect and which can result in incorrect conclusions. While in low-dimensional tasks RNNs can predict behavior as well as the data-generating models, this is not *always* the case, and the paper itself illustrates (in Figure 3) several cases where the fitted RNNs fall short of the ground-truth model. In such cases, we cannot conclude that a subject exhibiting a relatively poor RNN fit necessarily has a relatively high degree of behavioral stochasticity. Instead, it is at least conceivable that this subject's behavior is generated precisely (i.e., with low noise) by an alternative model that is poorly fit by an RNN - e.g., a model with long-term sequential dependencies, which RNNs are known to have difficulties in capturing.

      These situations could lead to incorrect conclusions for both of the proposed methods. First, the model misspecification analysis might show equal predictive performance for a particular theoretical model and for the RNN. While a scientist might be inclined to conclude that the theoretical model explains the maximum amount of explainable variance and therefore that no better model should exist, the scenario in the previous paragraph suggests that a superior model might nonetheless exist. Second, in the early-stopping analysis, a particular subject may achieve optimal validation performance with fewer epochs than another, leading the scientist to conclude that this subject exhibits higher behavioral noise. However, as before, this could again result from the fact that this subject's behavior is produced with little noise by a different model. Admittedly, the existence of such scenarios *in principle* does not mean that such scenarios are common, and the conclusions drawn in the paper are likely appropriate for the particular examples analyzed. However, it is much less obvious that the RNNs will provide optimal fits in other types of tasks, particularly those with more complex rules and long-term sequential dependencies, and in such scenarios, an ill-advised scientist might end up drawing incorrect conclusions from the application of the proposed approaches.

      Yes, we understand and agree. A negative result, where the RNN is unable to outperform the best-fitting theoretical model, would always leave open the possibility that a different approach might yield better results. In contrast, a dramatic improvement in predictive accuracy for the RNN is easier to interpret, since it implies that the theoretical model can be improved. We have made an effort to state this issue more clearly in the discussion. We specifically and directly mention in the discussion that “Equating RNN performance with the generative model should be avoided”.

      However, we would like to note that our empirical results provided a somewhat more nuanced scenario, where we found that the RNN generally improved the predictive accuracy for most participants. Importantly, this improvement was found to be equal across participants, with no systematic benefits for low vs high IQ participants. We understand that there is always the possibility that another model would show a systematic benefit for low vs. high IQ participants; however, we suggest that this is less likely given the current evidence. We have made an effort to clearly note these issues in the discussion.

      In addition to this general limitation, the paper also makes a few additional claims that are not fully supported by the provided evidence. For example, Figure 4 highlights the relationship between the optimal epochs and agent noise. Yet, it is nonetheless possible that the optimal epoch is influenced by model parameters other than inverse temperature (e.g., learning rate). This could again lead to invalid conclusions, such as concluding that low-IQ is associated with optimal epoch when an alternative account might be that low-IQ is associated with low learning rate, which in turn is associated with optimal epoch. Yet additional factors such as the deep double-descent (Nakkiran et al., ICLR 2020) can also influence the optimal epoch value as computed by the authors.

      An additional issue is that Figure 4 reports an association between optimal epoch and noise, but noise is normalized by the true minimal/maximal inverse-temperature of hybrid agents (Eq. 23). It is thus possible that the relationship does not hold for more extreme values of inverse-temperature such as beta=0 (extremely noisy behavior) or beta=inf (deterministic behavior), two important special cases that should be incorporated in the current study. Finally, even taking the association in Figure 4 at face value, there are potential issues with inferring noise from the optimal epoch when their correlation is only r~=0.7. As shown in the figures, upon finding a very low optimal epoch for a particular subject, one might be compelled to infer high amounts of noise, even though several agents may exhibit a low optimal epoch despite having very little noise.
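
      To make the two special cases concrete, the sketch below shows a standard softmax choice rule with inverse temperature beta. It assumes action values are already available and is not the manuscript's hybrid model or its Eq. 23; the function name and the example values are hypothetical.

```python
import numpy as np

def softmax_policy(q_values, beta):
    """Choice probabilities under a softmax rule with inverse temperature beta."""
    q = np.asarray(q_values, dtype=float)
    if np.isinf(beta):
        # Deterministic limit: all probability mass on the highest-valued action(s).
        best = (q == q.max()).astype(float)
        return best / best.sum()
    # Subtracting the maximum leaves the probabilities unchanged but avoids overflow.
    weights = np.exp(beta * (q - q.max()))
    return weights / weights.sum()

q_values = [0.2, 0.8]  # hypothetical action values
p_random = softmax_policy(q_values, 0.0)            # beta = 0: uniform choice
p_graded = softmax_policy(q_values, 5.0)            # intermediate noise
p_deterministic = softmax_policy(q_values, np.inf)  # beta = inf: always the best action
print(p_random, p_graded, p_deterministic)
```

      With beta = 0 the values are ignored entirely, and as beta grows the policy approaches a deterministic choice of the highest-valued action, which is why these two extremes bracket the range of behavioral noise.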

      Thank you for these comments. Indeed, there is much we do not yet fully understand about the factors that influence optimal epochs. Currently, it is clear to us that the number of optimal epochs is influenced by a variety of factors, including network size, the data size, and other cognitive parameters, such as the learning rate. We hope that our work serves as a proof-of-concept, suggesting that, in certain scenarios, the number of epochs can be utilized as an empirical estimate. Moreover, we maintain that, at least within the context of the current paradigm, the number of optimal epochs is primarily sensitive to the amount of true underlying noise, assuming the number of trials and network size are constant. We are therefore hopeful that this proof-of-concept will encourage research that will further examine the factors that influence the optimal epochs in different behavioral paradigms.

      To address the reviewer's justified concerns, we have made several amendments to the manuscript. First, we added an additional version of Figure 4 to the Supplementary Information, where the noise parameter values are not scaled. We hope this adjustment clarifies that the parameters were tested across a broad spectrum of values (e.g., 0 to 10 for the hybrid model), spanning the two extremes of complete randomness and high determinism. Second, we included a linear regression analysis showing the association of all model parameters (including noise) with the optimal number of epochs. As anticipated by the reviewer, the learning rate was also found to be associated with the number of optimal epochs. Nonetheless, the noise parameter appears to maintain the most substantial association with the number of optimal epochs. We have also added a specific mention of these associations in the discussion, to inform readers that the association between the number of optimal epochs and model parameters should be examined using simulation for other paradigms/models. Lastly, we acknowledge in the discussion that the findings regarding the association between the number of optimal epochs and noise warrant further investigation, considering other factors that might influence the determination of the optimal epoch point and the fact that the correlation with noise is strong, but not perfect (in the range of 0.7).
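
      A regression of this kind can be sketched as below. The data are synthetic and the effect sizes (including their signs) are chosen purely for illustration; they are not the manuscript's results, and the variable and function names are hypothetical. Predictors are z-scored so that each slope is expressed in epochs per standard deviation of the parameter, which makes the slopes directly comparable.

```python
import numpy as np

def standardize(x):
    """Z-score a 1-D array."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def per_sd_slopes(predictors, outcome):
    """Ordinary least squares on z-scored predictors.

    Returns a dict mapping each predictor name to its slope,
    in outcome units per standard deviation of that predictor.
    """
    names = list(predictors)
    design = np.column_stack(
        [np.ones(len(outcome))] + [standardize(predictors[name]) for name in names]
    )
    coefs, *_ = np.linalg.lstsq(design, np.asarray(outcome, dtype=float), rcond=None)
    return dict(zip(names, coefs[1:]))  # drop the intercept

# Synthetic agents (illustrative only): noise has the larger assumed effect.
rng = np.random.default_rng(0)
n_agents = 200
noise = rng.uniform(0.0, 1.0, n_agents)
learning_rate = rng.uniform(0.0, 1.0, n_agents)
optimal_epochs = (
    50.0
    - 20.0 * standardize(noise)
    + 5.0 * standardize(learning_rate)
    + rng.normal(0.0, 1.0, n_agents)
)

slopes = per_sd_slopes({"noise": noise, "learning_rate": learning_rate}, optimal_epochs)
print(slopes)
```

      Comparing the absolute slopes then gives a rough ranking of how strongly each simulated parameter is associated with the optimal epoch.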

      The discussion now includes the following:

      “Several limitations should be considered in our proposed approach. First, fitting a data-driven neural network is evidently not enough to produce a comprehensive theoretical description of the data generation mechanisms. Currently, best practices for cognitive modeling \citep{wilson2019ten} require identifying under what conditions the model struggles to predict the data (e.g., using posterior predictive checks), and describing a different theoretical model that could account for these disadvantages in prediction. However, identifying conditions where the model shortcomings in predictive accuracy are due to model misspecifications rather than noisier behavior is a challenging task. We propose leveraging data-driven RNNs as a supplementary tool, particularly when they significantly outperform existing theoretical models, followed by refined theoretical modeling to provide insights into what processes were mis-specified in the initial modeling effort.

      Second, although we observed a robust association between the optimal number of epochs and true noise across varying network sizes and dataset sizes (see Fig.~\ref{figS2}), additional factors such as network architecture and other model parameters (e.g., learning rate, see Fig.~\ref{figS7}) might influence this estimation. Further research is required to allow us to better understand how and why different factors change the number of optimal epochs for a given dataset before it can be applied with confidence to empirical investigations.

      Third, the empirical dataset used in our study consisted of data collected from human participants at a single time point, serving as the training set for our RNN. The test set data, collected after intervals of approximately 6 and 18 months, introduced the possibility of changes in participants' decision-making strategies over time. In our analysis, we neglected any possible changes in participants' decision-making strategies during that time, changes that may lead to poorer generalization performance of our approach. Thus, further studies are needed to eliminate such possible explanations.

      Fourth, our simulations, albeit illustrative, were confined to known models, necessitating in-silico validation before extrapolating the efficacy of our approach to other model classes and tasks. Our aim was to showcase the potential benefits of using a data-driven approach, particularly when faced with unknown models. However, whether RNNs will provide optimal fits for tasks with more complex rules and long-term sequential dependencies remains uncertain.

      Finally, while positive outcomes where RNNs surpass theoretical models can prompt insightful model refinement, caution is warranted in directly equating RNN performance with that of the generative model, as seen in our simulations (e.g., Figure 3). We highlight that our empirical findings depict a more complex scenario, wherein the RNN enhanced the predictive accuracy for all participants uniformly. Notably, we also provide evidence supporting a null effect among individuals, with no consistent difference in RNN improvement over the theoretical model based on IQ. Although it remains conceivable that a different data-driven model could systematically heighten the predictive accuracy for individuals with lower IQs in this task, such a possibility seems less probable in light of the current findings.”

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Is the t that gets fed as input to RNN just timestep?

      No. Here t denotes the last transition type (rare/common), not the timestep.

      Line 378: what does "optimal epochs" mean here?

      The number of training epochs that minimizes both underfitting and overfitting (defined around line 300).

      Line 443: I don't think "identical" is the right word here - surely the authors just mean that there is not an obvious systematic difference in the distributions.

      Fixed

      I was expecting to see ~500 points in Figure 7a, but there seem to be only 50... why weren't all datasets with at least 2 sessions used for this analysis?

      We used the ~500 subjects (only 2 datasets) to pre-train the RNN, and then fine-tuned the pre-trained RNN on the other 54 subjects that have 3 datasets. The correlation of IQ and optimal epoch also holds for the 500 subjects, as shown below.

      Author response image 1.

      Reviewer #2 (Recommendations For The Authors):

      Figure 3b: despite spending a long time trying to understand the meaning of each cell of the confusion matrix, I'm still unsure what they represent. Would be great if you could spell out the meaning of each cell individually, at least for the first matrix in the paper.

      We added a clarification to the Figure caption. 

      Figure 5: Why didn't the authors show this exact scenario using simulated data? It would be much easier to understand the predictions of this figure if they had been demonstrated in simulated data, such as individuals with different amounts of behavioral noise or different levels of model misspecifications.

      In Figure 5 the x-axis represents IQ. Replacing the x-axis with true noise would make what we present now as Figure 4. We have made an effort to emphasize the meaning of the axes in the caption. 

      Line 195 ("...in the action selection. Where"). Typo? No period is needed before "where".

      Fixed

      Line 213 ("K dominated-hand model"). I was intrigued by this model, but wasn't sure whether it has been used previously in the literature, or whether this is the first time it has been proposed.

      This is the first time that we know of that this model is used.  

      Line 345 ("This suggests that RNN is flexible enough to approximate a wide range of different behavioral models"): Worth explaining why (i.e., because the GRUs are able to capture dependencies across longer delays than a k-order Logistic Regression model).

      Line 356 ("We were interested to test"): Suggestion: "We were interested in testing".

      Fixed

      Line 389 ("However, as long as the number of observations and the size of the network is the same between two datasets, the number of optimal epochs can be used to estimate whether the dataset of one participant is noisier compared with a second dataset."): This is an important claim that should ideally be demonstrated directly. The paper only illustrates this effect through a correlation and a scatter plot, where higher noise tends to predict a lower optimal epoch. However, is the claim here that, in some circumstances, optimal epoch can be used to *deterministically* estimate noise? If so, this would be a strong result and should ideally be included in the paper.

      We have now omitted this sentence and toned down our claims, suggesting that while we did find a strong association between noise and optimal epochs, future research is required to establish to what extent this could be differentiated from other factors (i.e., network size, number of observations).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides important new insights into how multisensory information is processed in the lateral cortex of the inferior colliculus, a poorly understood part of the auditory midbrain. By developing new imaging techniques that provide the first optical access to the lateral cortex in a living animal, the authors provide convincing in vivo evidence that this region contains separate subregions that can be distinguished by their sensory inputs and neurochemical profiles, as suggested by previous anatomical and in vitro studies. Additional information and analyses are needed, however, to allow readers to fully appreciate what was done, and the comparison of multisensory interactions between awake and anesthetized mice would benefit from being explored in more detail.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors provide a characterisation of auditory responses (tones, noise, and amplitude-modulated sounds) and bimodal (somatosensory-auditory) responses and interactions in the higher-order lateral cortex (LC) of the inferior colliculus (IC) and compare these characteristics with the higher-order dorsal cortex (DC) of the IC - in awake and anaesthetised mice. Dan Llano's group has previously identified GABAergic patches (modules) in the LC distinctly receiving inputs from somatosensory structures, surrounded by matrix regions receiving inputs from the auditory cortex. They here use 2P calcium imaging combined with an implanted prism to - for the first time - get functional optical access to these subregions (modules and matrix) in the lateral cortex of IC in vivo, in order to also characterise the functional difference in these subparts of LC. They find that both DC and LC of both awake and anaesthetised mice appear to be more responsive to more complex sounds (amplitude-modulated noise) compared to pure tones and that under anesthesia the matrix of LC is more modulated by specific frequency and temporal content compared to the GABAergic modules in LC. However, while both LC and DC appear to have low-frequency preferences, this preference for low frequencies is more pronounced in DC. Furthermore, in both awake and anesthetized mice, somatosensory inputs are capable of driving responses on their own in the modules of LC, but very little (possibly not at all) in the matrix. However, bimodal interactions may be different in the awake and anesthetized states in LC, which warrants deeper investigation by the authors: They find, under anesthesia, more bimodal enhancement in modules of LC compared to the matrix of LC and bimodal suppression dominating the matrix of LC. In contrast, under awake conditions bimodal enhancement is almost exclusively found in the matrix of LC, and bimodal suppression dominates both matrix and modules of LC.

      The paper provides new information about how subregions with different inputs and neurochemical profiles in the higher-order auditory midbrain process auditory and multisensory information, and is useful for the auditory and multisensory circuits neuroscience community.

      Strengths:

      The major strength of this study is undoubtedly the fact that the authors for the first time provide optical access to a subcortical region (the lateral cortex of the inferior colliculus (i.e. higher order auditory midbrain)) which we know (from previous work by the same group) have optically identifiable subdivisions with unique inputs and neurotransmitter release, and plays a central role in auditory and multisensory processing. A description of basic auditory and multisensory properties of this structure is therefore very useful for understanding auditory processing and multisensory interactions in subcortical circuits.

      Weaknesses:

      I have divided my comments about weaknesses and improvements into major and minor comments, all of which I believe are addressable by the authors to provide a clearer picture of their characterisation of the higher-order auditory midbrain.

      Major comment:

      (1) The differences between multisensory interactions in LC in anaesthetised and awake preparations appear to be qualitatively different, though the authors claim they are similar (see also minor comment related to figure 10H for further explanation of what I mean). However, the findings in awake and anaesthetised conditions are summarised differently, and plotting of similar findings in the awake figures and anaesthetised figures are different - and different statistics are used for the same comparisons. This makes it very difficult to assess how multisensory integration in LC is different under awake and anaesthetised conditions. I suggest that the authors plot (and test with similar statistics) the summary plots in Figure 8 (i.e. Figure 8H-K) for awake data in Figure 10, and also make similar plots to Figures 10G-H for anaesthetised data. This will help the readers understand the differences between bimodal stimulation effects on awake and anaesthetised preparations - which in its current form, looks very distinct. In general, it is unclear to me why the awake data related to Figures 9 and 10 is presented in a different way for similar comparisons. Please streamline the presentation of results for anaesthetised and awake results to aid the comparison of results in different states, and explicitly state and discuss differences under awake and anaesthetised conditions.

      We thank the reviewer for the valuable suggestion. We only highlighted the similarities between the data obtained from anesthetized and awake preparations to indicate that the technique can be reproduced in awake animals for future assessment. Identifying those similarities between the two experimental setups was based on the comparison between modules vs matrix or LC vs DC within each experimental setup (awake vs anesthetized). Therefore, the statistics were chosen differently for each setup based on the number of subjects (n) within each experimental preparation. However, we agree with the reviewer’s comment that there are differences between the anesthetized and awake data. To examine these differences, we ran the same statistics for Figure 5 (tonotopy of LC vs. DC - anesthetized animals) and Figure 9 (tonotopy of LC vs DC - awake animals). In addition, we added a new figure after Figure 9 to separate the statistical analysis from the maps. Accordingly, Figures 4 and 5 (maps and analysis, respectively - anesthetized animals) now match Figures 9 and 10 (maps and analysis, respectively - awake animals). We also did the same thing for Figure 7 (microprism imaging of the LC - anesthetized animals), Figure 8 (imaging of the LC from the dorsal surface - anesthetized animals), and Figure 11, or old Figure 10 (microprism imaging of the LC - awake animals), to address the similarities and differences of the multisensory data between awake and anesthetized animals. We edited the text accordingly in the Results and Discussion sections.

      (2) The claim about the degree of tonotopy in LC and DC should be aided by summary statistics to understand the degree to which tonotopy is actually present. For example, the authors could demonstrate that it is not possible/or is possible to predict above chance a cell's BF based on the group of other cells in the area. This will help understand to what degree the tonotopy is topographic vs salt and pepper. Also, it would be good to know if the GABAergic modules have a higher propensity of particular BFs or tonotopic structure compared to matrix regions in LC, and also if general tuning properties (e.g. tuning width) are different from the matrix cells and the ones in DC.

      Thank you for the reviewer’s suggestion. We have examined the tonotopy of LC and DC using two regression models (linear and quadratic polynomial) between the BFs of the cells and their location on the anatomical axis. Therefore, the tonotopy is indicated by a significant regression fit with a high R2 between the BFs the cells, and their location within each structure. For the DC, there was a significant regression fit between the BFs of the cells and their locations over the rostromedial to the caudolateral axis. Additionally, the R2 of the quadratic polynomial fit was higher than that of the linear fit, which indicates a nonlinear distribution of cells based on their BFs, which is consistent with the presence of high-low-high tuning over the DC surface. Given that the microprism cannot image the whole area of the LC, and it images a slightly different area in each animal, it was very difficult to get a consistent map for the LC as well as a solid conclusion about the LC tonotopy. However, we have examined the regression fit between the BFs of cells and their location along the main four anatomical axes of the field of view obtained from each animal (dorsal to ventral), (rostral to caudal), (dorsocaudal to ventrorostral) (dorsorostral to ventrocoudal). Unlike the DC, the LC imaged via microprism showed a lower R2 for both linear and quadratic regression mostly in the dorsoventral axis. We show the fitting curves of these regressions in Figure 4-figure supplement 1 (anesthetized data) and Figure 9-figure supplement 1 (awake data). Despite the inconsistent tonotopy of the LC imaged via microprism, the modules were found to have a higher BFs median at 10 kHz compared to matrix that had a lower BFs median at 7.1 kHz, which was consistent across the anesthetized and awake animals. We have added these results in the corresponding spot in the results section (lines 193-197 and 361-364). 
We examined the tuning width using the binarized receptive field sum (RFS) method, in which each neuron is given a value of 1 if it responds to a single frequency (narrow RF), and this value increases as the neuron responds to more neighboring frequencies (wide RF). We did this calculation across all the sound levels. Both the DC and LC of the anesthetized animals had a higher RFS mean and median than those of awake animals, which is expected given that ketamine is known to broaden the RF. However, in both preparations (anesthetized and awake), the DC had a higher RFS mean than the LC, which could be consistent with the finding that the DC had a relatively lower SMI than the LC. To show these new data, we made a new Figure 10-figure supplement 1, and we edited the text accordingly [lines 372-379 & 527-531].
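      A minimal sketch of a binarized receptive field sum, under stated assumptions: the response only says that the value is 1 for a single-frequency response and grows with the number of frequencies across sound levels, so the exact definition used here (a count of frequency/level combinations whose response exceeds a threshold) is an assumption. The function names and the threshold are hypothetical.

```python
def binarize_rf(responses, threshold):
    """responses[level][freq] -> 1 if the response exceeds threshold, else 0.

    The threshold is a hypothetical stand-in for whatever responsiveness
    criterion is applied to each frequency/level combination.
    """
    return [[1 if amp > threshold else 0 for amp in row] for row in responses]


def receptive_field_sum(binary_rf):
    """Assumed RFS: number of responsive frequency/level combinations."""
    return sum(sum(row) for row in binary_rf)


# One sound level, response to a single frequency: narrow RF
narrow = binarize_rf([[0.0, 0.1, 0.9, 0.1, 0.0]], threshold=0.5)
# Two sound levels, responses spreading to neighboring frequencies: wide RF
wide = binarize_rf([[0.1, 0.7, 0.9, 0.8, 0.2],
                    [0.6, 0.8, 1.0, 0.9, 0.7]], threshold=0.5)

rfs_narrow = receptive_field_sum(narrow)
rfs_wide = receptive_field_sum(wide)
```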

      (3) Throughout the paper more information needs to be given about the number of cells, sessions, and animals used in each panel, and what level was used as n in the statistical tests. For example, in Figure 4 I can not tell if the 4 mice shown for LC imaging are the only 4 mice imaged, and used in the Figure 4E summary or if these are just examples. In general, throughout the paper, it is currently not possible to assess how many cells, sessions, and animals the data shown comes from.

      Thank you for the reviewer’s comment. We do apologize for not adding this information. We added all the information regarding the size of the statistical subjects (number of cells or number of animals used) for every test outcome. To keep the flow of the text, we added the details of the statistical tests in the legends of the figures.

      (4) Throughout the paper, to better understand the summary maps and plots, it would be helpful to see example responses of the different components investigated. For example, given that module cells appear to have more auditory offset responses, it would be helpful to see what the bimodal, sound-only, and somatosensory responses look like in example cells in LC modules. This also goes for just general examples of what the responses to auditory and somatosensory inputs look like in DC vs LC. In general example plots of what the responses actually look like are needed to better understand what is being summarised.

      Thank you for the reviewer’s comment and suggestion. We modified Figure 6 and the text accordingly to include all the significant examples of cells discussed throughout the work.

      Reviewer #2 (Public Review):

      Summary:

      The study describes differences in responses to sounds and whisker deflections as well as combinations of these stimuli in different neurochemically defined subsections of the lateral and dorsal cortex of the inferior colliculus in anesthetised and awake mice.

      Strengths:

      The main achievement of the work lies in obtaining the data in the first place as this required establishing and refining a challenging surgical procedure to insert a prism that enabled the authors to visualise the lateral surface of the inferior colliculus. Using this approach, the authors were then able to provide the first functional comparison of neural responses inside and outside of the GABA-rich modules of the lateral cortex. The strongest and most interesting aspects of the results, in my opinion, concern the interactions of auditory and somatosensory stimulation. For instance, the authors find that a) somatosensory-responses are strongest inside the modules and b) somatosensory-auditory suppression is stronger in the matrix than in the modules. This suggests that, while somatosensory inputs preferentially target the GABA-rich modules, they do not exclusively target GABAergic neurons within the modules (given that the authors record exclusively from excitatory neurons we wouldn't expect to see somatosensory responses if they targeted exclusively GABAergic neurons), and that the GABAergic neurons of the modules (consistent with previous work) preferentially impact neurons outside the modules, i.e. via long-range connections.

      Weaknesses:

      While the findings are of interest to the subfield they have only rather limited implications beyond it. The writing is not as precise as it could be. Consequently, the manuscript is unclear in some places. For instance, the text is somewhat confusing as to whether there is a difference in the pattern (modules vs matrix) of somatosensory-auditory suppression between anesthetized and awake animals. Furthermore, there are aspects of the results which are potentially very interesting but have not been explored. For example, there is a remarkable degree of clustering of response properties evident in many of the maps included in the paper. Taking Figure 7 for instance, rather than a salt and pepper organization we can see auditory responsive neurons clumped together and non-responsive neurons clumped together and in the panels below we can see off-responsive neurons forming clusters (although it is not easy to make out the magenta dots against the black background). This degree of clustering seems much stronger than expected and deserves further attention.

      Thank you for the reviewer’s comment. We do apologize if some areas in the manuscript were imprecisely written. For the anesthetized and awake data, we had only emphasized the similarities between the two setups to show the ability to use the microprism in awake animals for future assessment. To highlight the differences between anesthetized and awake animals, we have now run uniform statistics for all the data collected from both setups. Accordingly, we have edited Figures 4 and 5 (tonotopy-anesthetized) to match Figure 9 and the new Figure 10 (tonotopy-awake). Also, we edited Figures 7 and 8 (multisensory-anesthetized) to match Figure 11, the old Figure 10 (multisensory-awake). We edited the text accordingly in the results section and discussed the possible differences between anesthetized and awake data in the discussion section [lines 521-553].

      We agree with the reviewer’s comment that the cells were topographically clustered based on their responses. Some of these clusters include the somatosensory responsive cells, which were located mostly in the modules (Figures 7D and 8E). Also, the auditory responsive cells with offset responses were clustered mostly in the modules (Figures 7C and 8F). Accordingly, we have edited the text to emphasize this finding.

      We noticed also that some responsive cells to the tested stimulations were surrounded by nonresponsive cells. By comparing the response of the cells to different stimuli we found that while Figures 7 and 11 (old Figure 10) showed only the response of the cells to auditory stimulation (unmodulated broadband noise at 80 dB) and somatosensory stimulation (whisker deflection), some nonresponsive cells to these specific stimulations were found to be responsive to pure tones of different frequencies and amplitudes. As an indicator of the cells' viability, we additionally examined the spontaneous activity of the nonresponsive cells across different data sets. We note that spontaneous activity was rare for all cells even among the responsive cells to sound or somatosensory stimulations. This finding could be related to the possibility that the 2P imaging of calcium signals may not be sensitive enough to track spontaneous activity that may originate from single spikes. However, in some data sets, we have found that the cells that did not respond to any tested stimuli showed spontaneous activity when no stimulation was given indicating the viability of those cells. We have addressed the activity of the non-responsive cells in the text along with a new Figure 11-figure supplement 1.

      We changed the magenta into a green color to be suitable for the dark background. Also, we have completely changed the color palette of all of our images to be suitable for color-blind readers as suggested by reviewer 1.

      Reviewer #3 (Public Review):

      The lateral cortex of the inferior colliculus (LC) is a region of the auditory midbrain noted for receiving both auditory and somatosensory input. Anatomical studies have established that somatosensory input primarily impinges on "modular" regions of the LC, which are characterized by high densities of GABAergic neurons, while auditory input is more prominent in the "matrix" regions that surround the modules. However, how auditory and somatosensory stimuli shape activity, both individually and when combined, in the modular and matrix regions of the LC has remained unknown.

      The major obstacle to progress has been the location of the LC on the lateral edge of the inferior colliculus where it cannot be accessed in vivo using conventional imaging approaches. The authors overcame this obstacle by developing methods to implant a microprism adjacent to the LC. By redirecting light from the lateral surface of the LC to the dorsal surface of the microprism, the microprism enabled two-photon imaging of the LC via a dorsal approach in anesthetized and awake mice. Then, by crossing GAD-67-GFP mice with Thy1-jRGECO1a mice, the authors showed that they could identify LC modules in vivo using GFP fluorescence while assessing neural responses to auditory, somatosensory, and multimodal stimuli using Ca2+ imaging. Critically, the authors also validated the accuracy of the microprism technique by directly comparing results obtained with a microprism to data collected using conventional imaging of the dorsal-most LC modules, which are directly visible on the dorsal IC surface, finding good correlations between the approaches.

      Through this innovative combination of techniques, the authors found that matrix neurons were more sensitive to auditory stimuli than modular neurons, modular neurons were more sensitive to somatosensory stimuli than matrix neurons, and bimodal, auditory-somatosensory stimuli were more likely to suppress activity in matrix neurons and enhance activity in modular neurons. Interestingly, despite their higher sensitivity to somatosensory stimuli than matrix neurons, modular neurons in the anesthetized prep were far more responsive to auditory stimuli than somatosensory stimuli (albeit with a tendency to have offset responses to sounds). This suggests that modular neurons should not be thought of as primarily representing somatosensory input, but rather as being more prone to having their auditory responses modified by somatosensory input. However, this trend was reversed in the awake prep, where modular neurons became more responsive to somatosensory stimuli than auditory stimuli. Thus, to this reviewer, the most intriguing result of the present study is the dramatic extent to which neural responses in the LC changed in the awake preparation. While this is not entirely unexpected, the magnitude and stimulus specificity of the changes caused by anesthesia highlight the extent to which higher-level sensory processing is affected by anesthesia and strongly suggest that future studies of LC function should be conducted in awake animals.

      Together, the results of this study expand our understanding of the functional roles of matrix and module neurons by showing that responses in LC subregions are more complicated than might have been expected based on anatomy alone. The development of the microprism technique for imaging the LC will be a boon to the field, finally enabling much-needed studies of LC function in vivo. The experiments were well-designed and well-controlled, and the limitations of two-photon imaging for tracking neural activity are acknowledged. Appropriate statistical tests were used. There are three main issues the authors should address, but otherwise, this study represents an important advance in the field.

      (1) Please address whether the Thy1 mouse evenly expresses jRGECO1a in all LC neurons. It is known that these mice express jRGECO1a in subsets of neurons in the cerebral cortex, and similar biases in the LC could have biased the results here.

      Thank you for the reviewer’s comment. In the work published by Dana et al., the expression of jRGECO1a in all Thy1 mouse lines was determined by the brightness of jRGECO1a in the soma. Given that some cells do not show a detectable level of jRGECO1a fluorescence until activated, the difference in expression across brain regions could be related to the level of neuronal activity at the time of sample processing rather than to the expression level of the indicator itself. To the best of our knowledge, there is no antibody for jRGECO1a that can be used to detect the expression level of the indicator regardless of neuronal activity. To test the hypothesis that the DC and LC have different levels of jRGECO1a, we examined the expression levels of jRGECO1a after perfusing the mice with high-potassium saline to elicit a general neuronal depolarization in the whole brain. We then immunostained against NeuN (a neuronal marker) to quantify the percentage of neurons expressing jRGECO1a relative to the total number of neurons (indicated by NeuN). For a fair comparison, we restricted our analysis to the areas imaged by 2P, as some regions, such as the deep ventral regions of the LC, were not accessible by microprism. The percentage of cells expressing jRGECO1a was similar in the DC and LC. As expected, the neurons expressing jRGECO1a were only non-GABAergic cells. We addressed these findings in the new Figure 3-figure supplement 1 as well as the corresponding text in the results [lines 178-184] and methods sections [lines 878-892].

      (2) I suggest adding a paragraph or two to the discussion to address the large differences observed between the anesthetized and awake preparations. For example, somatosensory responses in the modules increased dramatically from 14.4% in the anesthetized prep to 63.6% in the awake prep. At the same time, auditory responses decreased from 52.1% to 22%. (Numbers for anesthetized prep include auditory responses and somatosensory + auditory responses.). In addition, the tonotopy of the DC shifted in the awake condition. These are intriguing changes that are not entirely expected from the switch to an awake prep and therefore warrant discussion.

      Thank you for the reviewer’s comment. To determine if differences exist between anesthetized and awake data, we have now used the same statistics and edited Figures 4, 5, 7, 8, 9, and 10 as well as added a new Figure 11. Accordingly, we have edited the results section and added a paragraph addressing the possible differences between the two preparations in the Discussion section [lines 521-553].

      (3) For somatosensory stimuli, the authors used whisker deflection, but based on the anatomy, this is presumably not the only somatosensory stimulus that affects LC. The authors could help readers place the present results in a broader context by discussing how other somatosensory stimuli might come into play. For example, might a larger percentage of modular neurons be activated by somatosensory stimuli if more diverse stimuli were used?

      We agree with the reviewer’s point. Indeed, the modules are receiving different inputs from different somatosensory sources such as somatosensory cortex and dorsal column nuclei, which could indicate that the activity of the cells in the modular areas could be evoked by different types of somatosensory stimulations, which is an open area for future studies. We have discussed this point in the revised Discussion section [lines 516-520].

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      (1) Figure 3H: The lateral surface seems quite damaged by the prism. An example slice of the imaging area of each mouse would help the reader better understand the extent of damage the prism leaves in the area of interest.

      Thank you for the reviewer’s comment. We have already included such images in Figures 4A, 7A, and 9A to present the field of view of all prism experiments. However, we need to clarify the point about tissue damage. The insertion of the microprism may be associated with some tissue damage as a result of making the pocket into which the microprism is inserted, but it is not possible to obtain neuronal signals from a damaged field of view. Therefore, we do not believe that there is tissue damage to the parts of the LC imaged by microprism. However, there may be some areas where the microprism is not in direct contact with the LC surface. These areas are located mostly in the periphery of the field of view, and they are completely black as they are out of focus (e.g., the left side of Figure 3B). The right side of Figure 3B as well as Figure 3A have some black areas, which represent the vasculature, where there are no red signals because of the lack of jRGECO1a expression in those areas.

      (2) In relation to the data shown in Figure 4E it is claimed that LC is tuned to higher frequencies (lines 195-196). However, the majority of cells appear to be tuned to frequencies below 14kHz (with a median of 7.5 kHz), which is quite low for the mouse. I assume that the authors mean frequencies that are relatively higher than the DC, but it is worth mentioning in the text that the BFs found in the LC are quite low-frequency responses for the mouse.

      Thank you for the reviewer’s comment, which we agree with. We edited this part by acknowledging that around 50% of the LC cells had a low-frequency bias to 5 and 7.1 kHz. Then we mentioned that most of the LC cells are tuned to relatively higher frequencies than those of the DC [lines 215-218].

      (3) Figure 5A-C: Is it the tone-responsive cells plus an additional ~22% of cells that respond to AM, or are there also cells that respond to tones that do not respond to AM. Please break down to which degree the tone and AM responsive cells are overlapping.

      Thank you for the reviewer’s comment and suggestion. We broke down the responsive cells into cells responsive only to pure tone (tone selective cells or Tone-sel) or to only AM-noise (noise selective cells or Noise-sel) as well as cells responding to both sounds (nonselective cells or Non-sel). We examined the fractions of these categories of cells in both LC and DC within all responsive neurons. Accordingly, we have edited Figure 5A-C as well as the text [lines 229-243].

      (4) Figure 5D. It is unclear to me how a cell is classified as SMI or TMI responsive after computing the SMI or TMI for each cell. What statistic was used to determine if the cell was responsive or not?

      Thank you for the reviewer’s comment. We do apologize for the confusion caused by Figures 5D and E. These figures do not show the values of SMI or TMI, respectively. Rather, the figures show the percentage of the spectrally or temporally modulated cells, respectively. At each sound level, the cells were categorized into two main types. The spectrally modulated cells are those responsive to pure tones or unmodulated noise, so they can detect the spectral features of the sound (old Figure 5D or new Figure 5E). The temporally modulated cells are those responsive to AM-noise, so they can detect the temporal features of the sound of complex spectra like the broadband noise (old Figure 5E or new Figure 5F). To clear this confusion, we removed the words SMI and TMI from the figures, and then we renamed the x-axis label into “% of spectrally modulated cells” and “% of temporally modulated cells” for Figures 5D (new 5E) and E (new 5F), respectively.

      (5) Figure 5 D, E: Is the decrease in SMI and TMI modulated cells in the modules a result of simply lower sensitivity to sounds (i.e. higher response thresholds)? If a cell responds to neither tone, AM, or noise it will have a low SMI and TMI index. If this is the case that affects the interpretation, as it is then not a decrease in sensitivity to spectral or temporal modulation, but instead a difference in overall sound sensitivity.

      Thank you for the reviewer’s comment. We apologize for the confusion about Figures 5E and D, which did not show the SMI and TMI values. Rather, they show the percentage of spectrally or temporally modulated cells, respectively, as explained in our previous response. Therefore, Figure 5D shows the percentage of cells that can detect the spectral features of sound, while Figure 5E shows the percentage of cells that can detect the temporal features of sounds of complex spectra like broadband noise. Accordingly, Figures 5D and E show the sensitivity to different features of sound and not the overall sound sensitivity.

      (6) Figure 7 and 8: What is the false positive rate expected of the responsive cells using the correlation cell flagging criteria? Especially given that the fraction of cells responsive to somatosensory stimulation in LC (matrix) is 0.88% and 1.3% in DC, it is important to know what the expected false positive rate is in order to be able to state that there are actually somatosensory responses there or if this is what you would expect from false positives given the inclusion test used. Please provide an estimate of the false positive rate given your inclusion test and show that the rate found is statistically significantly above that level - and show this rate with a line in Figure 7 H, I.

      Thank you for the reviewer’s comment. To test the efficiency of the correlation method in determining the responsive cells, we initially ran an ROC curve comparing the automated method to a blinded human interpretation. The AUC of the ROC curve was 0.88. This high AUC value indicates that the correlation method tends to rank a randomly chosen responsive cell higher than a randomly chosen nonresponsive cell. At a correlation coefficient of 0.4, which was the cutoff value used to determine the responsive cells for somatosensory stimulation, the specificity was 87%, the sensitivity was 72%, the positive predictive value was 73%, and the negative predictive value was 86%. Although the above percentages indicate the efficiency of the correlation method, we excluded all the false responsive cells from the analysis. Therefore, the fractions of cells in the graphs are the true responsive cells with no contamination of the non-responsive cells. We also modified Figures 7H and I to match the other data sets obtained from awake animals. Therefore, Figures 7H and I no longer show the average of the responsive cells. Instead, they show the % of different fractions of responsive cells within each cellular motif (modules and matrix). Accordingly, we believe that there is no need to include a rate line on the graph. We added the section describing the validation part to the methods section [lines 808-815].
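      For readers less familiar with these validation measures, the sketch below shows how sensitivity, specificity, and the predictive values follow from a confusion matrix, and how the AUC can be read as a ranking probability. It is illustrative only: the function names are hypothetical, and the counts and scores are made-up placeholders, not the data behind the percentages reported above.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix measures, with the human rater as reference."""
    return {
        "sensitivity": tp / (tp + fn),  # responsive cells correctly flagged
        "specificity": tn / (tn + fp),  # nonresponsive cells correctly rejected
        "ppv": tp / (tp + fp),          # flagged cells that are truly responsive
        "npv": tn / (tn + fn),          # rejected cells that are truly nonresponsive
    }


def auc_by_ranking(responsive_scores, nonresponsive_scores):
    """AUC as the probability that a randomly chosen responsive cell has a
    higher score than a randomly chosen nonresponsive cell (ties count 0.5)."""
    wins = 0.0
    for r in responsive_scores:
        for n in nonresponsive_scores:
            if r > n:
                wins += 1.0
            elif r == n:
                wins += 0.5
    return wins / (len(responsive_scores) * len(nonresponsive_scores))


# Placeholder counts at one hypothetical cutoff
metrics = diagnostic_metrics(tp=8, fp=2, tn=18, fn=2)

# Placeholder correlation coefficients for human-labeled cells
auc = auc_by_ranking([0.9, 0.6, 0.5], [0.4, 0.55, 0.1])
```

      Note that the predictive values depend on the proportion of truly responsive cells in the sample, so they would differ between cell populations even at the same cutoff.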

      (7) Figure 7: Please clarify what is meant by a cell responding to 'both responding to somatosensory and auditory stimulation'. Does it mean that the cell has responses to both auditory and somatosensory stimulation when presented individually or if it responds to both presented together? If it is the former, I don't understand how the number to both can be higher than the number of somatosensory alone (as both requires it also to respond to somatosensory alone). If it is the latter (combined auditory and somatosensory) then it seems that somatosensory inputs remove the responsiveness of most cells that were otherwise responsive to auditory alone (e.g. in the module while 42% respond to sound alone, combined stimulation would leave only 10% of cells responsive). Please clarify what exactly the authors are plotting and stating here.

      Thank you for the reviewer’s comment. The responsive cells in Figure 7 are divided into three categories. Each category has a completely different group of cells. The first category is for the cells responding only to auditory stimulation (auditory-selective cells or Aud-sel). The second category is for the cells that respond only to somatosensory stimulation (somatosensory selective cells or Som-sel). The third category is for the cells that respond to both auditory and somatosensory stimulations when both stimulations are presented individually (auditory/somatosensory nonselective cells or Aud/Som-nonsel). Accordingly, the number of cells may be different across all these categories. We have clarified this part in the text [lines 299-303]. We have modified Figures 7, 8, and 11 (old Figure 10) to match the data from anesthetized and awake animals, so Figures 7H and I now show the collective % of the cells from all animals within modules vs matrix.
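      The three mutually exclusive categories can be sketched as below. This is an illustration, not the study's code: the function names are hypothetical, the responsiveness flags are assumed to come from testing each stimulus individually, and the percentages are assumed to be taken over responsive cells only.

```python
from collections import Counter


def classify_cell(responds_auditory, responds_somatosensory):
    """Assign one exclusive label from responses to each stimulus alone."""
    if responds_auditory and responds_somatosensory:
        return "Aud/Som-nonsel"
    if responds_auditory:
        return "Aud-sel"
    if responds_somatosensory:
        return "Som-sel"
    return "nonresponsive"


def category_percentages(cells):
    """Percentage of each category among responsive cells (assumed denominator)."""
    labels = [classify_cell(aud, som) for aud, som in cells]
    responsive = [label for label in labels if label != "nonresponsive"]
    if not responsive:
        return {}
    counts = Counter(responsive)
    return {label: 100.0 * n / len(responsive) for label, n in counts.items()}


# Made-up example: (responds to sound alone, responds to whisker deflection alone)
cells = ([(True, False)] * 5 + [(False, True)] * 2
         + [(True, True)] * 3 + [(False, False)] * 4)
percentages = category_percentages(cells)
```

      Because each cell receives exactly one label, the nonselective group can be larger than the somatosensory-selective group without contradiction.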

      (8) Why are the inferential statistics used in Figure 9F (chi-square test) and Figure 5A-C (t-test) when it tests the same thing (the only difference is one is anaesthetised data and the other awake)? Indeed, all Figure 9 and 10 (awake data figures) plots use chi-square tests to test differences in percentages instead of t-tests used in earlier (anaesthetised data figures) plots to test differences in percentages between groups. Please clarify the reason for this change in statistics used for similar comparisons.

      Thank you for the reviewer’s comment. Imaging the LC via microprism from awake animals confirmed the ability to run this technique with no interference to the ambulatory functions of the animals. Therefore, the main goal was to highlight the similarities between the data obtained from awake and anesthetized setups by highlighting the comparison between the LC and DC or between modules and matrix within each preparation (anesthetized vs awake). Accordingly, the statistics used to run these comparisons were chosen based on the number of the tested animals at each setup (7 anesthetized animals and 3 awake animals for prism insertion). The low number of animals used for awake data made us use the number of cells collectively from all animals instead of the number of animals, so we used the Chi-square test to examine the differences in percentages.
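      A minimal sketch of a chi-square test on cell counts pooled across animals, assuming a 2x2 table (region by responsive/nonresponsive) and no continuity correction. The function name and the counts are hypothetical placeholders; whether a correction was applied in the study is not stated above.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]] (no Yates correction).

    Returns (statistic, p_value). With 1 degree of freedom the upper-tail
    probability of the chi-square distribution equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Placeholder counts pooled over animals:
# modules: 30 responsive, 70 nonresponsive; matrix: 50 responsive, 50 nonresponsive
statistic, p_value = chi_square_2x2(30, 70, 50, 50)
```

      Pooling treats each cell as an independent observation, which is the trade-off the response describes when the number of animals is small.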

      (9) Figure 10H: The main text describes the results shown here as similar to what was seen in anaesthetised animals. But it looks to me like the results in awake animals are qualitatively different from the multisensory interaction seen in anaesthetised animals. In anaesthetised animals the authors find that there is a higher chance of auditory responses being enhanced by somatosensory inputs when cells are in the modules compared to in the matrix. However, in awake data, this relationship is flipped, with more bimodal enhancement found in the matrix compared to the modules. Furthermore, almost all cells in the modules are suppressed by combined somatosensory input which looks like it is different from what is found in anaesthestised mice and what is described in the discussion: 'we observed that combined auditory-somatosensory stimulation generally suppressed neural responses to auditory stimuli and that this suppression was most prominent in the LC matrix'.

      Thank you for the reviewer’s comment. Our statement was meant to show how the data obtained from awake and anesthetized animals were generally similar. However, we agree that the statement may not be suitable due to the possible differences between awake and anesthetized animals. To address a fair comparison between the anesthetized and awake preparations, we ran similar statistics and graphs for Figures 7, 8, and 11 (old Figure 10). Given that the areas occupied by modules and matrix are different across animals due to the irregular shape of the modules, we chose to run a chi-square test for all the data to quantify the collective % of responding cells within modules vs matrix from all tested animals for each experimental setup (anesthetized vs awake). The anesthetized and awake animals similarly showed that modules and matrix had higher fractions of auditory responsive cells. However, matrix had more cells responding to auditory stimulations than modules, while modules had more cells responding to somatosensory stimulation than matrix. In contrast, while the anesthetized animals showed higher fractions of offset auditory-responsive cells, which were mostly clustered in the modules, the offset auditory-responsive cells were very rare in awake animals (6 cells/one animal).

      Based on the fractions of cells with suppressed or enhanced auditory response induced by bimodal stimulation, the data obtained from anesthetized and awake animals showed that the auditory response in the matrix was suppressed more than enhanced by bimodal stimulation. In contrast, modules had different profiles across the experimental setups and locations. For instance, the modules imaged via microprism in the anesthetized and awake animals showed suppressed more than enhanced auditory responses, but modules imaged from the dorsal surface in anesthetized animals showed enhanced more than suppressed auditory responses. Additionally, modules had less suppressed and more enhanced auditory responses compared to matrix in the anesthetized animals regardless of the location of the modules (microprism or dorsal surface). Yet, modules from awake animals had more suppressed and less enhanced auditory responses compared to matrix. We have addressed these differences in the results and discussion section.

      Additional minor comments that I think the authors could use to aid their manuscript clarity:

      (1) The figure colour selection - especially in Figures 7 and 8 - is really hard to tell apart. Please choose more distinct colours, and a colour scheme that is appropriate for colour blind readers.

      Thank you for the reviewer’s suggestion. We have noticed that the magenta color assigned for the cells with offset responses was very difficult to distinguish from the black background. We have changed the magenta color to green to be different from the color of other cells. Using Photoshop, we chose a color scheme that is suitable for color-blind readers in all our maps.

      (2) The sentence in lines 331-334 should be rephrased for clarity.

      Thank you for the reviewer’s suggestion. We have rephrased the statement for clarity [lines 364-371].

      Reviewer #2 (Recommendations For The Authors):

      As mentioned in the public review the strong clustering evident in some of the maps (some of which may be related to module/matrix differences but certainly not all of it) seems worth scrutinizing further. Would we expect such a strong spatial segregation of auditory responsive and non-responsive neurons? Would we expect response properties (e.g. off-responsiveness) other than frequency tuning to show evidence of a topographic arrangement in the IC? In addressing this it would, of course, be important to rule out that this clustering is not down to some trivial experimental variables and truly reflects functional organization. For instance, are the patches of non-responsive neurons found in parts of the field of view with poor visibility, poor labelling, etc which may explain why it is difficult to pick up responses there? Are the neurons in non-responsive areas otherwise active (i.e. do they show spontaneous activity) or could they be 'dead'? Could the way neuropil signals are dealt with play a role here (it is weighted by 0.4 which strikes me as quite low)? In relation to this, I am also wondering to what extent the extreme overrepresentation (Figure 4) of neurons with a BF of 5kHz (some of this is, of course, down to the fact that the lower end of the frequency range was 5kHz and that the step size was 0.5 octaves), especially in the DC, is to be interpreted.

      Thank you for the reviewer’s comment. Before analysis, the ROIs of all cells were set around the cell bodies using the jRGECO1a signals as a reference, so all cells (responsive and nonresponsive) were collected from areas of good visibility of jRGECO1a signals. In other words, no cells were collected from regions having poor jRGECO1a signals. In Figures 7, 8, and 11 (old Figure 10), the cells responded either only to unmodulated broadband noise at 80 dB as an auditory stimulus or to whisker deflection with a specific speed and power as a somatosensory stimulus. Given that the two stimuli above had specific parameters, the remaining nonresponsive cells may respond to auditory or somatosensory stimulations with other features. For instance, some cells that did not respond to the unmodulated broadband noise responded to pure tones with different amplitudes and frequencies or to AM-noise with different amplitudes and modulation frequencies. Also, these nonresponsive cells may not respond to any of our tested stimuli and may respond to other sensory stimulations. Some of the nonresponsive cells showed spontaneous activity when no stimulations were presented. However, we cannot rule out the possibility that some of these nonresponsive cells may not be viable. We have addressed the clustering properties in the revised version of the manuscript in the corresponding spots of the results and discussion sections. We have added a new supplementary figure (Figure 11-figure supplement 1) to show how the cells that did not respond to the unmodulated noise may respond to other types of sound and to show the spontaneous activity of some nonresponsive cells.

      For the neuropil, previous reports used a contamination factor (r) in the range of 0.3-0.7 (we referenced these studies in the method section [line 776]), depending on the tissue or cells imaged, the vasculature, and the objective used for imaging. Therefore, we optimized the contamination factor (r) to be 0.4 through a preliminary analysis based on the tissue we imaged (LC) and the objective used (16x with NA = 0.8 and a working distance of 3 mm).
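
      To make the role of the contamination factor concrete, the following is a minimal sketch that assumes the commonly used subtraction F_corrected = F_roi - r * F_neuropil. The response above does not spell out the formula, so the formula, the function name, and the sample traces are all assumptions; only r = 0.4 comes from the text.

```python
def correct_neuropil(f_roi, f_neuropil, r=0.4):
    """Subtract a scaled neuropil trace from the raw ROI trace.

    Assumed formula: F_corrected = F_roi - r * F_neuropil, where r is the
    contamination factor (0.4 is the value quoted in the response).
    """
    if len(f_roi) != len(f_neuropil):
        raise ValueError("ROI and neuropil traces must have the same length")
    return [roi - r * npil for roi, npil in zip(f_roi, f_neuropil)]


# Hypothetical fluorescence values (arbitrary units), for illustration only.
raw_trace = [100.0, 120.0, 180.0, 110.0]
neuropil_trace = [50.0, 55.0, 60.0, 50.0]
corrected = correct_neuropil(raw_trace, neuropil_trace)
```

      A larger r removes more of the surrounding signal from each cell, which is why the choice within the 0.3-0.7 range can affect how many cells pass a responsiveness criterion.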

      We agree that there is an overrepresentation of 5 kHz as the best tuning frequency for DC cells. The previous report (A. B. Wong & Borst, 2019) showed a large zone of the DC where cells were tuned to (2-8 kHz). Given that 5kHz was the lowest tested frequency in our experiment, we think that the low-frequency bias of the DC surface is consistent between studies. This finding also could be supported by the electrophysiology data obtained by spanning the recording electrodes through the IC tissue along the dorsoventral axis. In those experiments, the cells were tuned to lower frequencies at the dorsal surface of the IC.

      We have changed the magenta-colored cells to green ones, so it will be easier to identify the cells. As required by another reviewer, we changed the color palettes of some images and cellular maps to be suitable for color-blind readers.

      The manuscript would benefit from more precise language in a number of places, especially in the results section.

      Line 220/221, for instance: "... a significant fraction of cells that did not respond to pure tones did respond to AM-noise" Strictly speaking, this sentence suggests that you considered here only the subset of neurons that did not respond to pure tones and then ran a test on that subset. The test that was done seems to suggest though that the authors tested whether the percentage of responsive cells was greater for pure tones or for AM noise.

      Thank you for the reviewer’s comment. We do apologize for the confusion. In the revised manuscript, we categorized the cells according to their response into cells responding to pure tone only (tone-selective cells or Tone-sel), AM-noise only (noise-selective cells or Noise-sel), and to both pure tone and AM-noise (nonselective cells or Non-sel). We have modified Figure 5 accordingly. We did the same thing for the data obtained from awake animals and showed that in a new figure to easily match the analysis done for the anesthetized animals.

      Please refer to the figure panels in the text in consecutive order. 2B, for instance, is mentioned after 2H.

      Thank you for the reviewer’s comment. Throughout the paper, we kept the figure panels in consecutive order within each figure so that they flow smoothly with the text. Figure 2 is the only exception, for a good reason. It is a complex figure whose many panels provide a parallel comparison between the LC imaged via microprism and the DC, covering single-photon images, two-photon images, validation by laser lesioning, and histology. Accordingly, we moved between many panels of the figure to highlight the aspects of this comparison efficiently. We prefer to keep Figure 2 as one figure in its current format to show this parallel comparison between LC and DC.

      The legend for Figure 2 could be clearer. For instance, there are two descriptions for panel D. Line 1009: "(C-E)" [i.e. C, D, E] and line 1010: "(D and F)".

      Thank you for the reviewer’s comment. It should be C and E, not C-E. We have fixed the mistake [line 1224].

      Line 275: What does 'with no preference' mean?

      Thank you for the reviewer’s comment. We do apologize for the confusion. There are three categories of cells. Some cells respond only to auditory stimulation, while others respond to only somatosensory stimulation. However, there is another group of cells that respond nonselectively to auditory and somatosensory stimulations or Aud/Som-nonsel cells. We edited the sentence to be clearer [lines 303-304].

      Line 281 (and other places): What does 'normalized against modules' mean?

      Thank you for the reviewer’s comment. This normalization was done by dividing the number of responsive cells of the same response type in the matrix by that in the modules. Therefore, the value taken by modules was always 1 and the value taken by the matrix is something around 1. Accordingly, the value for matrix could be > 1 if matrix had more cells than modules. In contrast, the value of matrix would be < 1 if matrix had fewer cells than modules. In the revised version, we used this normalization method to make the revised Figures 5C and 10C to describe the cell fractions responding to pure tone only, AM-noise only, or to both stimuli in the matrix vs modules. 
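
      The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name and the per-animal counts are hypothetical, and the only rule taken from the text is that the matrix count is divided by the module count so that modules always take the value 1.

```python
def normalize_against_modules(matrix_count, module_count):
    """Return (module_value, matrix_value) with the modules fixed at 1.

    A matrix value above 1 means the matrix had more cells of this response
    type than the modules; a value below 1 means it had fewer.
    """
    if module_count <= 0:
        raise ValueError("module count must be positive to normalize against it")
    return 1.0, matrix_count / module_count


# Hypothetical counts of cells of one response type, per animal.
counts = {
    "animal_1": {"matrix": 30, "modules": 20},
    "animal_2": {"matrix": 12, "modules": 16},
}
normalized = {
    animal: normalize_against_modules(c["matrix"], c["modules"])
    for animal, c in counts.items()
}
```

      In this made-up example the matrix value is above 1 for the first animal and below 1 for the second, which is the comparison against the reference line at 1 that the response describes.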

      Sentence starting on line 288. I don't find that point to be as obvious from the figures as the sentences seem to suggest. Are we to compare magenta points (auditory off cells) from 7C with green points in 7F?

      Thank you for the reviewer’s comment. We came to this conclusion based on our visual comparison of magenta points (now green in the revised version to increase the visibility) representing the auditory offset cells in Figure 7C and the green points in Figure 7F representing the cells responding to both somatosensory and auditory stimulations. In the revised manuscript, we statistically examined if the percentage of onset auditory response and offset auditory responses are different within the responsive cells to both somatosensory and auditory stimulations in the modules vs matrix. We have found that most of the cells responding to both somatosensory and auditory stimulations inside the modules had offset auditory responses, which could indicate a level of multisensory integration between somatosensory input and the offset auditory responses in these cells. We have added the statistical results to the revised manuscript to address this effect [lines 312-317]

      Lines 300-302: "These data suggest that the module/matrix system permits preservation of distinct multimodal response properties in the face of massive integration of inputs in the LC". First, I'm not quite sure what that sentence means. Second, it would be more appropriate for the discussion. Third, the fact that we are more likely to find response enhancement in the modules than in the matrix is nicely consistent with the idea (supported by work from the senior author's lab and others) that excitatory somatosensory input predominantly targets neurons in the modules (which is why we see mostly response enhancement in the modules) and that this input targets GABAergic neurons which then project to and inhibit neurons both outside and inside of their module. Therefore, I would recommend that the authors replace the aforementioned sentence with one that interprets these results in light of what we know about this somatosensory-auditory circuitry.

      Thank you for the reviewer’s comment. Despite the massive multimodal inputs that the LC receives from auditory and nonauditory regions, the module/matrix system is a platform for distinct multimodal responses, as indicated by more somatosensory responsive cells in modules versus more auditory responsive cells in matrix, which matches the anatomical differences that were reported before. We edited the sentence in the light of the comparison between the data obtained from awake and anesthetized animals and moved it to the discussion section [lines 503-506].

      The term 'LC imaged via microprism' is used dozens of times throughout the manuscript. Replacing it with a suitable acronym or initialism could improve the flow of the text and would make some of the sentences less cumbersome.

      Thank you for the reviewer’s suggestion. We changed the term “LC imaged via microprism” into LC (microprism) throughout the revised manuscript.

      5A-C: It is unclear what is being compared here. What are the Ns? Different animals?

      Thank you for the reviewer’s comment. We do apologize for this missing information. We have added the number of subjects used in every statistical test in each corresponding figure legend.

      5G: minus symbol missing on the y-axis.

      Thank you for the reviewer’s comment. We gladly have fixed that.

      Figure 6: Are these examples or population averages?

      Thank you for the reviewer’s question. Every figure panel of the old Figure 6 represents a single trace of an example cell. However, we modified Figure 6 to include more examples of cells showing different responses complying with another reviewer’s suggestion. Each panel of the new Figure 6 represents the average response of 5 stimulations of the corresponding stimulus type. We preferred to show the average signal because it was the one used for the subsequent analysis.

      How are module borders defined?

      Thank you for the reviewer’s question. The modules were defined based on the intensity of the green channel that shows the expression of the GFP signals. The boundaries of modules were determined according to the distinction between high and low GFP signal boundaries of the modules. This step was done before data analysis to avoid any bias.

      7JKL: How are these to be interpreted? Does panel 7K, for instance, indicate that the fraction of neurons showing 'on' responses was roughly twice as large in the matrix than in the modules and that the fraction of neurons showing 'off' responses was roughly 10 times larger in the modules than in the matrix (the mean seems to be at about 1/10).

      Thank you for the reviewer’s comment. The data represented by Figures 7J-L defined the normalization of the number of cells of the same response type in the matrix against the modules. This normalization was done per animal, and then the data of the matrix were plotted against the normalization line at 1 representing the modules. The matrix will be claimed to have more cells than modules if the median of the matrix values > 1. In contrast, the matrix will be claimed to have fewer cells than the modules if the median of the matrix values < 1. Finally, if the median of matrix values = 1, this means there is no difference between matrix and modules. However, to match the data obtained from anesthetized animals (Figures 7 and 8) with those obtained from awake animals (Figure 11 or old Figure 10), we ran all data through the Chi-square test in the revised manuscript. Therefore, the format of Figures 7K-L was changed in the revised manuscript, so they became new Figures 7I-K.
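
      As an illustration of the kind of Chi-square comparison mentioned above, here is a minimal sketch of a Pearson chi-square test on a 2x2 table of cell counts (region by response type). The counts are hypothetical, and the response does not state whether a continuity correction or any other variant was used, so this uncorrected version is an assumption.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value). With one degree of freedom the p-value
    equals erfc(sqrt(statistic / 2)).
    """
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows are modules and matrix, columns are onset and offset cells.
table = [[10, 30], [40, 20]]
stat, p = chi_square_2x2(table)
```

      Unlike the per-animal ratio against a reference line at 1, this test works directly on pooled counts, which is one reason the two approaches can lead to differently formatted figures.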

      10A suggests that significantly more than half the neurons shown here are not auditory responsive. Perhaps I am misinterpreting something here but isn't that in contrast to what is shown in panel 9F?

      Thank you for the reviewer’s comment. The data shown in Figure 10A (or revised Figure 11A) represents the cellular response to only one stimulus (broadband noise at 80 dB with no modulation frequency), while Figure 9F (revised 10B) represents the cells responding to varieties of auditory stimulations of different combinations of frequencies and amplitudes (pure tones) as well as to AM-noise of different amplitudes and modulation frequencies. Accordingly, the old Figure 9F or revised Figure 10B shows different cell types based on their responses. For instance, some cells respond only to pure tone. Others respond only to AM-noise or to both pure tones and AM-noise. This may also support that the nonresponsive cells in Figure 10A (revised 11A) can respond to other types of sound features.

      The way I understood panels 7L and 8K there were more suppressed neurons in the matrix than in the modules (line 296: "cells in the modules had a higher odds of having an enhancement response to bimodal stimulation than matrix, while cells in the matrix had a higher odds of having a suppressive response to bimodal stimulation"). Now, panel 10F indicates that in awake mice there is a greater proportion of suppressed neurons in the modules than in the matrix. I may very well have overlooked or misread something but I may not be the only reader confused by this so please clarify.

      Thank you for the reviewer’s comment. We do apologize for this confusion. The ambiguity between Figures 7 and 8 (anesthetized animals) as well as Figure 10 (awake animals) comes from the fact that different statistics have been used for each preparation. In the revised version, we have fixed that by running the same statistics for all the data, and we accordingly revised Figures 7, 8, and 10 (new Figure 11). In brief, the matrix preserves a higher percentage of cells with suppressed auditory responses than those with enhanced auditory responses induced by bimodal stimulation in all conditions (anesthetized vs awake). In contrast, modules act differently across all tested conditions. While modules had more cells with enhanced auditory responses induced by bimodal interaction in anesthetized animals, they had more cells with suppressed response in awake animals indicating that modules could be sensitive to the effect of anesthesia compared to matrix. We addressed this effect in the discussion of the revised manuscript [lines 521-553].

      Line 438: ...as early AS...

      Thank you for the reviewer’s comment. We gladly fixed that [line 512].  

      Reviewer #3 (Recommendations For The Authors):

      My minor recommendations for the authors are as follows:

      (1) The text can be a bit difficult to follow in places. This is partly due to the convoluted nature of the results, but I suggest a careful read-through to look for opportunities to improve the prose. In particular, there is a tendency to use long sentences and long paragraphs. For example, the third paragraph of the introduction runs for almost fifty lines.

      Thank you for the reviewer’s comment and suggestion. We have fixed that.

      (2) This might be due to journal compression, but some of the bar graphs in the figures are difficult to read. For example, the individual data points, especially when filled with striped background colors get lost. Axes can become invisible, like the y-axis in 7L, and portions of bars, like in 7F, are not always rendered correctly. Error bars are sometimes hidden behind data points, as in 5C. Increasing line thickness and shifting individual data points away from error bars might help with this.

      Thank you for the reviewer’s comment and suggestion. We made all the data points black filled circles to make them visible. We put all the data points behind the main columns, so they don’t block the error bars. We have fixed Figures 5 and 7.

      (3) Throughout the manuscript, the authors use a higher SMI to indicate a preference of cells for auditory stimuli with "greater spectral... complexity" (e.g., lines 219 and 372). I find this interpretation a bit challenging since SMI compares a neuron's preference for tones over noise, and to me, tones seem like the least spectrally complex of all auditory stimuli. Perhaps some clarification of what the authors mean by this would help. For example, is the assumption that a neuron that prefers tones over noise is, either directly or indirectly, receiving input sculpted by inhibitory processes?

      Thank you for the reviewer’s comment. In general, higher SMI values indicate an increase in the preference of the cells to respond to pure tones than noise with no modulation (less spectral complexity). We will clarify this statement throughout the manuscript. However, the SMI value was not mentioned in lines 219 and 372. The statement mentioned in line 219 describes the revised figure 5C (old 5B), where more cells in matrix specifically respond to AM-noise compared to modules, which indicates the preference of the matrix to respond to sounds of greater spectral and temporal complexity. The statement in 372 in the discussion section refers to the finding in revised figures 5E and F (old 5D and E). In the revised figure 5E or old 5D, the data show that matrix has more cells responding to pure tones or noise with no modulation than modules, so matrix has a lower threshold to detect the spectral features of sound (revised figure 5E or old 5D). In the revised figure 5F or old 5E, the data show that matrix has more cells responding to AM-noise than modules, which indicates that matrix functions more to process the temporal features of sound. As explained above, all findings were related to the percentage of cells responding to specific sound stimuli and not the exact SMI values. We have revised the figures accordingly by removing the terms SMI and TMI from the figures, and we have clarified that in the text.

      (4) Lines 250-253: How does a decrease in SMI correspond to "an increase in pure tone responsiveness?" Doesn't a decrease suggest the opposite?

      Thank you for the reviewer’s comment, which we agree with. We do apologize for that. We have fixed this statement [lines 275-277] and any related findings accordingly.

      (5) Line 304: Add "imaged via microprism" or similar after "response profiles with the LC.".

      Thank you for the reviewer’s suggestion. We have fixed that. However, we changed the term “LC imaged via microprism” into “LC(microprism)” for simplicity as suggested by another reviewer [line 330].

      (6) Figure 5A and C: Both plots show that more neurons responded to AM-noise than tones, but it would be interesting to know how much the tone-responsive and AM-noise responsive populations overlapped. Were all tone-responsive neurons also responsive to AM-noise?

      Thank you for the reviewer’s comment. We have categorized the cells based on their response to pure tone only, AM-only, and both pure tone and AM-noise when each stimulus is presented individually. We have modified Figures 5A and C, and they are now Figures 5B and D.

      (7) Figure 5G: Missing negative sign before "0.5.".

      Thank you for the reviewer’s suggestion. We gladly have fixed that. However, old Figure 5G became a revised Figure 5H.  

      (8) Figure 7 legend, Line 1102: Missing period after "(C and E)".

      Thank you for the reviewer’s suggestion. We think that the period should be placed before (C and E) at the end of “respectively”. The parentheses refer to the statements after them. We gladly fixed that. [line 1394]

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study reports that IT neurons have biased representations toward low spatial frequency (SF) and faster decoding of low SFs than high SFs. High SF-preferred neurons, and low SF-preferred neurons to a lesser degree, perform better category decoding than neurons with other profiles (U and inverted U shaped). SF coding also shows more sparseness than category coding in the earlier phase of the response and less sparseness in the later phase. The results are also contrasted with predictions of various DNN models.

      Strengths:

      The study addressed an important issue on the representations of SF information in a high-level visual area. Data are analyzed with LDA which can effectively reduce the dimensionality of neuronal responses and retain category information.

      We would like to express our sincere gratitude for your insightful and constructive comments which greatly contributed to the refinement of the manuscript. We appreciate the time and effort you dedicated to reviewing our work and providing suggestions. We have carefully considered each of your comments and addressed the suggested revisions accordingly.

      Weaknesses:

      The results are likely compromised by improper stimulus timing and unmatched spatial frequency spectrums of stimuli in different categories.

      The authors used a very brief stimulus duration (35ms), which would degrade the visual system's contrast sensitivity to medium and high SF information disproportionately (see Nachmias, JOSAA, 1967). Therefore, IT neurons in the study could have received more degraded medium and high SF inputs compared to low SF inputs, which may be at least partially responsible for higher firing rates to low SF R1 stimuli (Figure 1c) and poorer recall performance with medium and high SF R3-R5 stimuli in LDA decoding. The issue may also to some degree explain the delayed onset of recall to higher SF stimuli (Figure 2a), preferred low SF with an earlier T1 onset (Figure 2b), lower firing rate to high SF during T1 (Figure 2c), somewhat increased firing rate to high SF during T2 (because weaker high SF inputs would lead to later onset, Figure 2d).

      We appreciate your concern regarding the coarse-to-fine nature of SF processing in the vision hierarchy and the short exposure time of our paradigm. According to your comment, we repeated the analysis of SF representation with 200ms exposure time as illustrated in Appendix 1 - Figure 4. Our recorded data contains the 200ms version of exposure time for all neurons in the main phase. As can be seen, the results are similar to what we found with 33 ms experiments.

      Next, we bring your attention to the following observations:

      (1) According to Figure 2d, the average firing rate of IT neurons for HSF could be higher than LSF in the late response phase. Therefore, the amount of HSF input received by the IT neurons is as much as LSF; however, its impact on the IT response is observable only in the later phase of the response. Thus, the LSF preference is because of the temporal advantage of the LSF processing rather than contrast sensitivity.

      (2) According to Figure 3a, 6% of the neurons are HSF-preferred and their firing rate in HSF is comparable to the LSF firing rate in the LSF-preferred group. This analysis is carried out in the early phase of the response (70-170 ms). While most of the neurons prefer LSF, this observation shows that there is an HSF input that excites a small group of neurons. Furthermore, the highest separability index also belongs to the HSF-preferred profile in the early phase of the response which supports the impact of the HSF part of the input.

      (3) Similar LSF-preferred responses are also reported by Chen et al. (2018) (50ms for SC) and Zhang et al. (2023) (3.5 - 4 secs for V2 and V4) for longer duration times.

      Our results suggest that the LSF-preferred nature of the IT responses in terms of firing rate and recall, is not due to the weakness or lack of input source (or information) for HSF but rather to the processing nature of the SF in the vision hierarchy.

      To address this issue in the manuscript:

      Appendix 1 - Figure 4 is added to the manuscript and shows the recall value and onset for R1-R5 with 200ms of exposure time.

      We added the following description to the discussion:

      “To rule out the degraded contrast sensitivity of the visual system to medium and high SF information because of the brief exposure time, we repeated the analysis with 200ms exposure time as illustrated in Appendix 1 - Figure 4 which indicates the same LSF-preferred results. Furthermore, according to Figure 2, the average firing rate of IT neurons for HSF could be higher than LSF in the late response phase. It indicates that the amount of HSF input received by the IT neurons in the later phase is as much as LSF, however, its impact on the IT response is observable in the later phase of the response. Thus, the LSF preference is because of the temporal advantage of the LSF processing rather than contrast sensitivity. Next, according to Figure 3(a), 6% of the neurons are HSF-preferred and their firing rate in HSF is comparable to the LSF firing rate in the LSF-preferred group. This analysis is carried out in the early phase of the response (70-170ms). While most of the neurons prefer LSF, this observation shows that there is an HSF input that excites a small group of neurons. Additionally, the highest SI belongs to the HSF-preferred profile in the early phase of the response which supports the impact of the HSF part of the input. Similar LSF-preferred responses are also reported by Chen et al. (2018) (50ms for SC) and Zhang et al. (2023) (3.5 - 4 secs for V2 and V4). Therefore, our results show that the LSF-preferred nature of the IT responses in terms of firing rate and recall, is not due to the weakness or lack of input source (or information) for HSF but rather to the processing nature of the SF in the IT cortex.”

      Figure 3b shows greater face coding than object coding by high SF and to a lesser degree by low SF neurons. Only the inverted-U-shaped neurons displayed slightly better object coding than face coding. Overall the results give an impression that IT neurons are significantly more capable of coding faces than coding objects, which is inconsistent with the general understanding of the functions of IT neurons. The problem may lie with the selection of stimulus images (Figure 1b). To study SF-related category coding, the images in two categories need to have similar SF spectrums in the Fourier domain. Such efforts are not mentioned in the manuscript, and a look at the images in Figure 1b suggests that such efforts are likely not properly made. The ResNet18 decoding results in Figure 6C, in that IT neurons of different profiles show similar face and object coding, might be closer to reality.

      Because of the limited number of stimuli in our experiments, it is hard to discuss the category selectivity, which needs a higher number of stimuli. To overcome the limited number of stimuli in our experiment, we fixed 60% (nine out of 15 stimuli) while varying the remaining stimuli to reduce the selective bias. To check the coding capability of the IT neurons for face and non-face objects, we evaluated the recall of face vs. non-face classification in intact stimuli (similar to classifiers stated in the manuscript). Results show that at the population level, the recall value for objects is 90.45%, and for faces is 92.45%. However, the difference is not significant (p-value=0.44). On the other hand, we note that a large difference in the SI value does not translate directly to the classification accuracy, rather it illustrates the strength of representation.

      Regarding the SF spectrums, after matching the luminance and contrast of the images, we matched the power of the images with respect to SF and category. Powers are calculated using the sum of the absolute value of the Fourier transform of the image. Considering all stimuli, the ANOVA analysis shows that various SF bands have similar power (one-way ANOVA, p-value=0.24). Furthermore, comparing the power of faces and objects in all SF bands (including intact) and both unscrambled and scrambled images indicates no significant difference between face and object (p-value > 0.1). Therefore, the result of Figure 3b suggests that IT employs various SF bands for the recognition of various objects.

      Comparing the results of CNNs and IT shows that the CNNs do not capture the complexities of the IT cortex in terms of SF. One of the sources of this difference is because of the behavioral saliency of the face stimulus in the training of the primate visual system.

      To address this issue in the manuscript:

      The following description is added to the discussion:

      “… the decoding performance of category classification (face vs. non-face) in intact stimuli is 94.2%. The recall value for objects vs. scrambled is 90.45%, and for faces vs. scrambled is 92.45% (p-value=0.44), which indicates the high level of generalizability and validity characterizing our results.”

      The following description is added to the method section, SF filtering.

      “Finally, we equalized the stimulus power in all SF bands (intact, R1-R5). The SF power among all conditions (all SF bands, face vs. non-face and unscrambled vs. scrambled) does not vary significantly (p-value > 0.1). SF power is calculated as the sum of the square value of the image coefficients in the Fourier domain.”
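
      The power measure described above can be illustrated with a small, self-contained sketch. It is not the authors' pipeline: it uses a naive 2D discrete Fourier transform on a tiny hypothetical image, and the function names are made up for illustration. Because the response describes the power both as a sum of absolute values and as a sum of squared values of the Fourier coefficients, the sketch exposes both through a flag rather than assuming one.

```python
import cmath


def dft2(image):
    """Naive 2D discrete Fourier transform of a small grayscale image."""
    rows, cols = len(image), len(image[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2.0 * cmath.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def spectral_power(image, squared=True):
    """Sum of |F|^2 (squared=True) or |F| (squared=False) over all coefficients."""
    exponent = 2 if squared else 1
    return sum(abs(c) ** exponent for row in dft2(image) for c in row)


def scale_to_power(image, target_power):
    """Rescale pixel values so that the squared spectral power equals target_power."""
    factor = (target_power / spectral_power(image)) ** 0.5
    return [[pixel * factor for pixel in row] for row in image]


# Hypothetical 2x2 "image", for illustration only.
image = [[1.0, 2.0], [3.0, 4.0]]
power_squared = spectral_power(image, squared=True)
power_absolute = spectral_power(image, squared=False)
equalized = scale_to_power(image, target_power=30.0)
```

      Note that this toy version includes the zero-frequency (mean luminance) term in the sum and rescales all pixels uniformly; a real stimulus pipeline would typically handle luminance, contrast, and each SF band separately.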

      Reviewer #2 (Public Review):

      Summary:

      This paper aimed to examine the spatial frequency selectivity of macaque inferotemporal (IT) neurons and its relation to category selectivity. The authors suggest in the present study that some IT neurons show a sensitivity for the spatial frequency of scrambled images. Their report suggests a shift in preferred spatial frequency during the response, from low to high spatial frequencies. This agrees with a coarse-to-fine processing strategy, which is in line with multiple studies in the early visual cortex. In addition, they report that the selectivity for faces and objects, relative to scrambled stimuli, depends on the spatial frequency tuning of the neurons.

      Strengths:

      Previous studies using human fMRI and psychophysics studied the contribution of different spatial frequency bands to object recognition, but as pointed out by the authors little is known about the spatial frequency selectivity of single IT neurons. This study addresses this gap and they show that at least some IT neurons show a sensitivity for spatial frequency and interestingly show a tendency for coarse-to-fine processing.

      We extend our sincere appreciation for your thoughtful and constructive feedback on our paper. We are grateful for the time and expertise you invested in reviewing our work. Your detailed suggestions have been instrumental in addressing several key aspects of the paper, contributing to its clarity and scholarly merit. We have carefully considered each of your comments and have made revisions accordingly.

      Weaknesses and requested clarifications:

      (1) It is unclear whether the effects described in this paper reflect a sensitivity to spatial frequency, i.e. in cycles/ deg (depends on the distance from the observer and changes when rescaling the image), or is a sensitivity to cycles /image, largely independent of image scale. How is it related to the well-documented size tolerance of IT neuron selectivity?

      Our stimuli are filtered in cycles/image, and knowing the distance of the subject to the monitor, we can calculate the cycles/degree. To the best of our knowledge, this is also the case for all other SF-related studies. To relate our observations to cycles/image versus cycles/degree, one should keep one of them fixed while changing the other; for example, changing the subject's distance to the monitor will change the SF content in terms of cycles/degree. With our current data, we cannot discriminate this effect. To address this issue, we added the following description to the discussion:
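
      The conversion between the two units can be sketched as follows, using the standard visual-angle relation: angle = 2 * atan(size / (2 * distance)). The image size, viewing distances, and function names are hypothetical and are not the values used in the experiment.

```python
import math


def visual_angle_deg(image_size_cm, viewing_distance_cm):
    """Visual angle (in degrees) subtended by an image of the given size."""
    return math.degrees(2.0 * math.atan(image_size_cm / (2.0 * viewing_distance_cm)))


def cycles_per_degree(cycles_per_image, image_size_cm, viewing_distance_cm):
    """Convert an image-based spatial frequency to cycles per degree."""
    return cycles_per_image / visual_angle_deg(image_size_cm, viewing_distance_cm)


# Hypothetical setup: a 10 cm wide image viewed from 57.3 cm, then from twice as far.
angle_near = visual_angle_deg(10.0, 57.3)
cpd_near = cycles_per_degree(20.0, 10.0, 57.3)
cpd_far = cycles_per_degree(20.0, 10.0, 114.6)
```

      With the image content fixed at 20 cycles/image, moving the viewer farther away raises the cycles/degree value, which is the manipulation that would be needed to separate the two accounts.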

      “Finally, since our experiment maintains a fixed SF content in terms of both cycles per degree and cycles per image, further experiments are needed to discern whether our observations reflect sensitivity to cycles per degree or cycles per image.”

      (2) The authors band-pass filtered phase-scrambled images of faces and objects. The original images likely differed in their spatial frequency amplitude spectrum and thus it is unclear whether the differing bands contained the same power for the different scrambled images. If not, this could have contributed to the frequency sensitivity of the neurons.

      After equalizing the luminance and contrast of the images, we equalized their power with respect to SF and category. The powers were calculated using the sum of the absolute values of the Fourier transform of the images. The results of the ANOVA analysis across all stimuli indicate that various SF bands exhibit similar power (one-way ANOVA, p-value = 0.24). Additionally, a comparison of power between faces and objects in all SF bands (including intact), for both unscrambled and scrambled images, reveals no significant differences (p-value > 0.1). To clarify this point, we have incorporated the following information into the Methods section.

      “Finally, we equalized the stimulus power in all SF bands (intact, R1-R5). The SF power among all conditions (all SF bands, face vs. non-face and unscrambled vs. scrambled) does not vary significantly (ANOVA, p-value > 0.1).”
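      The power measure described above (the sum of the absolute values of the Fourier transform) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the images are random arrays, and equalization is done by a single global scaling, which also changes mean luminance and contrast and so may differ from the actual stimulus pipeline.

```python
import numpy as np


def spectral_power(image: np.ndarray) -> float:
    """Sum of the absolute values of the 2-D Fourier transform of the image."""
    return float(np.abs(np.fft.fft2(image)).sum())


def scale_to_power(image: np.ndarray, target_power: float) -> np.ndarray:
    """Rescale pixel values so that spectral_power(result) equals target_power.

    The Fourier transform is linear, so multiplying the image by k
    multiplies this power measure by k as well.
    """
    power = spectral_power(image)
    if power == 0:
        raise ValueError("image has zero spectral power")
    return image * (target_power / power)


# Two synthetic "images" (hypothetical data) with different overall power.
rng = np.random.default_rng(0)
img_a = rng.random((8, 8))
img_b = 3.0 * rng.random((8, 8))

target = spectral_power(img_a)
img_b_eq = scale_to_power(img_b, target)
print(spectral_power(img_b), spectral_power(img_b_eq), target)
```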

      (3) How strong were the responses to the phase-scrambled images? Phase-scrambled images are expected to be rather ineffective stimuli for IT neurons. How can one extrapolate the effect of the spatial frequency band observed for ineffective stimuli to that for more effective stimuli, like objects or (for some neurons) faces? A distribution should be provided, of the net responses (in spikes/s) to the scrambled stimuli, and this for the early and late windows.

      The sample neuron in Figure 1c was chosen as a good representative of the recorded neurons. In the early response phase, the average firing rate to scrambled stimuli is 26.3 spikes/s, which is significantly higher than the response in the -50 to 50 ms window (23.4 spikes/s). In comparison, the mean response to intact face stimuli is 30.5 spikes/s, while object stimuli elicit an average response of 28.8 spikes/s. In the late phase, T2, the responses to scrambled, face, and object stimuli are 19.5, 19.4, and 22.4 spikes/s, respectively. Moreover, when the classification accuracy for SF exceeds chance levels, it indicates a significant impact of SF bands on the IT response. This raises a direct question about the explicit coding for SF bands in the IT cortex observed for ineffective stimuli and how it relates to complex and effective stimuli, such as faces. To show the strength of neuron responses to the SF bands in scrambled images, we added Appendix 1 - Figure 2 and also added Appendix 1 - Figure 1, according to comment 4, which shows the average and standard deviation of the responses to all SF bands. The following description is added to the results section.

      “Considering the strength of responses to scrambled stimuli, the average firing rate in response to scrambled stimuli is 26.3 Hz, which is significantly higher than the response observed between -50 and 50 ms, where it is 23.4 Hz (p-value=3×10⁻⁵). In comparison, the mean response to intact face stimuli is 30.5 Hz, while non-face stimuli elicit an average response of 28.8 Hz. The distribution of neuron responses for scrambled, face, and non-face in T1 is illustrated in Appendix 1 - Figure 2.

      […]

      Moreover, the average firing rates of scrambled, face, and non-face stimuli are 19.5 Hz, 19.4 Hz, and 22.4 Hz, respectively. The distribution of neuron responses is illustrated in Appendix 1 Figure 2.”

      (4) The strength of the spatial frequency selectivity is unclear from the presented data. The authors provide the result of a classification analysis, but this is in normalized units so that the reader does not know the classification score in percent correct. Unnormalized data should be provided. Also, it would be informative to provide a summary plot of the spatial frequency selectivity in spikes/s, e.g. by ranking the spatial frequency bands for each neuron based on half of the trials and then plotting the average responses for the obtained ranks for the other half of the trials. Thus, the reader can appreciate the strength of the spatial frequency selectivity, considering trial-to-trial variability. Also, a plot should be provided of the mean response to the stimuli for the two analysis windows of Figure 2c and 2d in spikes/s so one can appreciate the mean response strengths and effect size (see above).

      The classification result is normalized simply by subtracting the chance level, which is 0.2, from all values. Therefore, the values can still be interpreted in percent, as we did in the results section. To make this clear, we removed the “a.u.” from the figure and added the following description to the results section.

      “The accuracy value is normalized by subtracting the chance level (0.2).”
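      A minimal sketch of this normalization, assuming balanced classes so that the chance level is one over the number of labels (0.2 for the five SF bands); the function names are hypothetical:

```python
def chance_level(n_classes: int) -> float:
    """Chance accuracy for a balanced classification problem."""
    return 1.0 / n_classes


def normalize_accuracy(raw_accuracy: float, n_classes: int) -> float:
    """Accuracy expressed relative to chance (raw accuracy minus chance level)."""
    return raw_accuracy - chance_level(n_classes)


def to_raw_accuracy(normalized_accuracy: float, n_classes: int) -> float:
    """Invert the normalization to recover the raw (unnormalized) accuracy."""
    return normalized_accuracy + chance_level(n_classes)


# With five SF bands, a raw accuracy of 0.3908 is 0.1908 above chance.
print(normalize_accuracy(0.3908, n_classes=5))
print(to_raw_accuracy(0.1908, n_classes=5))
```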

      Regarding the selectivity of the neurons, as suggested by your comment, we added a new figure in the appendix section, Appendix 1 - Figure 2. This figure shows the strength of SF selectivity, considering trial-to-trial variability. The following description is added to the results section:

      “The strength of SF selectivity, considering the trial-to-trial variability is provided in Appendix 1 Figure 2, by ranking the SF bands for each neuron based on half of the trials and then plotting the average responses for the obtained ranks for the other half of the trials.”
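      The split-half ranking procedure described above can be sketched for a single neuron as follows. This is an illustrative sketch only: the data layout (a dictionary from SF band to per-trial firing rates), the function name, and the synthetic rates are assumptions, not the analysis code used for the figure.

```python
import random
from statistics import mean


def split_half_ranked_response(trials_by_band, rng):
    """Rank SF bands on one half of the trials, report responses on the other half.

    trials_by_band: dict mapping band name -> list of firing rates (one per trial).
    Returns the held-out mean responses ordered from the best-ranked band
    to the worst-ranked band.
    """
    rank_half, test_half = {}, {}
    for band, rates in trials_by_band.items():
        shuffled = list(rates)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        rank_half[band] = mean(shuffled[:mid])   # used only to rank the bands
        test_half[band] = mean(shuffled[mid:])   # used only to report responses
    ranked_bands = sorted(rank_half, key=rank_half.get, reverse=True)
    return [test_half[band] for band in ranked_bands]


# Synthetic neuron (hypothetical rates in spikes/s), 14 trials per band.
rng = random.Random(0)
neuron = {band: [rng.gauss(mu, 3.0) for _ in range(14)]
          for band, mu in [("R1", 30), ("R2", 26), ("R3", 22), ("R4", 20), ("R5", 18)]}
ranked = split_half_ranked_response(neuron, rng)
print([round(r, 1) for r in ranked])
```

      Because ranking and reporting use disjoint trials, a neuron without real SF selectivity would give a flat curve on average, whereas a selective neuron keeps a decreasing curve. Averaging these curves across neurons gives the kind of summary plot requested.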

      The firing rates of Figures 2c and 2d are normalized for better illustration since the variation in firing rates is high across neurons, as can be observed in Appendix 1 - Figure 1. Since we seek trends in the response, the absolute values are not important (the baseline firing rates of neurons differ), but the values relative to the baseline firing rate determine the trend. To address the mean response and the strength of the SF response, the following description is added to the results section.

      “Considering the strength of responses to scrambled stimuli, the average firing rate in response to scrambled stimuli is 26.3 Hz, which is significantly higher than the response observed between -50 and 50 ms, where it is 23.4 Hz (p-value=3×10⁻⁵). In comparison, the mean response to intact face stimuli is 30.5 Hz, while non-face stimuli elicit an average response of 28.8 Hz. The distribution of neuron responses for scrambled, face, and non-face in T1 is illustrated in Appendix 1 - Figure 2.

      […]

      Moreover, the average firing rates of scrambled, face, and non-face stimuli are 19.5 Hz, 19.4 Hz, and 22.4 Hz, respectively. The distribution of neuron responses is illustrated in Appendix 1 Figure 2.”

      Furthermore, we added a figure, Appendix 1 - Figure 3, to illustrate the strength of SF selectivity in our profiles. The following is added to the results section:

      “To check the robustness of the profiles, considering the trial-to-trial variability, the strength of SF selectivity in each profile is provided in Appendix 1 - Figure 3, by forming the profile of each neuron based on half of the trials and then plotting the average SF responses with the other half of the trials.”

      (5) It is unclear why such brief stimulus durations were employed. Will the results be similar, in particular the preference for low spatial frequencies, for longer stimulus durations that are more similar to those encountered during natural vision?

      Please refer to the first comment of Reviewer 1.

      (6) The authors report that the spatial frequency band classification accuracy for the population of neurons is not much higher than that of the best neuron (line 151). How does this relate to the SNC analysis, which appears to suggest that many neurons contribute to the spatial frequency selectivity of the population in a non-redundant fashion? Also, the outcome of the analyses should be provided (such as SNC and decoding (e.g. Figure 1D)) in the original units instead of undefined arbitrary units.

      The population accuracy is approximately 5% higher than that of the best neuron. However, we have no reference against which to judge this effect size (the value is roughly similar for face vs. object, although the chance levels differ). Moreover, as stated in the Methods, SNC is calculated for two labels (LSF and HSF), so it cannot be directly compared to the best-neuron accuracy. Regarding the unit of SNC, it can be read as a percentage after multiplying by 100. We removed the “a.u.” to prevent misunderstanding and modified the results section for clarity.

      “… SNC score for SF (two labels, LSF (R1 and R2) vs. HSF (R4 and R5)) and category … (average SNC for SF=0.51%±0.02 and category=0.1%±0.04 …”

      (7) To me, the results of the analyses of Figure 3c,d, and Figure 4 appear to disagree. The latter figure shows no correlation between category and spatial frequency classification accuracies while Figure 3c,d shows the opposite.

      In Figure 3c,d, following what we observed in Figure 3a,b about the category coding capability of a population of neurons grouped by single-neuron profile, we tested a related idea: whether the coding capability of single neurons for SF (or category) predicts the coding capability of the population for category (or SF). Both analyses therefore examine the relation between a characteristic of single neurons and the coding capability of a population of similar neurons. In Figure 4, by contrast, the aim is to characterize the coding mechanisms behind SF and category coding. In Figure 4a, we check whether there is any relation between category and SF coding capability within the activity of a single neuron, without the influence of other neurons, to examine the idea that SF coding may be a byproduct of an object recognition mechanism. In Figure 4b, we examine the contribution of every neuron to the population decision, again to check whether the mechanisms behind SF and category coding are the same. This analysis shows how individual neurons contribute to SF or category coding at the population level. The analyses in Figures 3 and 4 therefore differ in method and in what they were designed to investigate, so their results cannot be compared directly.

      (8) If I understand correctly, the "main" test included scrambled versions of each of the "responsive" images selected based on the preceding test. Each stimulus was presented 15 times (once in each of the 15 blocks). The LDA classifier was trained to predict the 5 spatial frequency band labels and they used 70% of the trials to train the classifier. Were the trained and tested trials stratified with respect to the different scrambled images? Also, LDA assumes a normal distribution. Was this the case, especially because of the mixture of repetitions of the same scrambled stimulus and different scrambled stimuli?

      In response to your inquiry regarding the stratification of trials, both the training and testing data were representative of the entire spectrum of scrambled images used in our experiment. To address your concern about the assumption of a normal distribution, especially given the mixture of repetitions of the same scrambled stimulus and different stimuli, our analysis of firing rates reveals a slightly left-skewed normal distribution. While there is a deviation from a perfectly normal distribution, we are confident that this skewness does not compromise the robustness of the LDA classifier.
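      To make the notion of stratification concrete, the following is a minimal sketch of a 70/30 split in which every combination of SF band and scrambled image contributes proportionally to both the training and the test set. The function name, the stratum keys, and the trial counts are hypothetical; this is not necessarily how the split was implemented in our analysis.

```python
import random
from collections import defaultdict


def stratified_split(strata, train_fraction=0.7, seed=0):
    """Split trial indices so each stratum keeps ~train_fraction of its trials for training.

    strata: one hashable, sortable key per trial, e.g. (sf_band, image_id).
    Returns (train_indices, test_indices), both sorted.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, key in enumerate(strata):
        by_stratum[key].append(idx)

    train, test = [], []
    for key in sorted(by_stratum):
        idxs = by_stratum[key]
        rng.shuffle(idxs)
        n_train = round(train_fraction * len(idxs))
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return sorted(train), sorted(test)


# Hypothetical layout: 5 SF bands x 3 scrambled images x 10 repetitions.
strata = [(band, image) for band in ("R1", "R2", "R3", "R4", "R5")
          for image in range(3) for _ in range(10)]
train_idx, test_idx = stratified_split(strata)
print(len(train_idx), len(test_idx))
```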

      (9) The LDA classifiers for spatial frequency band (5 labels) and category (2 labels) have different chance and performance levels. Was this taken into account when comparing the SNC between these two classifiers? Details and SNC values should be provided in the original (percent difference) instead of arbitrary units in Figure 5a. Without such details, the results are impossible to evaluate.

      For both SNC and CMI calculations in SF, we considered two labels of HSF (R4 and R5) and LSF (R1 and R2). This was mentioned in the Methods section, after equation (5). According to your comment, to make it clear in the results section, we also added this description to the results section.

      “… illustrates the SNC score for SF (two labels, LSF (R1 and R2) vs. HSF (R4 and R5)) and category (face vs. non-face) … conditioned on the label, SF (LSF (R1 and R2) vs. HSF (R4 and R5)) or category, to assess the information.”

      The value of SNC can also be converted directly to percent by multiplying by 100. To make this clear, we removed “a.u.” from the y-axis.

      (10) Recording locations should be described in IT, since the latter is a large region. Did their recordings include the STS? A/P and M/L coordinate ranges of recorded neurons?

      We appreciate your suggestion regarding the recording locations. Nevertheless, given the complexities associated with neurophysiological recordings and the limitations imposed by our methodologies, we face challenges in determining precisely whether each unit is located in the STS. To address your comment, we added Appendix 1 - Figure 5, which shows the SF and category coding capability of neurons along their recorded locations.

      (11) The authors should show in Supplementary Figures the main data for each of the two animals, to ensure the reader that both monkeys showed similar trends.

      We added Appendix 2 which shows the consistency of the main results in the two monkeys.

      (12) The authors found that the deep nets encoded better the spatial frequency bands than the IT units. However, IT units have trial-to-trial response variability and CNN units do not. Did they consider this when comparing IT and CNN classification performance? Also, the number of features differs between IT and CNN units. To me, comparing IT and CNN classification performances is like comparing apples and oranges.

      Deep convolutional neural networks are currently considered the state-of-the-art models of the primate visual pathway. However, as you mentioned and based on our results, they do not yet capture various complexities of the visual ventral stream. Yet studying the similarities and differences between CNN and brain regions, such as the IT cortex, is an active area of research, such as:

      a. Kubilius, Jonas, et al. "Brain-like object recognition with high-performing shallow recurrent ANNs." Advances in neural information processing systems 32 (2019).

      b. Xu, Yaoda, and Maryam Vaziri-Pashkam. "Limits to visual representational correspondence between convolutional neural networks and the human brain." Nature Communications, 12.1 (2021).

      c. Jacob, Georgin, et al. "Qualitative similarities and differences in visual object representations between brains and deep networks." Nature Communications, 12.1 (2021).

      Therefore, we believe comparing IT and CNN, despite all of the differences in terms of their characteristics, can help both fields grow faster, especially in introducing brain-inspired networks.

      (13) The authors should define the separability index in their paper. Since it is the main index to show a relationship between category and spatial frequency tuning, it should be described in detail. Also, results should be provided in the original units instead of undefined arbitrary units. The tuning profiles in Figure 3A should be in spikes/s. Also, it was unclear to me whether the classification of the neurons into the different tuning profiles was based on an ANOVA assessing per neuron whether the effect of the spatial frequency band was significant (as should be done).

      Based on your comment, we added the description of the separability index to the methods section. However, since the separability index is defined as the division of two dispersion matrices, it has no units by nature. The tuning profiles in Figure 3a are normalized for better illustration since the variation in firing rates is high. Since we seek trends in the response, the absolute values are not important. Regarding the SF profile formation, we updated the method section to better present the SF profile assignment. Furthermore, the strength of responses for scrambled stimuli can be observed in Appendix 1 - Figures 1 and 2.

      (14) As mentioned above, the separability analysis is the main one suggesting an association between category and spatial frequency tuning. However, they compute the separability of each category with respect to the scrambled images. Since faces are a rather homogeneous category I expect that IT neurons have on average a higher separability index for faces than for the more heterogeneous category of objects, at least for neurons responsive to faces and/or objects. The higher separability for faces of the two low- and high-pass spatial frequency neurons could reflect stronger overall responses for these two classes of neurons. Was this the case? This is a critical analysis since it is essential to assess whether it is category versus responsiveness that is associated with the spatial frequency tuning. Also, I do not believe that one can make a strong claim about category selectivity when only 6 faces and 3 objects (and 6 other, variable stimuli; 15 stimuli in total) are employed to assess the responses for these categories (see next main comment). This and the above control analysis can affect the main conclusion and title of the paper.

      We appreciate your concern regarding category selectivity or responsiveness of the SF profiles. First, we note that we used SI because it overcomes the limitations of the accuracy and recall metrics, which are discrete and can saturate. We cannot directly calculate face vs. object with SI, since this index reports only one value for the whole discrimination task. Therefore, we have to calculate the SI for face/object vs. scrambled to obtain a value per category. However, as you suggested, this raises the question of whether we assess how well the neural responses distinguish between actual images (faces or objects) and their scrambled versions or merely assess responsiveness. Based on Figure 3b, since we have face-selective (LSF and HSF preferred profiles), object-selective (inverse U), and the U profile, where SI is the same for both face and object, we believe the SF profile is associated with category selectivity; otherwise we would have the same face/object recall in all profiles, as we have in the U-shaped profile.

      To analyze this issue further, we calculated the number of face/object selective neurons in 70-170ms. We found 43 face-selective neurons and 36 object-selective neurons (FDR corrected p-value < 0.05). Therefore, the number of face-selective and object-selective neurons is similar. Next, we checked the selectivity of the neurons within each profile. The number of face/object selective neurons is LP=13/3, HP=6/2, IU=3/9, U=14/13, and the remaining neurons belong to the NP group. The results show more face-selective neurons in LP and HP and more object-selective neurons in the IU class. The U class contains roughly the same number of face-selective and object-selective neurons. This observation supports the relationship between category selectivity and profiles.

      Next, we examined the average neuron response to faces and objects in each profile. In none of the profiles was the difference between the firing rates for faces and objects significant (rank-sum test, significance level of 0.05). The average firing rates (spikes/s) for face/object are LP=36.72/28.77, HP=28.55/25.52, IU=21.55/27.25, U=38.48/36.28. While the differences are not significant, they support the relationship between profiles and categories instead of responsiveness.

      The following description is added to the results section to cover this point of view.

      “To assess whether the SF profiles distinguish category selectivity or merely evaluate the neuron's responsiveness, we quantified the number of face/non-face selective neurons in the 70-170ms time window. Our analysis shows a total of 43 face-selective neurons and 36 non-face-selective neurons (FDR-corrected p-value < 0.05). The results indicate a higher proportion of face-selective neurons in LP and HP, while a greater number of non-face-selective neurons is observed in the IU category (number of face/non-face selective neurons: LP=13/3, HP=6/2, IU=3/9). The U category exhibits a roughly equal distribution of face and non-face-selective neurons (U=14/13). This finding reinforces the connection between category selectivity and the identified profiles. We then analyzed the average neuron response to faces and non-faces within each profile. The difference between the firing rates for faces and non-faces in none of the profiles is significant (face/non-face average firing rate (Hz): LP=36.72/28.77, HP=28.55/25.52, IU=21.55/27.25, U=38.48/36.28, Ranksum with significance level of 0.05). Although the observed differences are not statistically significant, they provide support for the association between profiles and categories rather than mere responsiveness.”
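      For readers unfamiliar with FDR correction, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure. It is an assumption that this particular FDR procedure applies here, since the text above only states that p-values were FDR-corrected; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= alpha * k / m.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            max_rank = rank

    # Reject every hypothesis whose rank is at most max_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical per-neuron p-values for a face vs. non-face comparison.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
selective = benjamini_hochberg(p_vals, alpha=0.05)
print(sum(selective), "of", len(p_vals), "neurons pass the FDR threshold")
```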

      About the low number of stimuli, please check the next comment.

      (15) For the category decoding, the authors employed intact, unscrambled stimuli. Were these from the main test? If yes, then I am concerned that this represents a too small number of stimuli to assess category selectivity. Only 9 fixed + 6 variable stimuli = 15 were in the main test. How many faces/ objects on average? Was the number of stimuli per category equated for the classification? When possible use the data of the preceding selectivity test which has many more stimuli to compute the category selectivity.

      We used only the data recorded in the main phase, which contains 15 images in each session. Each image results in 12 stimuli (intact, R1-R5, and phase-scrambled version). Thus, there exists a total of 180 unique stimuli in each session. Increasing the number of images would have increased the recording time. We compensated for this limitation by increasing the diversity of images in each session by picking the most responsive ones from the selectivity phase. On average, 7.54 of the stimuli were faces in each session. We added this information to the Methods section. Furthermore, as mentioned in the discussion, for each classification run, the number of samples per category is equalized. We note that we cannot use the selectivity data for analysis, since the SF-related stimuli are filtered in different bands.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      I suggest that the authors double-check their results by performing control experiments with longer stimulus duration and SF-spectrum-matched face and object stimuli.

      Thanks for your suggestion. According to your comment, we added Appendix 1 - Figure 3.

      In addition, I had a very difficult time understanding the differences between Figure 3c and Figure 4a. Please rewrite the descriptions to clarify.

      Thanks for your suggestion. We revised the description of these two figures. The following description is added to the results section for Figure 3c.

      “Next, to examine the relation between the SF (category) coding capacity of the single neurons and the category (SF) coding capability of the population level, we calculated the correlation between coding performance at the population level and the coding performance of single neurons within that population (Figure 3 c and d). In other words, we investigated the relation between single and population levels of coding capabilities between SF and category. The SF (or category) coding performance of a sub-population of 20 neurons that have roughly the same single-level coding capability of the category (or SF) is examined.”

      Lines 147-148: The text states that 'The maximum accuracy of a single neuron was 19.08% higher than the chance level'. However, in Figure 4, the decoding accuracies of individual neurons for category and SF range were between 49%-90% and 20%-40%, respectively. Please explain the discrepancies.

      The first number is reported relative to the chance level, which is 20%; thus the unnormalized accuracy is 39%, which is consistent with the SF accuracy in Figure 4. We added the following description to prevent any misunderstanding.

      “… was 19.08% higher than the chance level (unnormalized accuracy is 39.08%, neuron #193, M2).”

      Lines 264-265: Should 'the alternative for R3 and R4' be 'the alternative for R4 and R5'?

      Thanks for your attention, it's “R4 and R5”. We corrected that mistake.

      Lines 551-562: The labels for SF classification are R1-R5. Is it a binary or a multi-classification task?

      It’s a multi-label classification. We made it clear in the text.

      “… labels were SF bands (R1, R2, ..., R5, multi-label classifier).”

      Figure 4b: Neurons in SF/category decoding exhibit both positive and negative weights. However, in the analysis of sparse neuron weights in Equation 6, only the magnitude of the weights is considered. Is the sign of weight considered too?

      We used the absolute value of the neuron weight to calculate sparseness. We also corrected Equation 6.

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 52: what do the authors mean by coordinate processing in object recognition?

      To avoid any potential misunderstanding, we used the exact phrase in Saneyoshi and Michimata (2015). It is, in fact, coordinate relations processing. Coordinate relations specify the metric information of the relative locations of objects.

      (2) About half of the Introduction is a summary of the Results. This can be shortened.

      Thanks for your suggestion.

      (3) Line 134: Peristimulus time histogram instead of Prestimulus time histogram.

      Thanks for your attention. We corrected that.

      (4) Line 162: the authors state that R1 is decoded faster than R5, but the reported statistic is only for R1 versus R2.

      It was a typo; the reported p-value is for R1 versus R5.

      (5) Line 576: which test was used to assess the statistical significance?

      The test is Wilcoxon signed-rank. We added it to the text.

      (6) How can one present a 35 ms long stimulus with a 60 Hz frame rate (the stimuli were presented on a 60Hz monitor (line 470))? Please correct.

      Thanks for your attention. We corrected that. The time of stimulus presentation is 33ms and the monitor rate is 120Hz.
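      The constraint behind this correction is that a monitor can only display a stimulus for a whole number of frames. The following minimal sketch, with a hypothetical helper name, shows which requested durations are achievable at a given refresh rate:

```python
def frames_for_duration(duration_ms: float, refresh_hz: float):
    """Return (number of frames, actual duration in ms) closest to the requested duration."""
    frame_ms = 1000.0 / refresh_hz
    n_frames = round(duration_ms / frame_ms)
    return n_frames, n_frames * frame_ms


for duration_ms, refresh_hz in [(33, 120), (35, 60)]:
    n_frames, actual_ms = frames_for_duration(duration_ms, refresh_hz)
    print(f"{duration_ms} ms at {refresh_hz} Hz -> {n_frames} frames = {actual_ms:.2f} ms")
```

      At 120 Hz, four frames last about 33.3 ms, which matches the corrected presentation time. At 60 Hz, the nearest achievable durations are two frames (about 33.3 ms) or three frames (50 ms), so a 35 ms presentation is not possible.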

    1. Author response:

      The following is the authors’ response to the original reviews.

      These are valuable findings that support a link between low-dimensional brain network organization, patterns of ongoing thought, and trait-level personality factors, making it relevant for researchers in the field of spontaneous cognition, personality, and neuropsychiatry. While this link is not entirely new, the paper brings to bear a rich dataset and a well-conducted study, to approach this question in a novel way. The evidence in support of the findings is convincing.

      We thank the reviewers and editors for their time, feedback, and recommendations for improvement. We have revised the manuscript with those recommendations in mind and provide a point-by-point description of the revisions below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors ran an explorative analysis in order to describe how a "tri-partite" brain network model could describe the combination of resting fMRI data and individual characteristics. They utilized previously obtained fMRI data across four scanning runs in 144 individuals. At the end of each run, participants rated their patterns of thinking on 12 statements (short multi-dimensional experience sampling-MDES) using a 0-100% visual analog scale. Also, 71 personality traits were obtained on 21 questionnaires. The authors ran two separate principal component analyses (PCA) to obtain low dimensional summaries of the two individual characteristics (personality traits from questionnaires, and thought patterns from MDES). The dimensionality reduction of the fMRI data was done by means of gradient analysis, which was combined with Neurosynth decoding to visualize the functional axis of the gradients. To test the reliability of thought components across scanning time, intra-class correlation coefficients (ICC) were calculated for the thought patterns, and discriminability indices were calculated for whole gradients. The relationship between individual differences in traits, thoughts, and macro-scale gradients was tested with multivariate regression.

      The authors found: a) reliability of thought components across the one hour of scanning, b) Gradient 1 differentiated between visual regions and DMN, Gradient 2 dissociated somatomotor from visual cortices, Gradient 3 differentiated the DMN from the fronto-parietal system, c) the associations between traits/thought patterns and brain gradients revealed significant effects of "introversion" and "specific internal" thought: "Introversion" was associated with variant parcels on the three gradients, with most of parcels belonging to the VAN and then to the DMN; and "Specific internal thought" was associated with variant parcels on the three gradients with most of parcels belonging to the DAN and then the visual. The authors conclude that interactions between attention systems and the DMN are important influences on ongoing thought at rest.

      Strengths:

      The study's strength lies in its attempt to combine brain activity with individual characteristics using state-of-the-art methodologies.

      Weaknesses:

      The study protocol in its current form restricts replicability. This is largely due to missing information on the MRI protocol and data preprocessing. The article refers the reader to the work of Mendes et al 2019 which is said to provide this information, but the paper should rather stand alone with all this crucial material mentioned here, as well. Also, effect sizes are provided only for the multiple multivariate regression of the inter-class correlations, which makes it difficult to appreciate the power of the other obtained results. 

      Thank you for these comments. We have addressed both issues by adding effect sizes for reported trait and thought related effects within the results table (Table 3, Line 427) and providing more information about the fMRI protocol and preprocessing steps (Lines 162-188).

      Reviewer #2 (Public Review):

      The authors set out to draw further links between neural patterns observed at "rest" during fMRI, with their related thought content and personality traits. More specifically, they approached this with a "tri-partite network" view in mind, whereby the ventral attention network (VAN), the dorsal attention network (DAN), and the default mode network (DMN) are proposed to play a special role in ongoing conscious thought. They used a gradients approach to determine the low dimensional organisation of these networks. In concert, using PCA they reduced thought patterns captured at four time points during the scan, as well as traits captured from a large battery of questionnaires.

      The main findings were that specific thought and trait components were related to variations in the organisation of the tri-partite networks, with respect to cortical gradients.

      Strengths of the methods/results: Having a long (1 hr) resting state MRI session, which could be broken down into four separate scanning/sampling components is a strength. Importantly, the authors could show (via intra-class correlation coefficients) the similarity of thoughts and connectivity gradients across the entire session. Not only did this approach increase the richness of the data available to them, it speaks in an interesting way to the stability of these measures. The inclusion of both thought patterns during scanning along with trait-level dispositional factors is most certainly a strength, as many studies will often include either/or of these, rather than trying to reconcile across. Of the two main findings, the finding that detailed self-generated thought was associated with a decoupling of regions of DAN from regions in DMN was particularly compelling, in light of mounting literature from several fields that support this.

      Weaknesses of the methods/results: Considering the richness of the thought and personality data, I was a little surprised that only two main findings emerged (i.e., a relationship with trait introversion, and a relationship with the "specific internal" thought pattern). I wondered whether, at least in part and in relation to traits, this might stem from the large and varied set of questionnaires used to discern the traits. These questionnaires mostly comprised personality/mood, but some sampled things that do not fall into that category (e.g., musicality, internet addiction, sleep), and some related directly to spontaneous thought properties (e.g., mind wandering, musical imagery). It would be interesting to see what relationships would emerge by being more selective in the traits measured, and in the tools to measure them.

      We agree that being more selective in trait measures and measuring tools could lead to more insights into trait – brain relationships. In part the emergence of only two main findings could also be a trade-off of multiple comparison corrections inherent in our current approach (i.e. 400 separate models for all parcels). Furthermore, we have adjusted the text in the discussion in this revision to highlight that more targeted measures of personality (e.g. self-consciousness) could provide a more nuanced view of the relationship between traits and patterns of thought at rest. (Line 532):

      “In the future it may also be important to consider measures of traits that could have relationships to both neural activity and/or experience at rest (e.g. self-consciousness de Caso et al., 2017, or autistic tendencies, Turnbull et al., 2020a).”
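
      The trade-off mentioned in the response above (400 separate parcel-wise models) can be made concrete with a small sketch. The correction method is not named in this excerpt, so both procedures below (Bonferroni and Benjamini-Hochberg) and the toy p-values are assumptions chosen purely for illustration.

```python
# Illustrative sketch only: the correction actually used by the authors is not
# stated here, and the p-values below are invented toy numbers.

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of tests rejected at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under rank/m * q.
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    return sorted(order[:cutoff])


N_PARCELS = 400  # number of separate models mentioned in the response

# One parcel with a nominally significant effect, the rest clearly null.
weak_effect = [0.001] + [0.5] * (N_PARCELS - 1)
strong_effect = [0.0001] + [0.5] * (N_PARCELS - 1)

print(bonferroni_threshold(0.05, N_PARCELS))
print(benjamini_hochberg(weak_effect))
print(benjamini_hochberg(strong_effect))
```

      In this toy case a parcel with p = 0.001 would pass an uncorrected 0.05 threshold but survives neither correction across 400 models, which is one way a rich dataset can yield only a few surviving findings.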

      Taken together, the main findings are interesting enough. However, the real significance of this work, and its impact, lie in the richness of the approach: combining across fMRI, spontaneous thought, and trait-level factors. Triangulating these data has important potential for furthering our understanding of brain-behaviour relationships across different levels of organisation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Recommendations for improving the writing and presentation.

      - Frame the study objectives more clearly. If it's about which theoretical framework best supports the data, you might need to advocate on why the tri-partite approach is a more efficient framework than others. If not, the argument will beg the question: you will find an effect on this model, so you will claim that this is an informative model. For example, if the focus is on these three RSNs and thought reporting, the authors might want to contextualize it historically, like how from two networks (DMN-antagonistic; Vanhaudenhuyse JCognNeurosci 2012; Demertzi et al, NetwNeurosci 2022) we end up with three and why this is a more suitable approach. What about whole-brain connectomic approaches, such as the work by Amico et al?

      We have expanded on the objectives and rationale of the study by editing/ expanding the introduction as follows (Lines 84-87): 

      “Traditionally, the DMN was thought to have an antagonistic relationship with systems linked to external processing (Fox et al., 2005). However, according to the ‘tri-partite’ network accounts the relationship between the DMN and other brain systems is more nuanced. From this perspective key hubs of the ventral attention network, such as the anterior insula and dorso-lateral prefrontal cortex, help gate access to conscious experience, influencing regardless of the focus of attention. This is hypothesised to occur because the VAN influences interactions between the DAN, which is more important for external mental content (Corbetta and Shulman, 2002), and the DMN which is important when states (including tasks) rely more on internal representations (Smallwood et al., 2021a).” (… and Lines 112:125):

      “Our current study explored whether this “tri-partite network” view of ongoing conscious thought derived from studies focused on understanding conscious experience, provides a useful organizing framework for understanding the relation between observed brain activity at rest and patterns of cognition/ personality traits. Such analysis is important because at rest there are multiple features of brain activity that can be identified via complex analyses that include regions that show patterns of coactivation (which are traditionally viewed as forming a cohesive network; Biswal et al., 1995) as well as patterns of anti-correlation with other regions (e.g. Fox et al., 2005). However, it is unclear which of these relationships reflect aspects of cognition or behaviour or are in fact aspects of the functional organization of the cortex (Fox and Raichle, 2007). Consequently, our study builds on foundational work (e.g. Vanhaudenhuyse et al., 2011) in order to better understand which aspects of neural function observed at rest are most likely linked to cognition and behaviour. With this aim in mind, we examined links between macro-scale neural activation and both (i) trait descriptions of individuals and (ii) patterns of ongoing thought.”

      - As there was no explicit description of the adopted design and the fMRI procedure, I deduced that it was about a within-subject design, 1-hour scanning session, comprised of four runs, each lasting 15 min, can that be correct? In any case, an explicit description of the design and the fMRI procedure eases the reading and replicability. 

      Thank you for pointing this out. We have now restructured and edited the relevant text to state those details clearly and to explain the MDES part of the procedure in the same section. It now reads (Lines 162:167):

      “Resting state fMRI with Multidimensional Experience Sampling (MDES)

      The current sample includes one hour of fully pre-processed rs-fMRI data from 144 participants (four scans from 135 participants, and three scans from nine participants whose data were missing or incomplete). The rs-fMRI was performed in four adjacent 15-minute sessions each immediately followed by MDES which retrospectively measured various dimensions of spontaneous thought during the scan.”

      - Was there a control to the analysis, such as a gradient which also associated with these characteristics? Anything else?

      In our analyses we explore multiple gradients and how they link to traits and thoughts at rest. While there is no explicit control, each analysis provides a constraint on the interpretation of the other analyses. We have added the following text to expand on this point (Line 372):

      “To this end, we performed a multiple multivariate regression with thoughts, traits, and nuisance variables (motion, age and gender) as independent variables, with whole brain functional organisation, as captured by the first three gradients, as dependent variables. In this analytic approach relationships between cognition along one gradient but not along another help identify which relationships between brain systems are most likely to relate to the feature of cognition in question (i.e. each gradient acts as a control for the other).”
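
      The regression described in the quoted passage can be sketched for a single parcel as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are invented, only one thought score, one trait score and one nuisance variable (motion) are included, and the function names `solve` and `fit_multivariate` are hypothetical.

```python
# Illustrative sketch only: toy data for one parcel, hypothetical names.

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination; b may have several columns."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_multivariate(X, Y):
    """Ordinary least squares with several outcomes: B = (X'X)^-1 X'Y."""
    p, q = len(X[0]), len(Y[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [[sum(x[i] * y[k] for x, y in zip(X, Y)) for k in range(q)]
           for i in range(p)]
    return solve(xtx, xty)


# Predictors per participant: intercept, thought score, trait score, motion.
thought = [0.5, -1.0, 1.5, 0.0, -0.5, 2.0]
trait = [1.0, 0.0, -1.0, 2.0, 0.5, -0.5]
motion = [0.1, 0.3, 0.2, 0.4, 0.0, 0.25]
X = [[1.0, t, r, m] for t, r, m in zip(thought, trait, motion)]

# Toy ground truth: the thought score relates to gradient 2 only.
TRUE_B = [
    [0.0, 0.0, 0.0],   # intercept
    [0.0, 0.8, 0.0],   # thought
    [0.3, 0.0, -0.4],  # trait
    [0.1, 0.1, 0.1],   # motion (nuisance)
]
# Outcomes: the parcel's scores on gradients 1 to 3, generated without noise.
Y = [[sum(x[i] * TRUE_B[i][k] for i in range(4)) for k in range(3)] for x in X]

B = fit_multivariate(X, Y)
print("thought coefficients per gradient:", [round(v, 3) for v in B[1]])
```

      Because all three gradients are modelled together, an effect of the thought score that appears on gradient 2 but not on gradients 1 or 3 can be read as specific to that axis, which is the sense in which each gradient acts as a control for the others.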

      - I feel that Table 1 (list of tests) carries less information compared to Supplementary Table 1 (how spontaneous thought was reported and scored). I would suggest swapping them, unless Table 1 further contains which outcome measures per test were used for the analysis.  

      Thank you for this suggestion. The table showing the MDES questions has now been moved to the main text (Table 1, Line 194). However, as there is no other description of the questionnaires included in the main text, we have also retained the table listing personality/ trait questionnaires (Table 2, Line 200).

      - Ten group-level gradients were calculated out of which three were shown on the basis of previous work. Please, visualize all 10 gradients as complementary material to inform potential future works on how these look.  

      Thank you for this suggestion. Supplementary figure 3 now shows all 10 gradients.

      - Please provide more information on preprocessing, especially with motion artifacts and how the global signal was processed.  

      Thank you for pointing this out. We have now included the following text, summarized from Mendes et al., 2019, to describe the preprocessing in brief (Line 171:188): 

      “Motion correction parameters were derived by rigid-body realignment of the timeseries to the first (after discarding the first five volumes) volume with FSL MCFLIRT (Jenkinson et al., 2002). Parameters for distortion correction were calculated by rigidly registering a temporal mean image of this time series to the fieldmap magnitude image using FSL FLIRT (Jenkinson and Smith, 2001) which was then unwarped using FSL FUGUE (Jenkinson et al., 2012). Transformation parameters were derived by coregistering the unwarped temporal mean to the subject’s structural scan using FreeSurfer’s boundary-based registration algorithm (Greve and Fischl, 2009). All three spatial transformations were then combined and applied to each volume of the original time series in a single interpolation step. The time series was residualised against the six motion parameters, their first derivatives, and “outliers” identified by Nipype’s rapidart algorithm (https://nipype.readthedocs.io/en/latest/interfaces/). A CompCor (Behzadi et al., 2007) approach was implemented to remove physiological noise from the residual time series, which included the first six principal components from all the voxels identified as white matter or cerebrospinal fluid. The denoised time series were temporally filtered to a frequency range between 0.01 and 0.1 Hz using FSL, mean centered and variance normalized using Nitime (Rokem et al., 2009). Imaging and pre-processing protocols are described in detail in Mendes et al. (2019).”
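
      The final step of the quoted pipeline (mean centring and variance normalisation) can be sketched in a few lines. This is a plain-Python illustration with an invented time series; it does not call Nitime, and the use of the population standard deviation is an assumption rather than a documented detail of the original processing.

```python
from statistics import fmean, pstdev

# Illustrative sketch only: toy values, not real fMRI data.


def standardise(series):
    """Mean-centre a time series and scale it to unit variance."""
    mean = fmean(series)
    sd = pstdev(series)  # population SD; an assumption for this sketch
    if sd == 0:
        raise ValueError("a constant time series cannot be variance normalised")
    return [(value - mean) / sd for value in series]


toy_series = [102.0, 98.5, 101.2, 99.7, 100.6, 97.9]
z = standardise(toy_series)
print([round(v, 2) for v in z])
```

      After this step every parcel's time series has zero mean and unit variance, so differences in raw signal scale between regions or participants do not drive later connectivity estimates.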

      - Please, describe the duration of the whole process, and when the questionnaire data were collected.

      We apologize for the lack of clarity. The “Data” section of the Methods has now been edited to explain this more clearly; it now reads (Line 146:154):

      “The dataset used here is part of the MPI-Leipzig Mind-Brain-Body (MPILMBB) database (Mendes et al., 2019). The complete dataset consists of a battery of self-reported personality measures, measures of spontaneous thought, task data, and structural and resting-state functional MRI (one hour, divided into four adjacent 15-min sessions) from participants between 20 and 75 years of age. Data were collected over a period of five days, with the MRI sessions always falling on day 3. The questionnaires were completed by participants before and after this day, using Limesurvey (https://www.limesurvey.org: version 2.00+) at their own convenience and using pen-and-paper on-site. A detailed description of the participants, measures, and data acquisition protocol has been previously published along with the dataset (Mendes et al., 2019).”

      - In light of the discussion about sample sizes and the power of the correlations, can you indicate the effect sizes of the reported results?  

      Thank you for pointing this out. Effect sizes have been added to the results table (Table 3, Line 427).

      Minor corrections to the text and figures

      - Introduction: "Our sample was a cohort....states were explanatory variables": Better move this part to Methods. Ideally, provide the hypotheses here, the ways you wanted to test them, and how you would negate them. What would it mean that you got the hypotheses confirmed? What would the opposite outcome mean? 

      We have added the following text before this part to clarify and expand on the objective of the study (Lines 112:125):

      “Our current study explored whether this “tri-partite network” view of ongoing conscious thought derived from studies focused on understanding conscious experience, provides a useful organising framework for understanding the relation between observed brain activity at rest and patterns of cognition/ personality traits. Such analysis is important because at rest there are multiple features of brain activity that can be identified via complex analyses that include regions that show patterns of coactivation (which are traditionally viewed as forming a cohesive network; Biswal et al., 1995) as well as patterns of anti-correlation with other regions (e.g. Fox et al., 2005). However, it is unclear which of these relationships reflect aspects of cognition or behaviour or are in fact aspects of the functional organisation of the cortex (Fox and Raichle, 2007). Consequently, our study builds on foundational work (e.g. Vanhaudenhuyse et al., 2011) in order to better understand which aspects of neural function observed at rest are most likely linked to cognition and behaviour. With this aim in mind, we examined links between macro-scale neural activation and both (i) trait descriptions of individuals and (ii) patterns of ongoing thought.”

      We have refrained from listing hypotheses, as the analyses we performed were data-driven rather than hypothesis-driven, in order to include all possible associations between large-scale connectivity patterns and individual state- and trait-level differences in personality and thought/ experience. We hope that the added text provides more context to understand this rationale.

      - Please, clarify whether "conscious thought" means "reportable".

      Thank you for this suggestion. We have now edited the first reference to thought patterns in the Discussion to read “self-reports of ongoing thought”, instead of just “ongoing thought” (Line 432).

      - Please, clarify whether "experience" and "thought" are used interchangeably. This is because experience can also be ineffable, beyond thought reporting. 

      To clarify this in the context of the current study, we have edited the first reference to “ongoing experience” in the introduction to “self-reports of ongoing experience” (Line 75).

      - To ease reading comprehension for each Figure, communicate the main findings first, before describing the figures. 

      We believe this lack of clarity is caused by including the figure reference in the heading of the results subsections. We hope this issue is fixed by editing the text in the following manner (Line 381):

      “Trait Introversion 

      Along the first gradient, a parcel within the right orbitofrontal cortex (within the executive control network, shown in orange) showed more similarity with transmodal regions for individuals high on introversion. Six parcels within the ventral attention network, including anterior insula, operculum and cingulate cortex were closer to the somatomotor end along gradient two (shown in purple). The same regions showed lower scores along the third gradient in participants with higher introversion scores, indicating stronger integration with the default mode network. A parcel within posterior cingulate cortex (control) was also more segregated from the visual end of gradient two in participants with higher introversion scores. Associations between trait “introversion” and brain-wide activity are shown in Figure 4.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In "Prediction error determines how memories are organized in the brain: a study of Pavlovian fear extinction in rats", Kennedy et al examine how new information is organized in memory. They tested an idea based on latent theory that suggests that a large prediction error leads to the formation of a new memory, whereas a small prediction error leads to memory updating. They directly tested the prediction by extinguishing fear-conditioned rats with gradual extinction. For their experiment, gradual extinction was carried out by progressively reducing the intensity of shocks that were co-terminated with the CS, until the CS was presented alone. Doing so resulted in diminished spontaneous recovery and reinstatement compared to Standard Extinction. The results are compelling, and have important implications for the field of fear learning and memory as well as translation to anxiety-related disorders.

      The authors carried out the Spontaneous Recovery experiment in 2 separate experiments. In one, they found differences between the Gradual and Standard Extinction groups, but in the second, they did not. It seems that their reinstatement test was more robust, and showed significant differences between the Gradual and Standard Extinction groups.

      The authors carried out important controls that enable proper contextualization of the findings. They included a "Home" group, in which rats received fear conditioning, but not extinction manipulation. Relative to this group, the Gradual and Standard extinction groups showed a reduction in freezing.

      In Experiments 3 and 4, the authors essentially carried out clever controls that served to examine whether shock devaluation (Experiment 4) and reduction in shock intensity (rather than a gradual decrease in shock intensity) (Experiment 3) would also yield a decrease in the return of fear. In line with a latent-cause updating explanation for accounting for the Gradual Extinction, they did not.

      In Experiment 5, the authors examined whether a prediction error produced by a change of context might contribute interference to the latent cause updating afforded by the Gradual Extinction. Such a prediction would align with a more flexible interpretation of a latent-cause model, such as those proposed by Redish (2007) and Gershman et al (2017), but not the latent-cause interpretation put forth by the Cochran-Cisler model (2019). Their findings showed that whereas Gradual Extinction carried out in the same context as acquisition resulted in less return of fear than Standard Extinction, it actually yielded a greater degree of return of fear when carried out in a different context, in support of the Redish and Gershman accounts, but not Cochran-Cisler.

      Experiment 6 extended the findings from Experiment 5 in a different state-splitting modality: timing. In this experiment, the authors tested whether a shift in temporal context also influenced the gradual extinction effect. They thus carried out the extinction sessions 21 days after conditioning. They found that while Gradual Extinction was indeed effective when carried out one day after fear conditioning, it was not when conducted 21 days later.

      The authors next carried out an omnibus analysis which included all the data from their 6 experiments, and found that overall, Gradual Extinction resulted in diminished return of fear relative to Standard Extinction. I thought the omnibus analysis was a great idea and an appropriate way to do their data justice.

      Strengths:

      Compelling findings. The data support the conclusions. 6 rigorous experiments were conducted which included clever controls. Data include male and female rats. I really liked the omnibus analysis.

      We thank the reviewer for their positive comments – they are appreciated.

      Weaknesses:

      None noted

      Reviewer #2 (Public Review):

      Summary:

      The present article describes a series of experiments examining how a gradual reduction in unconditional stimulus intensity facilitates fear reduction and reduces relapse (spontaneous recovery and reinstatement) relative to a standard extinction procedure. The experiments provide compelling, if somewhat inconsistent, evidence of this effect and couch the results in a scholarly discussion surrounding how mechanisms of prediction error contribute to this effect.

      Strengths:

      The experiments are theoretically motivated and hypothesis-driven, well-designed, and appropriately conducted and analyzed. The results are clear and appropriately contextualized into the broader relevant literature. Further, the results are compelling and ask fundamental questions regarding how to persistently weaken fear behavior, which has both strong theoretical and real-world implications. I found the 'scrambled' experiment especially important in determining the mechanism through which this reduction in shock intensity persistently weakens fear behavior.

      We thank the reviewer for their positive comments – they are appreciated.

      Weaknesses:

      Overall, I found very few weaknesses in this paper. Some might view the somewhat inconsistent effects on relapse between experiments as a substantial weakness, but I appreciate the authors directly confronting this and using it as an opportunity to aggregate data to look at general trends. Further, while Experiment 1 only used males, this was corrected in the rest of the experiments and therefore is not a substantial concern.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript examined the role of large versus small prediction errors (PEs) in creating a state-based memory distinction between acquisition and extinction. The premise of the paper is based on theoretical claims and empirical findings that gradual changes between acquisition and extinction would lead to the potential overwriting of the acquisition memory with extinction, resulting in a more durable reduction in conditioned responding (i.e. more durable extinction effect). The paper tests the hypotheses in a series of elegant experiments in which the shock intensity is decreased across extinction sessions before non-reinforced CS presentations are given. Additional manipulations include context change, shock devaluation, and controlling for lower shock intensity exposure. The critical comparison was standard non-reinforced extinction training. The critical tests were done in spontaneous recovery and reinstatement.

      Strengths:

      The findings are of tremendous importance in understanding how memories can be updated and reveal a well-defined role of PE in this process. It is well-established that PE is critical for learning, so delineating how PE is critical for generating memory states and the role it serves in keeping memories dissociable (or not) is exciting and clever. As such the paper addresses a fundamental question in the field.

      The studies test clear and defined predictions derived from simulations of the state-belief model of Cochran & Cisler (2019). The designs are excellent: well-controlled and address the question.

      The authors have done an excellent job of explaining the value of the latent state models.

      The authors have studied both sexes in the study presented, providing generality across the sexes in their findings. However, depicting the individual data points in the bar graphs and noting which data represent males and which represent females would be of great value.

      We thank the reviewer for their positive comments. We have included individual data points in the bar graphs and indicated which represent males and females.

      Weaknesses:

      (1) While it seems obvious that delivering a lower intensity shock will generate a smaller PE than say no shock, it would have been nice to see data from say a compound testing procedure that confirms this.

      It would be great if we could provide independent evidence that shifting from a 0.8 mA shock to a 0.4 mA shock (first session of gradual extinction) produces a smaller prediction error than shifting from a 0.8 mA shock to no shock at all (first session of standard extinction). In theory, this could be assessed using Rescorla’s (2000) compound test procedure. However, application of this procedure requires the use of a within-subject design and latent state theories would not predict the gradual extinction effect in such a design (as all prediction errors generated in such a design would affect the state-splitting process). That is, the between-subject design used to generate the gradual extinction effect is not amenable to application of the compound test procedure; and the within-subject design in which the compound test procedure could be applied is unlikely to generate the gradual extinction effect. Thus, we instead rely on the high degree of similarity between our results and those predicted by Cochran & Cisler (2019) to argue that the gradual extinction protocol produces a series of smaller prediction errors than does the standard extinction protocol: hence the present pattern of results.
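
      The argument in the response above, that a stepwise reduction in shock intensity yields a series of smaller prediction errors than an abrupt shift to no shock, can be illustrated with a toy calculation. This is a deliberately simplified sketch and not the Cochran & Cisler (2019) model: it assumes that the expected shock simply equals the intensity delivered in the preceding phase, and the 0.2 mA step and the number of sessions are hypothetical placeholders (only 0.8, 0.4 and 0.1 mA are mentioned in the responses).

```python
# Illustrative sketch only: a simplified expectation rule, not the latent-state model.

CONDITIONING_MA = 0.8  # shock intensity during conditioning, from the response

# Shock intensity (mA) in each extinction session. The intermediate 0.2 mA step
# and the session count are assumptions made for illustration.
gradual = [0.4, 0.2, 0.1, 0.0]
standard = [0.0, 0.0, 0.0, 0.0]


def session_prediction_errors(expected, schedule):
    """Prediction error at the start of each session: delivered minus expected."""
    errors = []
    for delivered in schedule:
        errors.append(round(delivered - expected, 3))
        expected = delivered  # assume the expectation is fully updated each session
    return errors


gradual_pe = session_prediction_errors(CONDITIONING_MA, gradual)
standard_pe = session_prediction_errors(CONDITIONING_MA, standard)

print("gradual: ", gradual_pe, "largest:", max(abs(e) for e in gradual_pe))
print("standard:", standard_pe, "largest:", max(abs(e) for e in standard_pe))
```

      Under these assumptions the largest single error in the gradual schedule is 0.4 mA, against 0.8 mA for standard extinction, which is the contrast that latent state accounts treat as deciding between updating an existing memory and splitting off a new one.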

      (2) The devaluation experiment is quite clever, but it also would be strengthened if there was evidence in the paper that this procedure does indeed lead to shock devaluation.

      The aim of Experiment 3 was to determine whether the gradual extinction effect is due to prediction error-based memory updating or shock devaluation. If the effect was due to shock devaluation, the group that received the gradual extinction treatment should have displayed the same low level of spontaneous recovery as the group that only experienced the shock at its lowest (0.1 mA) intensity (i.e., the shock devaluation group). Contrary to this prediction, the results showed that the gradually extinguished group displayed less spontaneous recovery than the shock devaluation group. That is, in this experiment, the slow and progressive reduction in shock intensity was processed differently to the repeated 0.1 mA shock exposures but the results were inconsistent with any shock devaluation effect. Hence, we conclude that the gradual extinction effect does not involve shock devaluation but instead is due to prediction error-based memory updating.

      (3) It would have been very exciting to see even more parametric examinations of this idea, like maintaining shock intensity but gradually reducing shock duration, which would have increased the impact of the paper.

      We appreciate the reviewer’s point. As each shock was presented for just 0.5 s, we are not confident that rats would detect gradual and progressive changes in its duration in the same way as they can obviously detect gradual and progressive changes in its intensity. We are, however, investigating the effects of gradual extinction in a second-order conditioning protocol, which will allow us to examine the full range of parameters that are important for its regulation, including manipulations of stimulus duration. In our second-order conditioning protocol, rats are first exposed to pairings of a 10 s S1 and a 0.5 s foot shock US; and then exposed to pairings of a 30 s S2 and the 10 s S1. Across the latter pairings, rats acquire second-order conditioned fear responses to S2. Importantly, these responses can be extinguished through repeated presentations of the S2 in the absence of its S1-associate; and the duration of the S1 can be progressively and gradually reduced from 10 s to 0 s across the shift to this extinction. These experiments are currently in progress and will eventually represent an extension of the present findings.

      (4) Individual data points should be represented in the test figures (see above also).

      We have updated the figures to show these data points.

      Rescorla, R. A. (2000). Associative changes in excitors and inhibitors differ when they are conditioned in compound. Journal of Experimental Psychology: Animal Behavior Processes, 26(4), 428.

      Reviewing Editor (Recommendations For The Authors):

      The eLife assessment relates to the present form of the paper. However, following a discussion with the reviewers, the significance of the findings could be bolstered to fundamental if you decided to revise the current manuscript by scaling up the investigation to examine a wider set of parameters and conditions under which error can influence state allocation of memories. One way of doing this, but not limited to this, is suggested in the reviews (e.g. maintaining shock intensity, reducing its duration). Relatedly, a more extensive discussion of the Gershman et al. (2013) paper would be relevant.

      As noted in our response to Reviewer 3, we are currently investigating the effects of gradual extinction in a second-order conditioning protocol, which will allow us to examine the full range of parameters that are important for its regulation, including manipulations of stimulus duration. These experiments are in progress and will eventually represent an extension of the present findings. They are not, however, ready to be included as part of the present study.

      We have further referenced the Gershman et al., (2013) paper as well as the related Bouton et al., (2004) paper on the effects of gradually reducing the frequency of the US across extinction. This appears in the fifth paragraph of the Discussion: “The present study adds to a growing body of evidence that manipulations applied across the shift from CS-US pairings to presentations of the CS alone can influence the effectiveness of extinction. For example, Gershman et al., (2013) and Bouton et al., (2004) showed that gradually reducing the proportion of reinforced CS presentations results in less spontaneous recovery and slower reacquisition, respectively; though both studies left open fundamental questions about the basis of their findings (see also Woods & Bouton, 2007).”

      Reviewer #1 (Recommendations For The Authors):

      I don't have any strong recommendations. I think the paper is really great as is.

      One minor suggestion to consider:

      The authors carried out the Spontaneous Recovery experiment in 2 separate experiments. In one, they found differences between the Gradual and Standard Extinction groups, but in the second, they did not. This is perhaps not entirely surprising, since their extinction test was conducted 2 weeks post-extinction, and not all rats show spontaneous recovery within that timeframe. The authors mention that the lack of SR might be due to the low level of freezing reported in their test, but since they are showing group mean data, they might consider showing the individual data points to showcase the range of SR freezing as an additional way to make sense of the variability (ie, maybe a few rats that showed very low freezing carried the mean down in the Standard Extinction group, while others showed return of fear).

      We agree and have included individual data points for test results in Figures 2D, 2F, 3D, 3H, 4D and 4H. Hence, these figures now reflect both group and individual freezing levels.

      Reviewer #2 (Recommendations For The Authors):

      Overall, I thought this was an exceptional paper. Aside from the comments listed above which I'm not sure are inherently addressable, the only real changes I would like to see are that individual data points should be depicted in the main testing figures, as is becoming more conventional in the field.

      We thank the reviewer for their positive comments. As indicated in our response to the other reviewers, we have added individual data points to the histograms showing test results.

      Reviewer #3 (Recommendations For The Authors):

      Figures

      (1) The test data are presented as bars, but I did wonder if there were differences between the groups from the start of testing or if those emerged across testing (SR vs extinction savings).

      We have added two new figures to the supplementary section, Figures 8 and 9. These display the trial-by-trial data from spontaneous recovery and reinstatement tests in each experiment. The data clearly show that the between-group differences in freezing were very stable across the test sessions.

      (2) While I understand the importance of presenting the last extinction session, I felt depicting the entire CS session would be more informative. Alternatively, removing this altogether and leaving the information from the extinction session in the supplemental would focus the reader on the key test data.

      We appreciate the reviewer’s point. It is important to show that the groups displayed equivalent freezing in the final extinction session prior to testing. Given that the test data are conveniently and best presented in a histogram, we have chosen to present the data from the final extinction session in the same way. The full, trial-by-trial trajectory of freezing across conditioning and extinction, as well as the analyses of these data, are presented in Supplementary A.

      (3) I did not find the figures to be very aesthetically pleasing (in part because some panels were unnecessarily large). For example, I found it rather odd that the simulation panels were split in Figure 1. One suggestion of how this figure could look better is to keep the size of panels B, C, and D the same and align them on the same row with the design figure above them. The other option is to have the design figure above the test figure and the two simulation figures above each other and next to the design and test. Also, there are grey lines that appear around the simulation figures on my PDF.

      We have updated the figures so that they are consistent across experiments and more aesthetically pleasing. Specifically, we have consistently: 1) inserted the simulations of Cochran & Cisler (2019) next to the design schematic; 2) inserted the extinction and test data beneath the design schematic; and 3) made the sizing of figures more uniform across Experiments 1-6.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This study presents valuable findings as it shows that sleep rhythm formation and memory capabilities depend on a balanced and rich diet in fly larvae. The evidence supporting the claims of the authors is convincing with rigorous behavioral assays and state-of-the-art genetic manipulations. The work will be of interest to researchers working on sleep and memory. 

      Public Reviews: 

      Summary: 

      This manuscript investigates how energetic demands affect the sleep-wake cycle in Drosophila larvae. L2 stage larvae do not show sleep rhythm and long-term memory (LTM); however, L3 larvae do. The authors manipulate food content to provide insufficient nutrition, which leads to more feeding, no LTM, and no sleep even in older larvae. Similarly, activation of NPF neurons suppresses sleep rhythm. Furthermore, they try to induce a sleep-like state using pharmacology or genetic manipulations in L2 larvae, which can mimic some of the L3 behaviours. A key experimental finding is that activation of DN1a neurons activates the downstream DH44 neurons, as assayed by GCaMP calcium imaging. This occurs only in third instar and not in second instar, in keeping with the development of sleep-wake and feeding separation. The authors also show that glucose metabolic genes are required in Dh44 neurons to develop sleep rhythm and that DH44 neurons respond differently in malnutrition or younger larvae. 

      Strengths: 

      Previous studies from the same lab have shown that sleep is required for LTM formation in the larvae, and that this requires DN1a and DH44 neurons. The current work builds upon this observation and addresses in more detail when and how this might develop. The authors show that low quality food exposure and enhanced feeding during the larval stage of Drosophila affect the formation of sleep rhythm and long-term memory. This suggests that the development of sleep and LTM is only possible under well fed and balanced nutrition in fly larvae. Non-sleep larvae were fed in low sugar conditions and indeed, the authors also find glucose metabolic genes to be required for a proper sleep rhythm. The paper presents precise genetic manipulations of individual classes of neurons in fly larvae followed by careful behavioural analysis. The authors also combine thermogenetic or peptide bath application experiments with direct calcium imaging of specific neurons. 

      Weaknesses: 

      The authors tried to induce sleep in younger L2 larvae, however the behavioral results suggest that they were not able to induce proper sleep behaviour as in normal L3 larvae. Thus, they cannot show that sleep during L2 stage would be sufficient to form LTM. 

      We agree that the experiments with Gaboxadol feeding in L2 did not perfectly mimic L3 sleep behaviors. However, genetic induction of sleep in L2 was effective in increasing sleep duration and depth similar to that observed in normal L3. As noted below in response to specific reviewer comments, because gaboxadol feeding is standard in the field for adult sleep induction, we prefer to still include this data in the manuscript for transparency. Moreover, the gaboxadol manipulation did cause a significant decrease in arousal threshold compared to control larvae. Together these approaches support the hypothesis that sleeping more/more deeply is not sufficient to promote LTM in L2.

      The authors suggest that larval Dh44 neurons may integrate "information about the nutritional environment through the direct sensing of glucose levels to modulate sleep-wake rhythm development". They identify glucose metabolism genes (e.g., Glut1) in the downstream DH44 neurons as being required for the organization of the sleep-wake-feeding rhythm, and that CCHa signaling in DN1a signaling to the DH44 cells via the receptor. However, how this is connected is not well explained. Do the authors think that the nutrient sensing is only occurring in the DH44 neurons and not in DN1a or other neurons? Would not knocking down glucose metabolism in any neuron lead to a functional defect? What is the evidence that Dh44 neurons are specific sensors of nutritional state? For example, do the authors think that e.g. the overexpression of Glut1 in Dh44 neurons, a manipulation that can increase transport of glucose into cells, would rescue the effects of low-sugar food? 

      We thank the reviewer for these suggestions and have added the experiment proposed. We found that knockdown of Hex-C in DN1a neurons did not disrupt sleep-wake rhythms (Fig. S4G-I), suggesting that Dh44 neurons are specialized in requiring glucose metabolism to drive sleep-wake rhythms. We have also added further clarification in the text regarding the existing evidence that Dh44 neurons act as nutrient sensors.

      Some of the genetic controls seem to be inconsistent suggesting some genetic background effects. In Figure 2B, npf-gal4 flies without the UAS show no significant circadian change in sleep duration, whereas UAS-TrpA flies do. The genetic control data in Figure 2D are also inconsistent. Npf-Gal4 seems to have some effect by itself without the UAS. The same is not seen with R76G11-Gal4. Suppl Fig 2: Naïve OCT and AM preference in L3 expressing various combinations of the transgenes show significant differences. npf-Gal4 alone seems to influence preference. 

      The sleep duration and bout number/length data are highly variable. 

      All experiments are performed in an isogenized background, so the variability seen in genetic controls likely reflects the stochastic nature of behavioral experiments. Indeed, adult sleep data also show a great deal of variability within the same genetic background (PMID: 29228366). We agree it is an important point, and we attempt to minimize variability as much as possible with backcrossing of flies and tight control of environmental conditions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Low sugar exposure and activation of NPF neurons might not induce the same behavioral changes. LS exposure does not enhance mouth hook movements, but overall food intake. NPF activation seems to enhance mouth hook movements, but the data for food intake is not shown. This information would be necessary to compare the two different manipulations. 

      We thank the reviewer for this suggestion. However, we elected not to perform food intake experiments with the NPF activation experiments. Since we are not directly comparing the low sugar and NPF manipulations to each other, we think that both experiments together support the conclusion that immature food acquisition strategies (whether food intake or feeding rate) limit LTM performance. 

      The authors write that the larval feeding assays run for 4 hours, can they explain why that long? Larvae should already have processed food within 4 hours, so that the measurement would not include all eaten food.

      We clarified the rationale for doing 4 hour feeding assays in the results section. We did 4 hours on blue dyed food because initial experiments of 1 hour with control L3 at CT1-4 were difficult to interpret. The measurement does not include all of the eaten food in the 4 hours but does reflect more long-term changes in food intake.

      Sleep induction with Gaboxadol seems to not really work - sleep duration, bout number and length are not enhanced, and arousal threshold is only slightly lower. Thus, the authors should not use this data as an example for inducing sleep behaviour. 

      We agree this approach did not have a large effect in larvae. However, because gaboxadol feeding is standard in the field for adult sleep induction, we prefer to still include this data in the manuscript for transparency. Moreover, the Gaboxadol manipulation did cause a mild (but significant) decrease in arousal threshold compared to control larvae. Gaboxadol feeding also caused a significant decrease in total body weight compared to control larvae indicating that even slightly deeper sleep could be detrimental to younger animals.

      Activation of R76G11 with TrpA1 seems to work better for inducing sleep like behaviour. However, the authors describe that they permanently activated neurons. To induce a "normal" sleep pattern, the authors might try to only activate these neurons during the normal enhanced sleep time in L3 (CT13?) and not during the whole day. This might also allow larvae to eat during day time and gain more weight. 

      We apologize that this point was not clearer, but we did do acute activation of R76G11(+) neurons, as proposed by the reviewer. We have clarified the text to make this point.

      It would be interesting to see how larvae fed with high sucrose and low protein diet would behave in this assay. Do the authors suggest that sugar is most important for the development of sleep behaviour or that it is a combination of sugar and protein that might be required? 

      We agree that feeding larvae a high sucrose and low protein diet would be interesting. However, we initially tried a low protein diet and observed significant developmental delays. Therefore, we are concerned that developmental defects on a high sucrose and low protein diet would confound behavioral results. Additionally, the Dh44 manipulations (glucose & GCN2 signaling) suggest that sugar is the most important for the development of sleep behaviors.

      Reviewer #3 (Recommendations For The Authors): 

      The authors could discuss if the interaction between DN1a clock neurons and Dh44 neurons is mediated synaptically or by volume transmission following the extracellular release of the CCHa1 neuropeptide. They write that "the development of Dh44 neuronal competency to receive clock-driven cues" and that "DN1a clock neurons anatomically and functionally connect to Dh44" but a discussion about volume vs. synaptic signalling would be of interest. 

      We thank the reviewer for this suggestion. We revised the discussion to address this point.

      line 223 " demonstrating that post-synaptic processes likely". It would be interesting to read a discussion on whether it is known if these are postsynaptic or peptide-mediated volume effects? 

      We added additional text to the discussion to address these points.

      - The authors may want to include a schematic of the circuit and its position in the general anatomy of the fly larva. 

      We thank the reviewer for this suggestion. We have added a model figure to Fig. S6.

      "Dh44 neurons act through glucose metabolic genes" - consider rewording e.g. require glucose metabolic genes 

      We revised the text.

      - line 45 "Early in development, young animals must obtain enough nutrients to ensure proper growth" - this is too general, many animals do not feed in early life-cycle stages (e.g. lecithotrophic development), consider rewording 

      We revised the text to be more specific.

      - line 90 "however, L3 at CT1 consume more than L3 at CT12 (Figure S1A)" - typo CT13, also consider rewording to match the structure of the sentence before 'however, L3 consumed more at CT1 than at CT13' 

      We revised the text to fix this error.

      - Line 111 "and loss of deep sleep" - how is deep sleep defined and measured in the larvae? It is not clear from the data or the text. 

      We revised the text to define deep sleep in the results section. We also have a description of how arousal threshold is calculated in the methods.

      - In Figure 3B and G the individual data points are not shown 

      We did not show individual data points for those graphs because we are plotting the average percentage of 4 biological replicates.

      Typo: 

      Figure 1 legend "F, n= n=100-172 " 

      We revised the text to fix this typo.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The manuscript by Hussain and collaborators aims at deciphering the microtubule-dependent ribbon formation in zebrafish hair cells. By using confocal imaging, pharmacology tools, and zebrafish mutants, the group of Katie Kindt convincingly demonstrated that ribbon, the organelle that concentrates glutamate-filled vesicles at the hair cell synapse, originates from the fusion of precursors that move along the microtubule network. This study goes hand in hand with a complementary paper (Voorn et al.) showing similar results in mouse hair cells. 

      Strengths: 

      This study clearly tracked the dynamics of the microtubules, and those of the microtubule-associated ribbons and demonstrated fusion ribbon events. In addition, the authors have identified the critical role of kinesin Kif1aa in the fusion events. The results are compelling and the images and movies are magnificent. 

      Weaknesses: 

      The lack of functional data regarding the role of Kif1aa. Although it is difficult to probe and interpret the behavior of zebrafish after nocodazole treatment, I wonder whether deletion of kif1aa in hair cells may result in a functional deficit that could be easily tested in zebrafish? 

      We have examined functional deficits in kif1aa mutants in another paper (David et al., 2024; in submission, preprint available):

      https://www.biorxiv.org/content/10.1101/2024.05.20.595037v1

      In addition to playing a role in ribbon fusions, Kif1aa is also responsible for enriching glutamate-filled secretory vesicles at the presynaptic active zone. In kif1aa mutants (and crispants), vesicles are no longer localized to the hair cell base, and there is a reduction in the number of vesicles associated with presynaptic ribbons. Kif1aa mutants also have functional defects including reductions in spontaneous vesicle release and evoked postsynaptic calcium responses. Behaviorally, kif1aa mutants exhibit impaired rheotaxis, indicating defects in the lateral-line system and an inability to accurately detect water flow. Since our paper focuses on microtubule-associated ribbon movement and dynamics early in hair cell development, we have only discussed the effects of Kif1aa directly related to ribbon dynamics during this time window in this paper. In our revision, we will reference this recently submitted work.

      Impact: 

      Synaptogenesis in the auditory sensory cell remains elusive. Here, this study indicates that the formation of the synaptic organelle is a dynamic process involving the fusion of presynaptic elements. This study will undoubtedly boost a new line of research aimed at identifying the specific molecular determinants that target ribbon precursors to the synapse and govern the fusion process. 

      Reviewer #2 (Public Review): 

      Summary:

      In this manuscript, the authors set out to resolve a long-standing mystery in the field of sensory biology - how large, presynaptic bodies called "ribbon synapses" migrate to the basolateral end of hair cells. The ribbon synapse is found in sensory hair cells and photoreceptors, and is a critical structural feature of a readily-releasable pool of glutamate that excites postsynaptic afferent neurons. For decades, we have known these structures exist, but the mechanisms that control how ribbon synapses coalesce at the bottom of hair cells are not well understood. The authors addressed this question by leveraging the highly-tractable zebrafish lateral line neuromast, which exhibits a small number of visible hair cells, easily observed in time-lapse imaging. The approach combined genetics, pharmacological manipulations, high-resolution imaging, and careful quantifications. The manuscript commences with a developmental time course of ribbon synapse development, characterizing both immature and mature ribbon bodies (defined by position in the hair cell, apical vs. basal). Next, the authors show convincing (and frankly mesmerizing) imaging data of plus end-directed microtubule trafficking toward the basal end of the hair cells, and data highlighting the directed motion of ribbon bodies. The authors then use a series of pharmacological and genetic manipulations showing the role of microtubule stability and one particular kinesin (Kif1aa) in the transport and fusion of ribbon bodies, which is presumably a prerequisite for hair cell synaptic transmission. The data suggest that microtubules and their stability are necessary for normal numbers of mature ribbons and that Kif1aa is likely required for fusion events associated with ribbon maturation. Overall, the data provide a new and interesting story on ribbon synapse dynamics. 

      Strengths: 

      (1) The manuscript offers a comprehensive Introduction and Discussion sections that will inform generalists and specialists. 

      (2) The use of Airyscan imaging in living samples to view and measure microtubule and ribbon dynamics in vivo represents a strength. With rigorous quantification and thoughtful analyses, the authors generate datasets often only obtained in cultured cells or more diminutive animal models (e.g., C. elegans). 

      (3) The number of biological replicates and the statistical analyses are strong. The combination of pharmacology and genetic manipulations also represents strong rigor. 

      (4) One of the most important strengths is that the manuscript and data spur on other questions - namely, do (or how do) ribbon bodies attach to Kinesin proteins? Also, and as noted in the Discussion, do hair cell activity and subsequent intracellular calcium rises facilitate ribbon transport/fusion? 

      These are important strengths and we do plan to investigate adaptors and how hair cell activity impacts ribbon fusion and transport in the future!

      Weaknesses: 

      (1) Neither the data or the Discussion address a direct or indirect link between Kinesins and ribbon bodies. Showing Kif1aa protein in proximity to the ribbon bodies would add strength.

      This is a great point, and we are working to create a transgenic line with fluorescently labelled Kif1aa to directly visualize its association with ribbons. At present, we have not obtained a transgenic line, and localization of Kif1aa and ribbons in live hair cells is beyond the scope of this paper. In our revision we will discuss this caveat.

      (2) Neither the data or Discussion address the functional consequences of loss of Kif1aa or ribbon transport. Presumably, both manipulations would reduce afferent excitation.

      Excellent point. Please see the response above to Reviewer #1 weaknesses.  

      (3) It is unknown whether the drug treatments or genetic manipulations are specific to hair cells, so we can't know for certain whether any phenotypic defects are secondary. 

      This is correct and is a caveat of our Kif1aa and drug experiments. However, to mitigate this in the pharmacological experiments, we have done the drug treatments at 3 different timescales: long-term (overnight), short-term (4 hr) and fast (30 min) treatments. The fast experiment, done after 30 min of drug treatment, is where we observe reduced directional motion and fusions. This latter experiment should not be affected by any long-term changes or developmental defects that could be caused by the drugs, as hair cell development occurs over 8-12 hrs. However, we acknowledge that these treatments and genetic experiments could have secondary phenotypic defects that are not hair-cell specific. In our revision, we will discuss these issues.

      Reviewer #3 (Public Review): 

      Summary: 

      The manuscript uses live imaging to study the role of microtubules in the movement of ribeye aggregates in neuromast hair cells in zebrafish. The main findings are that 

      (1) Ribeye aggregates, assumed to be ribbon precursors, move in a directed motion toward the active zone; 

      (2) Disruption of microtubules and kif1aa increases the number of ribeye aggregates and decreases the number of mature synapses. 

      The evidence for point 2 is compelling, while the evidence for point 1 is less convincing. In particular, the directed motion conclusion is dependent upon fitting of mean squared displacement that can be prone to error and variance due to stochasticity, which is not accounted for in the analysis. Only a small subset of the aggregates meet this criterion and one wonders whether the focus on this subset misses the bigger picture of what is happening with the majority of spots. 

      Strengths: 

      (1) The effects of Kif1aa removal and nocodazole on ribbon precursor number and size are convincing and novel. 

      (2) The live imaging of Ribeye aggregate dynamics provides interesting insight into ribbon formation. The movies showing the fusion of ribeye spots are convincing and the demonstrated effects of nocodazole and kif1aa removal on the frequency of these events are novel. 

      (3) The effect of nocodazole and kif1aa removal on precursor fusion is novel and interesting. 

      (4) The quality of the data is extremely high and the results are interesting. 

      Weaknesses: 

      (1) To image ribeye aggregates, the investigators overexpressed Ribeye-a TAGRFP under the control of a MyoVI promoter. While it is understandable why they chose to do the experiments this way, expression is not under the same transcriptional regulation as the native protein, and some caution is warranted in drawing some conclusions. For example, the reduction in the number of puncta with maturity may partially reflect the regulation of the MyoVI promoter with hair cell maturity. Similarly, it is unknown whether overexpression has the potential to saturate binding sites (for example motors), which could influence mobility. 

      We agree that overexpression in transgenic lines is a common issue and would have loved to do these experiments with endogenously expressed fluorescent proteins under a native promoter. However, this was not technically possible for us. We originally characterized several transgenic Ribeye lines in the past to ensure they have normal ribbon numbers and size (myo6b:ribb-mcherry, myo6b:riba-tagRFP and myo6b:riba-GFP) - in 2014. Unfortunately, we no longer have the raw data from this analysis. In our revision, we will repeat our immunolabel on myo6b:riba-tagRFP transgenic fish and examine ribbon numbers and size and show what impact (or not) exogenous Ribeye expression has on ribbon formation.

      (2) The examples of punctae colocalizing with microtubules look clear (Figures 1 F-G), but the presentation is anecdotal. It would be better and more informative, if quantified. 

      We attempted a co-localization study between microtubules and ribbons but decided not to move forward with it due to several issues:

      (1) Hair cells have an extremely crowded environment, especially since the nucleus occupies the majority of the cell. All proteins are pushed together in the small space surrounding the nucleus and hence co-localization is not meaningful because the distances are so small.

      (2) We also attempted to segment microtubules in these images and quantify how many ribbons were associated with microtubules, but 3D microtubule segmentation was not accurate in these hair cells due to highly varying filament intensities, and diffuse cytoplasmic tubulin signal.

      Therefore, we decided that a better measure of ribbon-microtubule association would be a demonstration that individual ribbons keep their association with microtubules over time (in our time lapses), rather than a co-localization study. We see that ribbons localize to microtubules in all our timelapses, including the examples shown. We observed that if a ribbon dissociates, it is just to switch from one filament to another. We have not observed free-floating ribbons in our study.

      (3) It appears that any directed transport may be rare. Simply having an alpha >1 is not sufficient to declare movement to be directed (motor-driven transport typically has an alpha approaching 2). The randomness of a random walk and errors in fits to imperfect data will yield some spread in movement driven by Brownian motion. Many of the tracks in Figure 3H look as though they might be reasonably fit by a straight line (i.e. alpha = 1). 

      As we have stated in the paper, we only see a small subset of the ribbon precursors moving directionally. The majority of the ribbons are stationary. We cannot say for sure what is happening with the stationary ribbons, but our hypothesis is that these ribbons eventually exhibit directed motion. This idea is supported by the fact that we have seen ribbons that are stationary begin movement, and ribbons that are moving come to a stop during the acquisition of our timelapses. The ribbons that are stationary may not have enough motors attached, or they may be in a sort of ‘seeding’ phase where the ribeye protein could be condensing on the ribbon. We have discussed the possibility of ribbons being biomolecular condensates in our Discussion.

      In our revision we will discuss why ribbon transport does not resemble typical motor-driven transport (also see response to point 4 below). We will also reexamine our MSD data in more detail as suggested by Reviewer 3 and provide distributions of alpha values in our revision.
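
      The point about alpha values can be illustrated with a minimal sketch. It is not the analysis pipeline used in the paper; the track length, number of tracks, lag range, and function names below are all assumptions chosen for illustration. The sketch simulates purely Brownian 2D tracks, computes a time-averaged mean squared displacement (MSD) for each, and fits alpha as the slope of log(MSD) against log(lag). Even with no directed component, short tracks give a spread of fitted alpha values around 1, and some exceed 1 by chance.

```python
import numpy as np


def mean_squared_displacement(track, max_lag):
    """Time-averaged MSD of one track with shape (n_frames, n_dims)."""
    track = np.asarray(track, dtype=float)
    lags = np.arange(1, max_lag + 1)
    msd = np.array([
        np.mean(np.sum((track[lag:] - track[:-lag]) ** 2, axis=1))
        for lag in lags
    ])
    return lags, msd


def fit_alpha(lags, msd):
    """Fit MSD ~ lag**alpha by linear regression in log-log space."""
    slope, _intercept = np.polyfit(np.log(lags), np.log(msd), 1)
    return slope


# Hypothetical parameters, chosen only for illustration.
rng = np.random.default_rng(0)
n_tracks, n_frames, max_lag = 200, 50, 10

# Pure random walks: cumulative sums of independent Gaussian steps.
alphas = np.array([
    fit_alpha(*mean_squared_displacement(
        np.cumsum(rng.normal(0.0, 1.0, size=(n_frames, 2)), axis=0),
        max_lag))
    for _ in range(n_tracks)
])

# Reference track with constant velocity plus small localization noise.
frames = np.arange(n_frames)
directed = np.outer(frames, [1.0, 0.5]) + rng.normal(0.0, 0.1, size=(n_frames, 2))
alpha_directed = fit_alpha(*mean_squared_displacement(directed, max_lag))

fraction_above_one = float(np.mean(alphas > 1.0))
print("mean alpha (Brownian):", alphas.mean())
print("std alpha (Brownian):", alphas.std())
print("fraction of Brownian tracks with alpha > 1:", fraction_above_one)
print("alpha (constant velocity):", alpha_directed)
```

      Reporting the full distribution of fitted alpha values, alongside a simulated Brownian baseline of matched track length, is one way to judge whether an observed alpha > 1 subset exceeds what fitting noise alone would produce.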

      (4) The "directed motion" shown here does not really resemble motor-driven transport observed in other systems (axonal transport, for example) even in the subset that has been picked out as examples here. While the role of microtubules and kif1aa in synapse maturation is strong, it seems likely that this role may be something non-canonical (which would be interesting). 

      One major difference between axonal and ribbon transport is that microtubules are very stable and linear in axonal transport. Therefore, the directed motion observed is ‘canonical’. In hair cells, the microtubules are extremely dynamic, especially towards the hair cell base. Within a single time frame (60-100 s), we see the network changing (moving and branching). This dynamic network adds another layer of complexity onto the motion of the ribbon, as the filament track itself is changing. Therefore, we see a lot of stalling, filament switching, and reversals of ribbon movement in our movies. However, we have demonstrated in our movies as well as using MSD analysis, that a subset of ribbons exhibit directional motion. In our revision we will discuss why directed motion in hair cells does not resemble canonical motor-driven transport in axons.

      (5) The effect of acute treatment with nocodazole on microtubules in movie 7 and Figure 6 is not obvious to me and it is clear that whatever effect it has on microtubules is incomplete. 

      When using Nocodazole, it is important to optimize the concentration of the drug such that there is minimal cytotoxicity, while still being effective. Microtubules in the apical region of hair cells are very stable and do not respond well to Nocodazole treatment at concentrations that are tolerable to hair cells. While a few stable filaments remain largely at the cell apex, there are almost no filaments at the hair cell base, which is different from the wild-type hair cells. In addition, Nocodazole-treated hair cells have more cytoplasmic YFP-tubulin signal compared to wild type. We will add additional images and quantification in our revision to illustrate these points.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The model presented by the authors is consistent with the data described. Further testing of this model, for example by mutating the deep cholesterol binding site, would strengthen the model. However, such experiments might be challenging due to the relatively non-specific/hydrophobic nature of the deep cholesterol binding site.

      We completely agree that testing of the deep cholesterol-binding site by mutagenesis would be ideal. However, as the reviewer points out, such experiments would be challenging, not only because of the non-specific/hydrophobic nature of the deep cholesterol-binding site but also because we have been purifying AQP0 from natural sources (sheep eyes) and because it would be very difficult to secure the substantial amount of cryo-EM time needed to generate an electron crystallographic structure.

      Reviewer #2 (Public Review):

      The authors report that the findings generally apply to raft formation in membranes. However, this point is less clear as the lens membrane in which AQP0 resides is rather unique in lipid and protein content and density.

      We agree that the lens membrane is quite unique in its lipid and protein content and density, but rafts are also characterized by the same lipids and high protein density. Nonetheless, we do agree that our suggested implications for lipid rafts are speculative and so we emphasize this more in the revised version of the manuscript by writing: “This model is specific for the formation of AQP0 arrays in lens membranes, but we speculate that similar principles may underlie the organization of lipid rafts”.

      Reviewer #3 (Public Review):

      The authors showed that these adjacent tetramers can withstand a larger lateral detachment force when deep cholesterol molecules are present at the interface compared to scenarios with sphingomyelin (SM) molecules at the interface between two AQP0 tetramers. Authors interpret that result as evidence that deep cholesterol molecules mechanically stabilize the interface of the AQP0 tetramers. This conclusion has minor weaknesses, and the rigor of the lateral detachment simulations could be increased by establishing a reference point for the detachment force needed to separate AQP0 tetramers in a scenario without lipids at the interface between tetramers, and by increasing the number of repeats for the non-equilibrium steered MD simulations. Thermodynamic integration might be a better approach to compute the stabilization energy in the presence of cholesterol compared to the SM case.

      In all electron crystallographic structures of AQP0 determined to date, lipids have always been observed sandwiched in between the AQP0 tetramers (see, for example, Gonen et al., Nature, 2005 and Hite et al., EMBO J., 2010). Therefore, considering a scenario without lipids at the interface would be unnatural and the AQP0 array would likely not be stable. Such a scenario would thus not be the most appropriate reference point for the lateral detachment simulations. In our view, comparison of a scenario with the deep cholesterol at the interface versus a scenario without it appeared a more realistic setup to investigate the stabilizing role the deep cholesterol has on the association of AQP0 tetramers. In the Results subsection regarding these simulations, we added the following sentence to further stress the rationale of our experimental setup: “Comparison of these two cases should allow us to assess the effect of the deep-binding Chol3 molecules on the mechanical stability of the associated AQP0 tetramers.”

      Concerning the second suggestion of the reviewer of increasing the number of repeats, we doubled the number of simulation replicas: now it is n=20 for each pulling velocity and lipid interface. The trend of higher detachment forces for the interface containing cholesterol prevailed in a statistically significant, robust fashion (see Figure 7 of the revised manuscript and the main text referring to it). In consequence, as the reviewer suggested, extension of the dataset increased the rigor of the lateral detachment simulations. In addition to Figure 7 and the Results section, the Methods section and Table 4 have been updated to reflect the expanded dataset. 

      Finally, concerning the use of thermodynamic integration to compute the stabilization energy, we agree with the reviewer that calculating the free energy would be a better way to determine the thermodynamic stabilization imparted by the cholesterol molecules. At an earlier stage of the project, we did consider carrying out this type of simulation, but we decided against it because of the complexity and poor convergence of such calculations. Our choice is also based on a previous attempt in which it proved very challenging to use free energy calculations to assess the binding of lipids to a flippase (see Wang et al. BioRxiv, https://doi.org/10.1101/2020.06.24.169771, 2021). We have now included this consideration in the revised manuscript by adding the following sentence to the Discussion: “Although we provide solid evidence here that deep cholesterol impart mechanical stabilization, free energy calculations would be required to obtain the full picture of thermodynamic stabilization. Such free energy calculations are challenging for lipids, due to the chemical complexity and poor convergence involved (Wang et al., 2021), and are thus beyond the scope of the current work.”

      Reviewer #1 (Recommendations For The Authors):

      Reorganizing a few concepts would make the story easier to follow. For example, the analysis of the bilayer thickness seems disjointed. Although Figure 4 shows measurements, it is not clear that the measurements represent bilayer thickness until the last paragraph of page 21 in the discussion, where "Hydrophobic thickness" is first introduced. Moving that first paragraph of page 22 that refers to Fig. 4A to the results would be helpful to understand the figure, and would prepare the reader for this part of the discussion.

      In response to the reviewer, we moved the description of the measurements of the hydrophobic thickness to the Results section (Page 12) and adjusted the Discussion to minimize repetition (page 22).

      Likewise, Figure 4E shows measurements of something, but it is not clear that these are the dimensions of a protein pocket until well into the discussion.

      In response to the reviewer’s comment, we added a sentence both in the Results section [It sits in a pocket between the two adjacent AQP0 tetramers that is wider in the extracellular leaflet than the cytoplasmic leaflet (Figure 4E)] as well as to the caption of Figure 4E [The dotted lines indicate the distance between the two adjacent AQP0 tetramers at the positions of the ring system (~8.5 Å) and the acyl chain (~2.5 Å)].

      Figure 2 - a comment for the non-specialists on what this region of the protein is would be helpful context. Is this the pore with part of the NPA motif?

      We agree with the referee and added the following sentence to the caption of Figure 2: “A region of the water-conducting pathway close to the NPA (asparagine-proline-alanine), the AQP signature motif, is shown”.

      Reviewer #2 (Recommendations For The Authors):

      There is only one recommendation: In the results section entitled "Cholesterol positions observed in the electron crystallographic structures are representative of those around single AQP0 tetramers" the authors do not describe their approach. They refer to a reference (Aponte-Santamaría et al., 2012). The authors state the problem (investigate cholesterol positions), but it would be helpful to the readers if they also described the experimental approach.

      We agree with the reviewer and made the following addition to the sentence “we performed MD simulations and calculated time-averaged densities to investigate ...”

      Reviewer #3 (Recommendations For The Authors):

      Technical comments:

      (1) Authors stated: "Equilibration simulations were then performed until bulk membrane properties, such as thickness and deuterium order parameters, became stable and congruent with previous reports such as those by (Doktorova et al., 2020) and others (Figure 5-figure supplement 2 and Figure 5-figure supplement 3)." However, bilayer thickness is not represented in these figures. Additionally, I observed that the area per lipid (APL) appeared to be somewhat variable. This variation was particularly noticeable in systems where SM:CHOL=2:1, which seem to be not fully equilibrated. Is the figure displaying APL data for only one repetition? Could you please include plots for the other repetitions?

      We thank the reviewer for pointing this out. We would like to clarify that we used CHARMM-GUI to generate one lipid bilayer configuration for each mixture and system size. These configurations (one per system) were extensively simulated to generate stable initial configurations of the lipid bilayers. Figure 5 – supplements 2 and 3 refer to this pre-equilibration step. The final pre-equilibrated configurations were then used in the subsequent multiple equilibrium MD runs that we performed, either with a single cholesterol molecule or with the AQP0 tetramer(s) inserted. We have clarified this procedure in the revised manuscript (see changes in the Methods section for the MD equilibrium simulations).

      Concerning this pre-equilibration step, we have chosen the area per lipid, not thickness, to characterize the equilibration of the pure lipid bilayers. Accordingly, the area per lipid is the quantity shown in Figure 5 – figure supplement 3. We no longer refer to the membrane thickness in the revised manuscript.

      Concerning the variability in the area per lipid, we note that the large changes occur within the first few tens of nanoseconds of the pre-equilibration step, after which the area per lipid stabilizes. We would like to also point out that in Figure 5 – figure supplement 3, we chose a logarithmic scale for the time axis to actually make it possible for the reader to see the major changes that occur at the beginning of the pre-equilibration step (which would otherwise be difficult to see). In the particular case of the SM:CHOL=2:1 mixture, the 64 lipids/leaflet system converged to a stable area per lipid value in the last 70 ns and the 244 lipids/leaflet system approached the same value in approximately the last 30 ns. This was a good indication that the large system had also converged. After equilibration of the membranes, a single cholesterol or AQP0 tetramer(s) were inserted and equilibrium simulations were initiated. However, the first 100 ns (or 300 ns in the case of the double tetramer system) were considered as a further equilibration time and were not included in the analysis. This is now explicitly stated in the revised manuscript: “The first 100 ns of each simulation replica (the first 300 ns for the two tetramer simulations) were considered as additional equilibration time and were not included in further analysis.”
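
      As a minimal sketch of the exclusion rule quoted above, the following Python snippet drops the initial equilibration window from a per-frame time series before analysis. The function name, the 10 ns sampling interval, and the placeholder values are illustrative assumptions, not taken from the manuscript; whether the frame at exactly 100 ns (or 300 ns) is kept is also an assumption made here.

```python
import numpy as np

def discard_equilibration(time_ns, values, skip_ns):
    """Return only the samples recorded at or after `skip_ns`.

    Hypothetical helper: frames before `skip_ns` are treated as
    additional equilibration and excluded from further analysis.
    """
    time_ns = np.asarray(time_ns, dtype=float)
    values = np.asarray(values, dtype=float)
    if time_ns.shape != values.shape:
        raise ValueError("time_ns and values must have the same shape")
    keep = time_ns >= skip_ns
    return time_ns[keep], values[keep]

# Placeholder trace (not real data): 0 to 500 ns sampled every 10 ns.
time_ns = np.arange(0, 501, 10)
observable = np.full(time_ns.shape, 0.45)

# Single-tetramer or single-cholesterol runs: skip the first 100 ns.
t_single, obs_single = discard_equilibration(time_ns, observable, skip_ns=100)
# Two-tetramer runs: skip the first 300 ns.
t_double, obs_double = discard_equilibration(time_ns, observable, skip_ns=300)
```

      The same cutoff would be applied to every replica of a given system so that averages are computed over comparable windows.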

      (2) Could you clarify the reasoning behind conducting the simulations at 323 K?

      We conducted the simulations at 323 K to ensure that the lipid bilayers were in the liquid phase.

      SM:CHOL mixtures have been reported to be in the liquid phase above 314 K (Keyvanloo et al. Biophys. J. 114: 1344, 2018). 323 K was thus chosen to be well above this value. Note that this temperature was also chosen in a previous MD simulation study of pure sphingomyelin bilayers (Niemelä et al. Biophys. J. 87: 2976, 2004). This reasoning, as well as the two references, has been added to the Methods section in the revised manuscript.

      (3) There appears to be a discrepancy in Figure 7. Panel F does not align with the provided caption. 

      We apologize for this mistake. The captions for panels E and F were switched. We corrected this mistake.

      (4) Likewise, in Figure 8, there is a mismatch between the caption and the figures. Furthermore, in the text, the authors assert, "In the absence of cholesterol, the AQP0 surface is completely covered by sphingomyelin in the hydrophobic region of the membrane and by water outside this region (Figure 8A, left column). As noted before, there are essentially no direct protein-protein interactions between the adjacent tetramers. When cholesterol was present at the interface, it interacted with AQP0 at the center of the membrane and remained mostly in place (Figure 8A, right column)." However, the left column shows cholesterol density. Could you please clarify this inconsistency, especially regarding the absence of cholesterol?

      We apologize for this mistake. The panels in Figure 8A showing the AQP0 surfaces in the absence and presence of cholesterol were switched. We corrected this mistake.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Estevam et al. reports new insights into the regulation of the receptor tyrosine kinase MET gained from two deep mutational scanning (DMS) datasets. In this paper, the authors use a classic selection system for oncogenic kinase signaling, the murine Ba/F3 cell line, to assess the functional effects of thousands of mutations in the kinase domains of MET in two contexts: (1) fusion of the whole MET intracellular region to the dimerization domain TPR, and (2) the same fusion protein, but with exon 14, which encodes part of the juxtamembrane region of MET, skipped. Critically, exon 14 skipping yields a version of MET that is found in many cancers and has higher signaling activity than the canonical MET isoform. The authors extensively analyze their DMS data to very convincingly show that their selection assay reports on kinase activity, by illustrating that many functionally important structural components of the kinase domain are not tolerant of mutation. Then, they turn their attention to a helical region of the juxtamembrane region (αJM), immediately after exon 14, which is posited to play a regulatory role in MET. Their DMS data illustrate that the strength and mutational tolerance of interactions between αJM and the key αC helix in the kinase domain depends on the presence or absence of exon 14. They also identify residues in the N-lobe of the kinase, such as P1153, which are not conserved across tyrosine kinases but appear to be essential for MET and MET-like kinases. Finally, the authors analyze their DMS data in the context of clinically-observed mutations and drug-resistance mutations.

      Overall, this manuscript is exciting because it provides new insights into MET regulation in general, as well as the role of exon 14. It also reveals ways in which the JM region of MET is different from that of many other receptor tyrosine kinases. The exon 14-skipped fusion protein DMS data is somewhat underexplored and could be discussed in greater detail, which would elevate excitement about the work. Furthermore, some of the cell biological validation experiments and the juxtaposition with clinical data are perhaps not assessed/interpreted as clearly as they could be. Some constructive suggestions are given below to enhance the impact of the manuscript.

      Strengths:

      The main strengths of this paper, also summarized above in the summary, are as follows:

      (1) The authors very convincingly show that Ba/F3 cells can be coupled with deep mutational scanning to examine MET mutational effects. This is most clearly shown by highlighting how all of the known kinase structure and regulatory elements are highly sensitive to mutations, in accordance with a few other DMS datasets on other kinases.

      (2) A highlight of this paper is the juxtaposition of two DMS datasets for two different isoforms of the MET receptor. Very few comparisons like this exist in the literature, and they show how small changes to the overall architecture of a protein can impact its regulation and mutational sensitivity.

      (3) Another exciting advance in this manuscript is the deep structural analysis of the MET juxtamembrane region with respect to that of other tyrosine kinases - guided by the striking effect of mutations in the juxtamembrane helical region. The authors illustrate how the JM region of MET differs from that of other tyrosine kinases.

      (4) Overall, this manuscript will provide a resource for interpreting clinically relevant MET mutations.

      Weaknesses:

      (1) The manuscript is front-loaded with extensive analysis of the first DMS dataset, in which exon 14 is present, however, the discussion and analysis of the exon 14-skipped dataset is somewhat limited. In particular, a deeper discussion of the differences between the two datasets is warranted, to lay out the full landscape of mutations that have different functional consequences in the two isoforms. Rather, the authors only focus on differences in the JM region. What are the broader structural effects of exon 14 skipping across the whole kinase domain?

      Thank you for your feedback on our manuscript and our analysis of the exon 14-skipped mutational scanning data. The lack of a robust growth differential between the wild type MET intracellular domain and the exon 14-skipped isoform within the Ba/F3 system suggests that exon 14 skipping does not confer a significant growth advantage, likely because both constructs are constitutively activated by the TPR domain. This also suggests that the assay is potentially less sensitive to nuanced JM-driven effects between these two isoforms, aside from the highly sensitive ⍺JM-helix. We also lose insight into membrane-related interactions imposed on the juxtamembrane, which may be important for fully understanding the differences between these two isoforms, because the constructs are expressed cytoplasmically. Therefore, we can at most speculate about exon 14 skipping-related differences between these two datasets.

      With these caveats in mind, to further address exon 14 and juxtamembrane-driven differences between these two mutational landscapes, we calculated the absolute score difference between TPR-METΔEx14 and TPR-MET (|METΔEx14 - MET|) and plotted the |ΔScore| as a heatmap. Overall, the two landscapes, as noted in the text, are largely similar, with differences emerging mostly for specific mutations. The largest secondary-structure difference continues to be in the ⍺JM-helix, where MET is more sensitive to helix-breaking mutations such as proline. Again, L1062 shows the greatest difference in sensitivity between the two datasets within the ⍺JM-helix, with the introduction of negative charge resulting in loss-of-function for the TPR-MET kinase domain but having a null effect in the TPR-METΔEx14 kinase domain. Other positions with strong differences include the ⍺G and APE motif.
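
      The |ΔScore| calculation described above can be sketched as follows. This is an illustrative example only: the array layout (positions as rows, substitutions as columns), the function name, and the toy numbers are assumptions and do not come from the datasets; NaN is used here to mark a variant that is missing from one library.

```python
import numpy as np

def abs_score_difference(scores_met, scores_ex14):
    """Element-wise |METΔEx14 - MET| for two score matrices of equal shape."""
    met = np.asarray(scores_met, dtype=float)
    ex14 = np.asarray(scores_ex14, dtype=float)
    if met.shape != ex14.shape:
        raise ValueError("score matrices must have the same shape")
    # NaN in either input propagates, so missing variants stay missing.
    return np.abs(ex14 - met)

# Toy matrices (not real data): 2 positions x 3 substitutions.
met = np.array([[-2.0, 0.1, np.nan],
                [0.0, -0.5, 0.3]])
ex14 = np.array([[0.2, 0.0, 0.4],
                 [0.1, -0.5, np.nan]])

delta = abs_score_difference(met, ex14)

# Largest difference observed at each position, ignoring missing variants.
per_position_max = np.nanmax(delta, axis=1)
```

      A matrix like `delta` can be passed directly to a heatmap routine, and `per_position_max` gives one simple way to rank positions by how strongly the two backgrounds diverge.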

      We have incorporated a more detailed discussion in the text.

      (2) It is unclear if gain-of-function mutations can actually be detected robustly in this specific system. This isn't a problem at face value, as different selection assays have different dynamic ranges. However, the authors don't discuss the statistical significance and reproducibility of gain- vs loss-of-function mutations, and none of the gain-of-function mutations are experimentally validated (some appear to show loss-of-function in their cellular validation assay with full-length MET). The manuscript would benefit from deeper statistical analysis (and discussion in the text) of gain-of-function mutations, as well as further validation of a broad range of activity scores in a functional assay. For the latter point, one option would be to express individual clones from their library in Ba/F3 cells and blot for MET activation loop phosphorylation (which is probably a reasonable proxy for activity/activation).

      Thank you for your comment on the statistical interpretations of gain-of-function (GOF) and loss-of-function (LOF) mutations. In this study we classify GOF and LOF based on the following metrics:

      (1) The difference between the missense mutation score and the wild type synonymous score for a given position must be smaller than the calculated propagated error, for both IL-3 withdrawal and IL-3 conditions

      (2) Missense mutations must be ≥ ±2 standard deviations (SD) from the mean of wild type synonymous mutations

      Given that our assay was conducted with a constitutively active kinase in the TPR-fusion context, gain-of-function mutations are expected not only to be rare, but also to exceed baseline fitness. Under IL-3 conditions, we expect that cells are not reliant on or “addicted” to MET for growth and proliferation. Nevertheless, due to the parallel nature of the screen, we can compare scores for variants in the IL-3 control and IL-3 withdrawal conditions to filter for mutations that exhibit high fitness solely under selective pressure.

      To identify these mutations, we (1) calculated the propagation of error between IL-3 and IL-3 withdrawal scores for the same variant; (2) calculated the absolute difference between IL-3 and IL-3 withdrawal scores for the same variant; and (3) retained variants only if the IL-3 withdrawal score was ≥ +2 SDs, the IL-3 score was ≤ 0, and the absolute score difference between IL-3 and withdrawal conditions was larger than the propagated error.
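
      The three filtering steps above can be sketched in a few lines of Python. This is a hedged illustration rather than the analysis code: the function and parameter names are hypothetical, the two standard errors are assumed to be combined in quadrature, and the +2 SD threshold is assumed to be measured from the mean of the wild type synonymous distribution.

```python
import math

def propagated_error(se_a, se_b):
    """Combine two standard errors in quadrature (an assumption of this sketch)."""
    return math.sqrt(se_a ** 2 + se_b ** 2)

def is_gof_candidate(score_withdrawal, se_withdrawal,
                     score_il3, se_il3,
                     wt_syn_mean, wt_syn_sd, n_sd=2.0):
    """Apply the three criteria to a single variant.

    1) propagated error between the IL-3 and IL-3 withdrawal scores
    2) absolute difference between the two scores
    3) keep if withdrawal score >= mean + n_sd * SD, IL-3 score <= 0,
       and the absolute difference exceeds the propagated error
    """
    err = propagated_error(se_withdrawal, se_il3)
    diff = abs(score_withdrawal - score_il3)
    above_threshold = score_withdrawal >= wt_syn_mean + n_sd * wt_syn_sd
    neutral_in_il3 = score_il3 <= 0
    return above_threshold and neutral_in_il3 and diff > err

# Toy values (not real data): synonymous mean 0.0, SD 0.5.
keep = is_gof_candidate(1.5, 0.2, -0.1, 0.2, wt_syn_mean=0.0, wt_syn_sd=0.5)
drop = is_gof_candidate(1.5, 0.2, 0.4, 0.2, wt_syn_mean=0.0, wt_syn_sd=0.5)
```

      Raising `n_sd` to 2.5 corresponds to the stricter boundary and, as expected, retains fewer variants.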

      In analyzing mutations within the IL-3 withdrawal conditions, applying our statistical metrics, we find 33 mutations within the MET library, and 30 in the METΔEx14 library, that have a score of ≥ +2 SD and low propagated error. By increasing our boundary to ≥+2.5 SD, we can classify mutations with even higher confidence, identifying 10 mutations within the MET library, and 9 in the METΔEx14 library (Supplemental Data Figure 7).

      (3) In light of point 2, above, much of the discussion about clinically-relevant gain-of-function mutations feels a bit stretched - although this section is definitely very interesting in premise. A clearer delineation of gain-of-function, with further statistical support and ideally also some validation, would greatly strengthen the claims in this section.

      To address this concern, we have provided additional analysis and details on gain-of-function (GOF) classification in Supplemental Data Figure 5 and the overlap between GOF and clinically associated mutations in Supplemental Data Figure 8. Within our gain-of-function classifications, we pick up on several mutations at positions that have been clinically detected and experimentally validated in previous studies in both libraries (D1228, G1163, L1195), and show that GOF mutations also have low variance.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe a deep mutational scanning (DMS) study of the kinase domain of the c-MET receptor tyrosine kinase. The screen is conducted with a highly activated fusion oncoprotein - Tpr-MET - in which the MET kinase domain is fused to the Tpr dimerization element. The mutagenized region includes the entire kinase domain and an alpha-helix in the juxtamembrane region that is essentially part of the MET kinase domain. The DMS screen is carried out in two contexts, one containing the entire cytoplasmic region of MET, and the other with an "exon 14 deletion" which removes a large portion of the juxtamembrane region (but retains the aforementioned alpha-helix). The work provides a robust and essentially exhaustive catalog of the effect of mutations (within the kinase domain) on the ability of the Tpr-MET fusion oncoproteins to drive IL3-independent growth of Ba/F3 cells. Every residue in the kinase is mutated to every natural amino acid. Given the design of the screen, one would expect it to be a powerful tool for identifying mutations that impair catalytic activity and therefore impair IL3-independent proliferation, but not the right tool for identifying gain-of-function mutations that operate by shifting the kinase from an inactive to active state (because the Tpr-Met fusion construct is already very highly activated). This is borne out by the data, which reveal many many deleterious mutations and few "gain-of-function" mutations (which are of uncertain significance, as discussed below).

      Strengths:

      The authors take a very scholarly and thorough approach to interpreting the effect of mutations in light of available information for the structure and regulation of MET and other kinases. They examine the effect of mutations in the so-called catalytic (C) and regulatory (R) spines, the interface between the JM alpha-helix and the C-helix, the glycine-rich loop, and other key elements of the kinase, providing a structural rationale for the deleterious effect of mutations. Comparison of the panoply of deleterious mutations in the TPR-MET versus TPR-exon14del-MET DMS screens reveals an interesting difference - the exon14 deletion MET is much more tolerant of mutations in the JM alpha-helix/C-helix interface. The reason for this is unclear, however.

      Weaknesses:

      Because the screens were conducted with highly active Tpr-MET fusions, they have limited power to reveal gain-of-function mutations. Indeed, to the extent that Tpr-MET is as active or even more active than ligand-activated WT MET, one could argue that it is "fully" activated and that any additional gain of fitness would be "super-physiologic". I would expect such mutations to be rare (assuming that they could be detected at all in the Ba/F3 proliferation assay). Consistent with this, the authors note that gain-of-function mutations are rare in their screen (as judged by being more fit than the average of synonymous mutations). In their discussion of cancer-associated mutations, they highlight several "strong GOF variants in the DMS". It is unclear what the authors mean by "strong GOF", indeed it is unclear to this reviewer whether the screen has revealed any true gain of function mutations at all. A few points in this regard:

      (1) More active than the average of synonymous mutations (nucleotide changes that have no effect on the sequence of the expressed protein) seems to be an awfully low bar for GOF - by that measure, several synonymous mutations would presumably be classified as GOF.

      We completely agree that classifying any mutation above the average synonymous score as GOF would not be a robust assessment, which is why we statistically filtered mutations throughout our analysis. To address this point, and that of Reviewer 1, we have further outlined our statistical definitions. In classifying mutations as GOF or LOF, the following parameters were used:

      (1) The difference between the missense mutation score and the wild type synonymous score for a given position must be smaller than the calculated propagated error, for both IL-3 withdrawal and IL-3 conditions

      (2) Missense mutations must be ≥ ±2 standard deviations (SD) from the mean of wild type synonymous mutations

      Therefore, only variants at the tail ends of the mutational distribution were assessed, and further filtered based on propagation of error. For this reason, a “strong GOF” mutation as noted in this study is one that improves the fitness of an already active kinase. As pointed out, within our analysis, these are very rare occurrences, and in focusing on cancer-associated mutations we find that the variants that meet these statistical parameters require a larger genetic “leap” in the codon space. Overall, we have also changed our language in reference to GOF mutations in the text.

      We hope this concern has been addressed in the new Supplemental Data Figures.

      (2) In the +IL3 heatmap in supplemental Figure 1A, there is as much or more "blue" indicating GOF as in the -IL3 heatmap, which could suggest that the observed level of gain in fitness is noise, not signal.

      We hope this concern has been addressed in the previous responses and new Supplemental Data Figures.

      (3) And finally, consistent with this interpretation, in Supplemental Figure 1C, comparing the synonymous and missense panels in the IL3 withdrawal condition suggests that the most active missense mutations (characterized here as strong GOF) are no more active than the most active synonymous mutations.

      We hope this concern has been addressed in the previous responses and figures above.

      My other major concern with the work as presented is that the authors conflate "activity" and "activation" in discussing the effects of mutations. "Activation" implies a role in regulation - affecting a switch between inactive and active conformations or states - at least in this reviewer's mind. As discussed above, the screen per se does not probe activation, only activity. To the extent that the residues discussed are important for activation/regulation of the kinase, that information is coming from prior structural/functional studies of MET and other kinases, not from the DMS screen conducted here. Of course, it is appropriate and interesting for the authors to consider residues that are known to form important structural/regulatory elements, but they should be careful with the use of activity vs. activation and make it clear to the reader that the screen probes the former. One example - in the abstract, the authors rightly note that their approach has revealed a critical hydrophobic interaction between the JM segment and the C-helix, but then they go on to assert that this points to differences in the regulation of MET and other RTKs. There is no evidence that this is a regulatory interaction, as opposed to simply a structural element present in MET (and indeed the authors' examination of prior crystal structures shows that the interaction is present in both active and inactive states).

      Thank you, and we completely agree that the distinction between “activity” and “activation” is important and that we can at most speculate and propose models for effects related to activation from this screen. We have edited the text to reflect these distinctions. With respect to activation and the second point, we believe the screen highlights the ⍺JM-C interface as a critical structural region, which may have a role in regulation based on the paradigm of juxtamembrane regulation in RTKs, the presence of a similar interface in TAM family kinases, the co-movement of the ⍺JM-helix and ⍺C-helix between active and inactive conformations in the structural ensemble, and the observation that within the TPR-METΔEx14 library there is a greater tolerance for mutations at interface positions than in TPR-MET. We hope that there are follow-up studies that directly probe the ⍺JM-C interface with respect to the entire juxtamembrane to establish whether, and in what way, this conserved motif contributes to MET function. We have changed the language of the text to reflect how these differences contribute to our proposed model, rather than any unintended assertion on direct regulatory effects.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggested major points to address:

      (1) Although the authors show that several key functional residues in the kinase domain are highly sensitive to mutation, it would be nice if the authors further established a clear connection between kinase activity and enrichment in the Ba/F3 assay. Specifically, it is unclear to what extent there is a correlation between the extent of enrichment/depletion and kinase activity - is a larger activity score necessarily indicative of higher kinase activity? This is partly validated by the P1153L mutation autophosphorylation western blots in Figure 4B, but this correlation is somewhat undermined by the data in 5F. Autophosphorylation data (or phosphorylation data on a direct downstream substrate) for a few mutants would really solidify what the activity score is truly reporting. This might also clarify the extent to which the difference between the two screens can be interpreted, and the extent to which gain-of-function can be interpreted.

      The Ba/F3 assay was carefully chosen for its addiction to exogenous IL-3, which serves as a permissive signaling switch. Any mutation that prevents TPR-MET/ΔEx14 from properly functioning is therefore dampening its signaling ability. Nevertheless, it is possible that some mutations with high scores are truly improving activity and others are sustaining activity through more stable interactions than the wild type kinase domain or with downstream signaling partners, which would require careful biochemical dissection outside the scope of this study. To address these points, we now refer to the mutation score simply as “score” rather than “activity score” and further discuss these caveats in the text.

      (2) Overall, the exon 14-skipped dataset is under-discussed in the paper. The comparison of the two datasets is where most deep insights are likely to be found, and so a more thorough analysis/discussion of this dataset would really elevate the significance of the paper. For example, there appear to be a very large number of mutations that have divergent effects in the two screens (everything along the dashed lines in Figure 5D), but it's unclear where most of these mutations lie on the structure. It would be helpful if the residues with divergent mutational effects between the two screens (Supplementary Figure 5E) were mapped onto a structure of the JM-KD construct.

      To address this concern, new analysis has been added to the supplement, showing the score differences between MET and METΔEx14 mutations as a heatmap (Supplemental Data Figure 7A). Within this analysis we further applied our statistical filtering methods and structurally mapped positions with the greatest differential scores to show where divergent effects cluster (Supplemental Data Figure 7D). Consistent with our previous reports, the ⍺JM-helix and ⍺C-helix show the largest cluster of divergent effects, in addition to sites such as the ⍺G and APE motif. Further discussion of these points has been added to the text.

      (3) Based on the observations that αJM-αC interactions seem to be less strictly required in the exon 14 mutant, the hypothesis that exon 14 skipping merely removes a Cbl docking site seems largely unsatisfactory. There seems to be more direct structural alterations that could explain this change, but these are not really discussed or speculated on. Related to this, while L1062 mutations are more tolerated, as the authors showed in both the mutational heatmap and the cellular experiments, its binding counterpart L1125 still seems to be somewhat immutable based on the heatmaps. So, more hypothesis/exploration of how exon 14 skipping affects MET KD structure would be a nice addition to the paper.

      We agree that loss of the Cbl docking site is an insufficient model to capture the full nature of JM regulation and exon 14 skipping effects, which was a major incentive for this study. The outstanding ⍺JM-⍺C-helix sensitivity also excites us because it points to a region of the JM that is potentially involved in kinase activity through ⍺C-helix interactions, much like the CDK models and other RTK-JM interactions. We observed that the ⍺JM- and ⍺C-helices retain contact, and propose that they move in unison between active and inactive conformations. However, it is possible that a more complicated mechanism might also exist, where there is a larger degree of maintenance of these contacts in a homodimer. For instance, in Figure 3G, if you compare the ⍺JM-helix conformations, in both RON and AXL there is more distance and a pivot away from the ⍺C-helix. It is possible that there are shared mechanisms between the MET and TAM families that could further elucidate exactly how these ⍺JM-helices interact with the kinase domain during the activity transitions and what biophysical role JM truncations play.

      (4) The discussion about mutations S1122Q and L1062D is a bit confusing and incomplete. From the DMS data, it appears that L1062D should be mildly gain-of-function for the exon 14 deletion variant and very loss of function for wild-type MET. In the validation HeLa cell experiments L1062D is loss-of-function in both contexts, but a mention of this discrepancy is omitted. Then, when the discordance between DMS and HeLa cell experiments is observed again for S1122Q, it is explicitly called out for activation-loop phosphorylation, but then there is no mention of the fact that HGF stimulation leads to greater pERK levels for S1122Q in the exon 14 deletion context (the opposite of the DMS result). The Erk phosphorylation discrepancy should be mentioned. It is entirely reasonable, as the authors suggest, that there are differences between full-length MET and the TPR fusions, but the enhanced Erk phosphorylation by the S1122Q mutation is surprising (and intriguing!). This section could use some re-analysis/re-writing and further discussion.

      Thank you for this comment. As noted, L1062D shows slight GOF in METΔEx14 but LOF in MET. The blots show expression of L1062D and S1122Q in the full-length receptor in the absence and presence of HGF stimulation. L1062D is loss of function for both contexts only in -HGF conditions, but shows expression in phosphorylated METΔEx14, but not MET. For S1122Q, indeed there is a stronger pERK signal in the METΔEx14, which highlights how probing all regions of phosphorylation (A-loop and C-tail) and many MET-associated pathways (ERK, AKT) may be important for understanding how these mutations affect MET phosphorylation and proliferation. We have included this point in the text.

      (5) Related to the previous point, one other thing to consider here is that perhaps gain-of-function mutations are simply not detectable in this particular DMS assay. The authors state that GOF and LOF are defined as 2 standard deviations from the mean of the WT-synonymous distribution. How many mutations are actually designated to be GOF based on this criterion? Are those GOF mutations as reproducible as the LOF mutations? It would be worthwhile to separately analyze the variance in activity scores for every loss-of-function mutation and gain-of-function mutation. It seems likely that loss-of-function scores are a lot more reproducible than gain-of-function ones, suggesting that the most apparent gain-of-function signal is just noise in the assay. The few outliers to this point (true gain-of-function mutations) may be some of the ones discussed in Figure 6. If this is true, it would lend confidence to the claims associated with Figure 6.

      In analyzing and classifying both GOF and LOF mutations, error was a main filtering parameter. Each fitness score, calculated by Enrich2, is representative of the slope across time points  and biological replicates for the read frequency of the mutation. The associated standard error (SE) reflects the variance for each mutation within the scoring framework (Rubin et al., 2017). Mutations were then further filtered based on low propagated error, calculated by comparing the standard error (SE) of each missense mutation to the SE of the respective wild type synonymous mutation. Therefore, mutations were only classified as GOF or LOF if there was low error, in addition to the other score filters previously described. We have plotted the classified GOF mutations with their respective SE in the newly incorporated Supplemental Data Figure 8C.
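
      As an illustration only, the classification logic described above (a score more than 2 standard deviations from the mean of the WT-synonymous distribution, combined with an error filter) can be sketched as follows. This is not the authors' pipeline: the function name, the data layout, and the single `max_se` cutoff are assumptions made for the example, and the actual propagated-error filter compares each missense SE to the SE of the corresponding wild-type synonymous mutation.

```python
import statistics

def classify_mutations(scores, wt_syn_scores, n_sd=2.0, max_se=None):
    """Label each mutation as GOF, LOF, WT-like, or filtered.

    scores: dict mapping mutation name -> (fitness score, standard error)
    wt_syn_scores: fitness scores of wild-type synonymous variants
    max_se: hypothetical SE cutoff standing in for the error filter
    """
    mean = statistics.mean(wt_syn_scores)
    sd = statistics.stdev(wt_syn_scores)
    upper = mean + n_sd * sd
    lower = mean - n_sd * sd

    labels = {}
    for mutation, (score, se) in scores.items():
        if max_se is not None and se > max_se:
            labels[mutation] = "filtered"   # too noisy to classify
        elif score > upper:
            labels[mutation] = "GOF"
        elif score < lower:
            labels[mutation] = "LOF"
        else:
            labels[mutation] = "WT-like"
    return labels

# Toy values, not data from the study.
wt_syn = [-0.1, 0.0, 0.1, 0.05, -0.05]
toy_scores = {
    "mutA": (0.5, 0.05),
    "mutB": (-1.0, 0.05),
    "mutC": (0.1, 0.05),
    "mutD": (0.9, 0.8),
}
labels = classify_mutations(toy_scores, wt_syn, max_se=0.3)
```

      Applying the error filter before the score thresholds means that a high-scoring but noisy variant (mutD in the toy data) is excluded rather than counted as GOF, which is the concern raised in the reviewer's comment.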

      (6) In the discussion of panels 6C and 6D, the assertion is that the "clinical, not validated" category has more mutations that are low-fitness outliers than the "clinical, validated" category. From the graphs, it's actually hard to tell if this is the case for two reasons: (1) the way the graphs are normalized, (to the largest value in each histogram), you cannot compare bar heights (and thus number of mutations) between two histograms on the same graph. (2) Just looking at the shapes of the distributions, or considering maybe the mean or median values, it's unclear whether the "validated" and "not validated" populations are actually different from one another.

      This is an important indication, and we have added analysis showing the distribution and number of clinically-associated mutations within our libraries without normalization in the main text and in Supplemental Data Figure 8A-B.

      (7) This sentence in the last results section is somewhat unclear: "GOF resistance mutations may indicate an effect on the equilibrium of kinase activation, whereas LOF resistance mutations likely affect inhibitor-protein interactions directly." The first part makes sense, but it is not totally obvious how one can infer anything about inhibitor-protein interactions from mutations that are LOF with respect to kinase activity. Related to this, how are LOF mutations selected in the presence of an inhibitor? Is the assumption here that the mutation might totally abrogate inhibitor binding but only slightly impair the kinase? Perhaps this could be explained a bit more.

      Here, the idea we wanted to get across is that two models can explain how a mutation contributes to resistance: it can shift the activity equilibrium at baseline, or it can directly impair drug effects and restore baseline activity. Mutations that are labeled resistant and GOF favor the first model. Mutations that are labeled resistant and LOF favor the second model. In the presence of an inhibitor, which is outside the scope of this study, LOF mutations would be sensitive to the inhibitor (i.e., WT-like and sensitive).

      (8) Some additional details of the library preparation and sequencing should be given in the methods section. It appears that the variable region of the library is roughly 275 amino acid residues long, which means >800 bases. How was this sequenced? From the methods, it sounds like all of the variants were pooled into a single library, but then sequencing was done using a 300x300 paired-end Illumina kit, which would not cover the length of the whole variable region. Was the library actually screened in segments as sub-libraries and then separately sequenced? Alternatively, was the whole library screened at once, and then different segments were amplified out for sequencing? If the latter approach is used, this could yield confounding results for counting wild-type variants that have the parent wild-type coding sequence. For example, if you amplify your kinase library in three segments after a single selection on the whole library, and you sequence those three segments separately, you might find a read that appears as wild-type in the part you amplified/sequenced but has a mutation in a region that you did not sequence. If this approach is taken, the counts for the wild-type sequence would be inaccurate, in which case, how is the data normalized with WT as a reference? Regardless of the method used, some more details should be provided in the methods section.

      In this study, we used the Nextera XT DNA Library Preparation Kit (Illumina), which uses a tagmentation approach that randomly fragments our 861 bp amplicon into ~300 bp fragments with a transposase, resulting in a Poisson distribution of fragment sizes. This allows for direct sequencing of all amplicons and libraries with an SP300 paired-end run, which we ran on two lanes of a NovaSeq6000. Samples are demultiplexed and processed by our analysis pipeline with a lookup table that associates the unique dual index to the specific sample (library, time point, biological replicate, IL-3 condition).

      The TPR-MET and TPR-METΔEx14 libraries were prepared in parallel throughout the entire experiment, from cloning to virus generation to transductions, screening, cell harvesting, sequencing prep, and sequencing. In other words, the TPR-MET and TPR-METΔEx14 were transduced into their own, respective batch of cells for each biological replicate, then selected and screened on the same day for each replicate and time point. Each library and condition (time point, biological replicate, IL-3 condition) was prepared in parallel but still an independent sample. At the stage of tagmentation, each sample was arrayed, where each well corresponds to a library, biological replicate, and time point. At the stage of sequencing, samples across the two libraries were normalized to 10mM (library, biological replicate, time point, IL-3 condition) then pooled together and all run on two lanes of the same NovaSeq6000 flow cell.

      PCR and sequencing bias was one of the most important parameters for us, which is why we performed tagmentation in parallel and sequenced everything on the same run. We have added extra details to the methods and hope that we have clarified your questions on this matter.

      Suggested minor points to address:

      (1) TPR (as in TPR-MET fusion) is not defined in the text when it is first mentioned. And it wasn't immediately clear that this is not a membrane-associated domain (Figure 5E makes this way more obvious than Figure 1B does). Perhaps this could be made more explicit in the text or in Figure 1.

      We have incorporated a new schematic in Figure 1B to better illustrate the TPR-fusion constructs used within this study. The usage of the TPR-fusion is first mentioned in the introduction, paragraph 4, and we have revised the main text to delineate the usage of the TPR-fusion more clearly.

      (2) In Figure 2G, it would be helpful if the wild-type amino acid residue was listed underneath the position number in the two graphs (even though those residues are also highlighted in 2H).

      Thank you for this recommendation. We have added the wild-type amino acid next to the position number in the x-axis label.

      (3) For Supplementary Data Figure 2, is it possible to calculate conservation scores at each position using some kind of evolutionary model, rather than relying on visual inspection of the sequence logo? Can one quantitatively assert that the C-spine is less conserved than the R-spine overall, or can this only be said for certain positions? Related to this, in comparing Figure 2G to Supplementary Data Figure 2, it is interesting that there isn't any obvious correspondence between mutational tolerance and conservation within the C-spine. For example, 1165 seems to be the most conserved position in the C-spine, but several substitutions are tolerated at this position, just like 1210, which is one of the least conserved positions in the C-spine. Finally, it's very likely that positions 1165, 1210, 1272, and 1276 co-vary, given that they all pack into the same hydrophobic cluster. This might be why they appear less conserved. These last few points might be worth discussing briefly if the authors want to relate mutational tolerance to evolutionary conservation.

      Thank you for this recommendation. To better quantitatively determine C-spine versus R-spine conservation, we performed a multiple sequence alignment of all RTK kinase domain sequences to properly identify corresponding R- and C-spine locations, as previously done in generating the spine logos, then used the Bio3D structural bioinformatics package in R to calculate the conservation score of each residue position by amino acid “similarity” with a BLOSUM62 matrix (Supplemental Data Figure 2B). In concordance with the logos, we find that C-spine positions 1092, 1108, and 1165 have the highest conservation scores, even compared to some R-spine positions. We also see across the alignment that, indeed, C-spine positions 1165, 1210, 1211, 1212, 1272, and 1276 co-vary within RTK families. We have revised the text to reflect these points, and more specifically discuss position-level conservation rather than generalizing conservation for the C- and R-spines.

      (4) On Page 7 of the merged document, there appear to be some figure labeling errors. In the first and second paragraphs of the "Critical contacts between..." section, Figure 3B is referenced multiple times as a structural alignment/ensemble, but this is a heatmap.

      Thank you for catching this! The correct figure panels are now referenced.

      (5) In the text describing Figure 3A, it is stated that the structures were aligned to the N-lobe, but the figure legend says that all structures were aligned to alpha-C and alpha-JM.

      Thank you. A local alignment to the ⍺JM-helix and ⍺C-helix is correct, the idea being that if the ⍺JM-helix and ⍺C-helix are linked to an active/inactive conformation, as in the case of the insulin receptor, these two clusters could be revealed through the structural ensemble. However, we discovered this was not the case. This, combined with the DMS sensitivity to mutations at the packing interface, leads us to believe that the MET JM has a distinctive regulatory mechanism that relies on this ⍺C-helix interface. We have made this correction to the text.

      (6) It would be helpful if the alpha-C and alpha-JM helices in Figure 3D were labeled on the MET structures.

      The ⍺C-helix and ⍺JM-helix are now labeled in Figure 3D.

      (7) It appears that Figure 4E is never explicitly referenced in the text.

      Thank you, Figure 4E is now appropriately referenced in the text.

      (8) Throughout the Figure 6 legend, for the histograms, it is stated that "Counts are normalized to the total mutations in each screen dataset." This might not be the correct description of normalization, as this would mean that the sum of all of the bins should equal 1. Rather, the normalization appears to be to the bin with the largest number of mutants in it, which is given a value of 1. This difference is really critical to how one visually inspects the overlaid histograms.

      Thank you for this comment. Here, the intention was to aid in the visualization of the distribution of cancer-associated and resistance associated mutations, which is a much smaller population compared to the whole library and becomes easily masked. We originally applied a “stat(ncount)” function in R, which as noted scales the data and sets the peak to 1, which only applied to the clinical and cancer-associated mutations plotted. Now, to better compare distributions, normalization has been removed, instead opting to overlay the distributions of all missense mutations and the subset of clinical mutations directly with their own y-axis scale. This modification has been made throughout Figure 6 panels, hopefully improving interpretability.
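
      To make the distinction raised in this exchange concrete, the sketch below contrasts the two normalizations: dividing by the total (bars sum to 1) and dividing by the tallest bin (peak equals 1, which is what ggplot2's `ncount` computed variable does). It is a minimal illustration with made-up bin counts; the function names are assumptions for the example and it assumes non-empty histograms.

```python
def normalize_to_total(counts):
    """Scale bin counts so that all bins sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def normalize_to_peak(counts):
    """Scale bin counts so that the tallest bin equals 1."""
    peak = max(counts)
    return [c / peak for c in counts]

# Toy bin counts: a large library and a small clinical subset.
library_counts = [10, 50, 40]
clinical_counts = [1, 4, 3]

library_peak = normalize_to_peak(library_counts)     # [0.2, 1.0, 0.8]
clinical_peak = normalize_to_peak(clinical_counts)   # [0.25, 1.0, 0.75]

library_total = normalize_to_total(library_counts)
clinical_total = normalize_to_total(clinical_counts)
```

      After peak scaling, the two toy histograms have nearly identical bar heights even though one contains roughly twelve times as many mutations, so bar heights cannot be compared across overlaid histograms. This is why plotting raw counts on separate y-axes, as in the revised Figure 6, keeps the numbers interpretable.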

      Reviewer #2 (Recommendations For The Authors):

      A few thoughts/suggestions:

      (1) Regarding kinase regulation, the "closing of N- and C-lobe" upon activation is an often mentioned component of activation, and I'm sure is true in many cases, but it is not a general feature of kinase activation.

      The text has been updated - we removed the description of N- and C-lobe closure. 

      (2) With respect to the inactive state of MEK, the DFG-flipped structure discussed here is almost certainly an inhibitor-induced conformation. Again, DFG-flip is often discussed as a mechanism of kinase regulation, and while in some kinases this might be the case, more often it is a drug-induced or drug-stabilized inactive conformation. The SRC/CDK-like inactive conformation in 2G15 is more likely a physiologically relevant inactive state. (or even better, the ATP-bound inactive state structure 3DKC, which exhibits a somewhat different SRC/CDK-like inactive conformation).

      The PDB 3R7O structure was chosen as the main representation because it was the clearest representation of a wild type structure with an aligned R- and C- spine, solvent-exposed, phosphorylated activation loop. Although 3DKC is bound to ATP, this structure is still in an inactive conformation and has stabilizing mutations (Y1234/F, Y1235D) and an atypical alpha helix structure in the activation loop. However, we agree the SRC/CDK-like inactive conformation is an important representation and we have incorporated our structural mapping on 2G15 in the new supplemental figures with further details on statistical analysis and comparison of libraries.

      (3) Following the comments above, I would describe the process of activation in a simpler way (in any case, it is peripheral to the work described here). Something along the lines of "phosphorylation on tyrosines XX and XX induces rearrangement of the activation segment and promotes and stabilizes the inward active position of the C-helix." Can go on to mention that this forms the E1127/K1110 salt bridge. (The DFG is already "in" in the SRC/CDK-like inactive state).

      We have changed the language to more simply describe activation. Thank you!

      (4) Would be great to see DMS with the intact receptor done in a way that could identify mutations that lead to activation in a ligand-independent manner. (but obviously beyond the scope of this paper).

      Agreed! This would be an excellent follow up for the future, especially to elucidate juxtamembrane regulation, as the membrane context is likely required.

      A typo or two:

      Boarded instead of bordered/outlined in legend to Fig. 1.

      P11553L in the 2nd line of the 2nd paragraph in that section.

      Thank you, we have addressed these typos!

    1. Author response:

      eLife assessment

      This valuable study uses single-cell transcriptomics to explore the mouse vomeronasal organ and represents an advance that enhances our understanding of neural diversity within this sensory system. Findings suggest a unique endoplasmic reticulum (ER) structure in Gnao1 neurons and allow for the synthesis of a developmental trajectory from stem cells to mature vomeronasal sensory neurons. Convincing methods, data, and analyses broadly support the claims, although experiments supporting the main ER-related claim are incomplete and lack quantification of co-expression and statistics on labeling intensity or coverage. Adding these data would greatly strengthen the conclusions of the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Devakinandan and colleagues present a manuscript analyzing single-cell RNA-sequencing data from the mouse vomeronasal organ. The main advances in this manuscript are to identify and verify the differential expression of genes that distinguish apical and basal vomeronasal neurons. The authors also identify the enriched expression of ER-related genes in Gnao1 neurons, which they verify with in situ hybridizations and immunostaining, and also explore via electron microscopy. Finally, the results of this manuscript are presented in an online R shiny app. Overall, these data are a useful resource to the community. I have a few concerns about the manuscript, which I've listed below.

      General Concerns:

      (1) The authors mention that they were unable to identify the cells in cluster 13. This cluster looks similar to the "secretory VSN" subtype described in a recent preprint from C. Ron Yu's lab (10.1101/2024.02.22.581574). The authors could try comparing or integrating their data with this dataset (or that in Katreddi et al. 2022) to see if this is a common cell type across datasets (or arises from a specific type of cell doublets). In situ hybridizations for some of the marker genes for this cluster could also highlight where in the VNO these cells reside.

      Cluster 13 (Obp2a+) cells identified in our study have gene expression markers similar to those of the “putative secretory” cells in the Hills et al. manuscript. At the time that manuscript became publicly available, our publication was already finalized and communicated. We welcome the suggestion to integrate data, which we will attempt and address in our revision.

      (2) I found the UMAPs for the neurons somewhat difficult to interpret. Unlike Katreddi et al. 2022 or Hills et al. 2024, it's tricky to follow the developmental trajectories of the cells in the UMAP space. Perhaps the authors could try re-embedding the data using gene sets that don't include the receptors? It would also be interesting to see if the neuron clusters still cluster by receptor-type even when the receptors are excluded from the gene sets used for clustering. Plots relating the original clusters to the neuronal clusters, or dot plots showing marker gene expression for the neuronal clusters might both be useful. For example, right now it's difficult to interpret clusters like n8-13.

      We will re-present the UMAPs to make the developmental trajectory clearer. How neuron clusters are affected by the presence or exclusion of receptors is an interesting question that we will address in our revision, along with showing markers of each neuronal cluster, as suggested by the reviewer.

      Reviewer #2 (Public Review):

      Summary:

      The study focuses on the vomeronasal organ, the peripheral chemosensory organ of the accessory olfactory system, by employing single-cell transcriptomics. The author analyzed the mouse vomeronasal organ, identifying diverse cell types through their unique gene expression patterns. Developmental gene expression analysis revealed that two classes of sensory neurons diverge in their maturation from common progenitors, marked by specific transient and persistent transcription factors. A comparative study between major neuronal subtypes, which differ in their G-protein sensory receptor families and G-protein subunits (Gnai2 and Gnao1, respectively), highlighted a higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. Moreover, distinct differences in ER content and ultrastructure suggest some intriguing roles of ER in Gnao1-positive vomeronasal neurons. This work is likely to provide useful data for the community and is conceptually novel with the unique role of ER in a subset of vomeronasal neurons. This reviewer has some minor concerns and some suggestions to improve the manuscript.

      Strengths:

      (1) The study identified diverse cell types based on unique gene expression patterns, using single-cell transcriptomics.

      (2) The analysis suggests that two classes of sensory neurons diverge during maturation from common progenitors, characterized by specific transient and persistent transcription factors.

      (3) A comparative study highlighted differences in Gnai2- and Gnao1-positive sensory neurons.

      (4) Higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons.

      (5) Distinct differences in ER content and ultrastructure suggest unique roles of ER in Gnao1-positive vomeronasal neurons.

      (6) The research is conceptually novel regarding the unique role of ER in a subset of vomeronasal neurons, offering valuable insights to the community.

      Weaknesses:

      (1) The connection between observations from sc RNA-seq and EM is unclear.

      (2) The lack of quantification for the ER phenotype is a concern.

      We would like to point out that the connection between scRNA-seq and EM was made in our experiments that investigated the localization of ER proteins via IHC (in Figure 5). The intriguing observation that the levels of a number of ER luminal and membrane proteins were higher in Gnao1 compared to Gnai2 neurons led us to hypothesize a differential ER content or ultrastructure, which was verified by EM. Quantification of the ER phenotype would definitely strengthen our observations, and we will add it to our revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Devakinandan and colleagues have undertaken a thorough characterization of the cell types of the mouse vomeronasal organ, focusing on the vomeronasal sensory neurons (VSNs). VSNs are known to arise from a common pool of progenitors that differentiate into two distinct populations characterized by the expression of either the G protein subunit Gnao1 or Gnai2. Using single-cell RNA sequencing followed by unsupervised clustering of the transcriptome data, the authors identified three Gnai2+ VSN subtypes and a single Gnao1+ VSN type. To study VSN developmental trajectories, Devakinandan and colleagues took advantage of the constant renewal of the neuronal VSN pool, which allowed them to harvest all maturation states. All neurons were re-clustered and a pseudotime analysis was performed. The analysis revealed the emergence of two pools of Gap43+ clusters from a common lineage, which differentiate into many subclusters of mature Gnao1+ and Gnai2+ VSNs. By comparing the transcriptomes of these two pools of immature VSNs, the authors identified a number of differentially expressed transcription factors in addition to known markers. Next, by comparing the transcriptomes of mature Gnao1+ and Gnai2+ VSNs, the authors report the enrichment of ER-related genes in Gnao1+ VSNs. Using electron microscopy, they found that this enrichment was associated with specific ER morphology in Gnao1+ neurons. Finally, the authors characterized chemosensory receptor expression and co-expression (as well as H2-Mv proteins) in mature VSNs, which recapitulated known patterns.

      Strengths:

      The data presented here provide new and interesting perspectives on the distinguishing features between Gnao1+ and Gnai2+ VSNs. These features include newly identified markers, such as transcription factors, as well as an unsuspected ER-related peculiarity in Gnao1+ neurons, consisting of a hypertrophic ER and an enrichment in ER-related genes. In addition, the authors provide a comprehensive picture of specific co-expression patterns of V2R chemoreceptors and H2-Mv genes.

      Importantly, the authors provide a browser (scVNOexplorer) for anyone to explore the data, including gene expression and co-expression, number and proportion of cells, with a variety of graphical tools (violin plots, feature plots, dot plots, ...).

      Weaknesses:

      The study still requires refined analyses of the data and rigorous quantification to support the main claims.

      The method description for filtering and clustering single-cell RNA-sequencing data is incomplete. The Seurat package has many available pipelines for single-cell RNA-seq analysis, with a significant impact on the output data. How did the authors pre-process and normalize the data? Was the pipeline used with default settings? What batch correction method was applied to the data to mitigate possible sampling or technical effects? Moreover, the authors do not describe how cell and gene filtering was performed.

      The data in Figure 7-Supplement 3 show that one-sixth of the V1Rs do not express any chemoreceptor, while over a hundred cells express more than one chemoreceptor. Do these cells have unusually high or low numbers of genes or counts? To exclude the possibility of a technical artifact in these observations, the authors should describe how they dealt with putative doublet cells or debris.

      Surprisingly, some clusters are characterized by the expression of specific chemoreceptors (VRs). Have these been used for clustering? If so, clustering should be repeated after excluding these receptors.

      The identification of the VSN types should be consistent across the different analyses and validated. The data presented in Figure 1 lists four mature VSN types, whereas the re-clustering of neurons presented in Figure 3 leads to a different subdivision. At present, it remains unclear whether these clusters reflect the biology of the system or are due to over-clustering of the data, and therefore correspond to either noise or arbitrary splitting of continua. Clusters should be merged if they do not correspond to discrete categories of cells, and correspondence should be established between the different clustering analyses. To validate the detected clusters as cell types, markers characteristic of each of these populations can be evaluated by ISH or IHC.

      There is a lack of quantification of imaging data, which provides little support for the ER-related main claim. Quantification of co-expression and statistics on labeling intensity or coverage would greatly strengthen the conclusions and the title of the paper.

      scRNA-seq data analysis methods: We agree with the reviewer and will elaborate on the various criteria, parameters and methods in our revision. As described above, our revised manuscript will include analysis of how inclusion / exclusion of VRs affects cell clusters, as well as quantification of the ER phenotype. We will address the reviewer’s concern of over-clustering.

      We think that the cells expressing zero as well as two V1Rs are real and cannot be attributed to debris or doublets for the following reasons:

      a) Cells expressing no V1Rs are not necessarily debris because they express other neuronal markers at the same level as cells that express one or two V1Rs. Higher expression threshold values used in our analysis may have somewhat increased the proportion of cells with zero V1Rs. We will modify Figure 7-supplement 3c to add another group showing Gnai2 level in cells expressing zero V1Rs.

      b) Cells co-expressing V1R genes: We listed the frequency of cells co-expressing V1R gene combinations in Supplementary Table 8. Among 134 cells that express two V1Rs, 44 cells express Vmn1r85+Vmn1r86, 21 express Vmn1r184+Vmn1r185, 13 express Vmn1r56+Vmn1r57, 6 express Vmn1r168+Vmn1r177, and so on. Doublets are generally a random combination of two cells. Here, each specific co-expression combination is represented by multiple cells, which is highly unlikely to occur by random chance. Some of the co-expression combinations were identified earlier and verified experimentally in Lee et al., 2019 and Hills et al. Furthermore, Figure 7-supplement 3c shows that the level of Gnai2 expression is comparable across cells expressing one or two V1Rs. If the cells expressing two V1Rs were doublets, we would expect their level of Gnai2 to be higher than in cells expressing a single V1R. We will elaborate on this in our revised manuscript.
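
      The argument in (b) rests on tallying how often each specific receptor pair recurs. A minimal sketch of such a tally is given below; it is not the authors' analysis code, the per-cell input format and function name are assumptions for illustration, and the toy cells are invented. Only the four pair counts and the total of 134 are taken from the response above.

```python
from collections import Counter

def tally_pairs(cells):
    """Count how many cells express each specific pair of receptors.

    cells: iterable of sets of receptor names detected in each cell
    (hypothetical input format). Only cells with exactly two receptors
    are counted.
    """
    pairs = Counter()
    for receptors in cells:
        if len(receptors) == 2:
            pairs[tuple(sorted(receptors))] += 1
    return pairs

# Toy cells, invented for illustration.
toy_cells = (
    [{"Vmn1r85", "Vmn1r86"}] * 3
    + [{"Vmn1r184", "Vmn1r185"}] * 2
    + [{"Vmn1r56"}, set()]
)
toy_pairs = tally_pairs(toy_cells)

# Pair counts quoted in the response (out of 134 dual-V1R cells).
reported = {
    "Vmn1r85+Vmn1r86": 44,
    "Vmn1r184+Vmn1r185": 21,
    "Vmn1r56+Vmn1r57": 13,
    "Vmn1r168+Vmn1r177": 6,
}
top_four_share = sum(reported.values()) / 134
```

      With the quoted numbers, the four listed combinations account for 84 of the 134 dual-V1R cells, or roughly 63%. Randomly formed doublets would be expected to spread across many different receptor pairs rather than concentrate in a few.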

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      The manuscript by Sejour et al. tests the "translational ramp" model described previously by Tuller et al. in S. cerevisiae. The authors use bioinformatics and reporter-based experimental approaches to test whether "rare codons" in the first 40 codons of gene coding sequences increase translation efficiency and regulate the abundance of translation products in yeast cells. The authors conclude that the "translation ramp" model is not supported by a new set of reporters and bioinformatics analyses. The strength of the bioinformatic evidence and the experimental analyses (even if very limited) of rare codon insertion in the reporter make a compelling case for the authors claims. However, the major weakness of the manuscript is that the authors do not take into account other models that previously disputed the "rare or slow codon" model of Tuller et al. and overstate their own results, which are rather limited. This remains the weak part of the manuscript even in the revised form.

      We are glad the reviewer thinks our evidence makes “a compelling case for the authors claims”. This was our main aim, and we are satisfied with this.

      The reviewer believes the major weakness of the manuscript is that we do not take into account other models and do not (see below) cite numerous other relevant papers. The reviewer made essentially the same criticism at the first review, at which time we looked quite hard for papers generally meeting the reviewer’s description. We found a few, which we incorporated here. Still, we did not find the body of evidence whose existence the reviewer implies. We are citing every study we know to be relevant, though of course we will have inadvertently missed some, given the huge body of literature. After the first round of review, we wrote “the reviewer did not give specific references, and, though we looked, we weren’t always sure which papers the reviewer had in mind.” We hoped the reviewer would provide citations. But only two citations are provided here, both to A. Kochetov, and these don’t seem central to the reviewer’s points.

      The studies that the authors do not mention argue against the "translation ramp" model and show more thorough analyses of the transition from translation initiation to elongation, as well as the early elongation "slow down" in ribosome profiling data. Moreover, several studies have used bioinformatic analyses to point out the evolution of N-terminal sequences in multiple model organisms including yeast, focusing on either upstream ORFs (uORFs) or already annotated ORFs. The authors did not mention many of these studies in their revised manuscript and did not comment on their own results in the context of these previous studies.

      Mostly, we do not know to what papers the reviewer is referring. This may be our failing, but it would have helped if the reviewer had cited one of them. There are papers discussing the evolution of N-terminal sequences, but as far as we know, these do not discuss translation speed or codon usage. Of course, we may have missed some papers.

      As such the authors approach to data presentation, writing and data discussion makes the manuscript rather biased, focused on criticizing Tuller et al. study and short on discussing multiple other possible reasons for slow translation elongation at the beginning of the protein synthesis. This all together makes the manuscript at the end very limited.

      We think the reviewer may be considering our paper as being generally about translation speeds, whereas in our minds, it is not. This difference in views as to what the paper is “about” is perhaps causing friction. To us, it is indeed a limited paper. We are narrowly focused on the finding of Tuller that there is an enrichment of rare, slow codons at the 5’ end of genes, and we have sought an explanation of this particular fact. This is not a paper about rates of translation generally—it is a limited paper about the reason for the 5’ enrichment of rare, slow codons.

      To expand on this, the encoded slow 5’ translation due to rare, slow codons (of Tuller et al.) is a small effect (1% to 3%). The possible unencoded slow 5’ translation of unknown mechanism discussed by some other papers (e.g., Weinberg et al. 2016, Shah et al. 2013) is a much larger effect (50% or more). Just from the different magnitudes, it seems likely these are different phenomena. And yet, despite the small size of the encoded effect, it is for some reason this paper by Tuller et al. that has captured the attention of the literature: as we point out below, Tuller et al. has been cited over 900 times. Partly because of the wide and continuing influence of this paper, it is worth specifically and narrowly addressing its findings.

      Reviewer #2 (Public Review):

      Tuller et al. first made the curious observation, that the first ∼30-50 codons in most organisms are encoded by scarce tRNAs and appear to be translated slower than the rest of the coding sequences (CDS). They speculated that this has evolved to pace ribosomes on CDS and prevent ribosome collisions during elongation - the "Ramp" hypothesis. Various aspects of this hypothesis, both factual and in terms of interpreting the results, have been challenged ever since. Sejour et al. present compelling results confirming the slower translation of the first ~40 codons in S. cerevisiae but providing an alternative explanation for this phenomenon. Specifically, they show that the higher amino acid sequence divergence of N-terminal ends of proteins and accompanying lower purifying selection (perhaps the result of de novo evolution) is sufficient to explain the prevalence of rare slow codons in these regions. These results are an important contribution in understanding how aspects of the evolution of protein coding regions can affect translation efficiency on these sequences and directly challenge the "Ramp" hypothesis proposed by Tuller et al.

      I believe the data is presented clearly and the results generally justify the conclusions.

      We thank the reviewer for his/her attention to the manuscript, and for his/her comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      As mentioned in the public review major weakness of the manuscript is the lack of analyses for confounding effects, overstatements of the results (using single amino acid sequence reporter) and the lack of discussion of previous work that argues against Tuller et al model. In my previous review I mentioned multiple other studies that addressed "slow codons" model in more detail.

      No, the reviewer did not cite any specific studies.

      While some of these studies are mentioned in the revised manuscript, authors are still rather biased and selective in their discussions. I should also point out that previous studies, that authors fail again to mention, were focused on either translation initiation, initiation to elongation transition or early elongation effects in relation to mRNA sequence, structure, codons as well as amino acid sequence. Also additional studies with bioinformatic analyses of N-terminal conservation and existence of start sites at the beginning of the protein sequences in multiple model organisms were also omitted.

      Again, we do not know to what papers the reviewer is referring. But this sounds like a lot. Our paper is aimed at a specific, narrow topic: Why is there an excess of rare, slow codons in the 5’ region of genes? We are not trying to make general statements about all things affecting and affected by translation speed, we are just trying to explain the excess of rare, slow codons.

      In general manuscript seems to be too much focused-on discussion of Tuller's paper . . .

      Yes, we are focused on the Tuller findings, the excess of rare slow codons in 5’ regions.

      . . . and arguing with the model that was already shown by multiple other studies to be limited and not correct.

      We find it unsatisfactory that the reviewer states in a public review that there are multiple other studies showing that the Tuller model is not correct, and yet does not cite any of them. Furthermore, for the reviewer to say that Tuller et al. is “not correct” is too sweeping. The core finding of Tuller et al. was the excess of rare, slow codons in the 5’ regions of genes. We confirm this; we believe it is correct; we are not aware of any literature disputing this. Then, Tuller interpreted this as an adaptation to promote translational efficiency. On the interpretation, we disagree with Tuller. But if one is to disagree with this interpretation, one needs an alternative explanation of the fact of the excess rare, slow codons. Providing such an alternative explanation, and doing an experiment to distinguish the explanations, is our contribution. We are not aware of any other paper making our interpretation.

      There are of course many papers that discuss various aspects of translation at the 5’ ends of genes, and we do cite quite a few such papers in our manuscript, though certainly not all. But papers of this general kind do not, and cannot, show that Tuller et al. is “not correct”. As far as we know, no paper provides an alternative explanation for the rare slow codons, and no paper does an experiment to modulate translation speed and look at the effect on gene expression. Notably, the slow translation phenomenon associated with the rare codons found by Tuller et al. is a very small effect—a change of about 1% to 3% of translation speed. Some other papers on translation speed are dealing with possible changes in the range of 50% or more. These are presumably some other phenomenon (if indeed they are even real changes in translation speed), and, whether they are true or not, the results and interpretations of Tuller et al. could still be true or not. Of course, if we knew of some previous paper showing the Tuller paper is not correct, we should and would cite it.

      To expand on the current view of Tuller in the literature, Tuller et al. has been cited 956 times according to Google Scholar. This makes it an extremely influential paper. After finding Tuller et al. in Entrez Pubmed, one can look under “Cited by” and see the five most recent papers that cite Tuller et al. The five papers given on May 23, 2024 were Bharti . . . Ignatova 2024; Uddin 2024; Khandia . . . Choudhary 2024; Love and Nair 2024; and Oelschlaeger 2024. We went through these five most recent papers that cite Tuller et al., and asked, did these authors cite the Tuller results as fully correct, or did they mention any doubts about the results? All five of the papers cited the Tuller results as fully correct, with no mention of any kind of doubt. For instance, Khandia et al. 2024 state “The slow “ramp” present at 5’ end of mRNA forms an optimal and robust means to reduce ribosomal traffic jams, thus minimizing the cost of protein expression [40].”, while Oelschlaeger (2024) states “Slow translation ramps have also been described elsewhere and proposed to prevent traffic jams along the mRNA [51,52,53].” Although Uddin (2024) cited Tuller as fully correct, Uddin seemed to think (it is a little unclear) that Tuller found an enrichment of highly-used codons, opposite to the actual finding. The multiple contrary studies mentioned by the reviewer do not seem to have been very influential.

      There are papers containing skepticism about the Tuller interpretation, and also papers with results that are difficult to reconcile in a common-sense way with the Tuller interpretation. But skepticism, and a difficulty to reconcile with common sense, are far from a demonstration that a paper is incorrect. Indeed, Tuller et al. may have been published in Cell, and may be so highly cited, exactly because the findings are counter-intuitive, colliding with common sense. Our contribution is to find a common-sense interpretation of the surprising but correct underlying fact of the 5’ enrichment of rare, slow codons.

      Having wrote that in the previous review, I have to admit that Sejour et al manuscript in the main text has a minimal amount of novelty with experimental evidence, the conclusions are based on three reporters with and without stalling/collision sequence with the same amino acid sequence and varying codons. Some more novelty is seen in bioinformatic analyses of multiple yeast sequences and sequence conservation at the N-termini of proteins. However, even this part of the manuscript is not discussed fully and with correct comparison to previous studies. Authors, based on my previous comments discuss further experimental shortcomings in their new and "expanded" discussion but the use of a single reporter in this case cannot relate to all differences that may be coming from ORFs seen in complete yeast transcriptome. There are multiple studies that used more reporters with more than one amino-acid and mRNA sequence as well as with similar variation of the rare or common codons. The handwaving argument about the influence of all other mechanisms that can arise from different start sites, RNA structure, peptide interaction with exit channel, peptidyl-tRNA drop-off, eIF3 complex initiation-elongation association, and etc, is just pointing up to a manuscript that is more about bashing up Tuller's model and old paper than trying to make a concise story about their own results and discuss their study in plethora of studies that indicated multiple other models for slow early elongation.

      We don’t understand why the reviewer is so grudging.

      Discussion of the ribosome's collisions and potential impact of such scenario in the author's manuscript is left completely without citation, even though such work has relevant results to the author's conclusions and Tuller's model.

      This is not true. We cite Dao Duc and Song (2018) “The impact of ribosomal interference, codon usage, and exit tunnel interactions on translation elongation rate variation.” PLoS Genet 14, and Tesina, . . . and Green (2020) “Molecular mechanism of translational stalling by inhibitory codon combinations and Poly(A) tracts.” EMBO J., which are two excellent papers on this subject. We also cite Gamble et al. (2016), who found the underlying result, but at that time did not attribute it to ribosome collisions.

      Previous studies (not cited) for example clearly indicate how the length from stalling sequence to start codon is related to ribosome collisions. Moreover such studies are pointing out differences in initiation vs elongation rates that may impact ribosome collisions and protein expression. Both of these topics would be very valuable in discussions of evolutionary changes in the current yeast ORFs. Not to mention that authors do not really discuss also possibilities for differences in 5'UTRs and uORFs in relation to downstream ORFs sequence and codon composition.

      It is not clear to us that such papers are highly relevant to the issue on which we are working.

      The argument about whether cycloheximide or not is doing 5' ribosome slowdown (lines 425-443) is just rambling about Weinberg's paper from 2016 without any real conclusion. In this section authors are just throwing down hypothesis that were more clearly explained in Weinberg's manuscript or shown experimentally in studies done after the Weinberg et al. paper was published.

      Earlier, the reviewer had the criticism that “The studies that authors do not mention argue with "translation ramp" model and show more thorough analyses of translation initiation to elongation transition as well as early elongation "slow down" in ribosome profiling data.” The main study we know of dealing with issues like these is that of Weinberg et al. 2016. In our opinion, this is a thoughtful paper on these issues. But now, at this point, the reviewer seems to criticize the fact that we do extensively cite results from Weinberg et al. It is true that there is no ultimate conclusion, but why there is no conclusion is a little bit interesting. Weinberg et al. show that even in studies that do not use cycloheximide as the first step in ribosome profiling, there is some left-over high density of ribosomes near 5’ ends. But all these ribosome profiling experiments do use cycloheximide at a later step in the procedure. Until someone does a ribosome profiling experiment without the use of any cycloheximide at any step, there will be no firm conclusion. This is not our fault, and it is also not the issue we are writing about. And the reason this paragraph is in the manuscript at all is that the reviewer (we thought) had asked for something like this in the first review.

      At the end, even in the limited novelty of evolutionary arguments about non-existing N-terminal conservation of codons or amino acids they fail to cite and discuss previous work by Kochetov (BioEssays, 2008 and NAR, 2011) which have additional explanation on evolution of N-terminal sequences in yeast, human or Drosophila.

      These two papers of Dr. Kochetov’s have some relevance and we now cite them. These are the only papers cited by the reviewer in his/her two reviews.

      Probably the reviewer would have preferred a paper on a different subject.


      The following is the authors’ response to the original reviews.

      Response to Reviewers:

      We thank the reviewers for their comments, and their evident close reading of the manuscript. Generally, we agree with the reviewers on the strengths and weaknesses of our manuscript. Our revised manuscript has a more extensive discussion of alternative explanations for initial high ribosome density as seen by ribosome profiling, and more specifically points out the limitations of our work.

      As a preface to specific responses to the reviewers, we will say that we could divide observations of slow initial translation into two categories, which we will call “encoded slow codons”, and “increased ribosome density”. With respect to the first category, Tuller et al. documented initial “encoded slow codons”, that is, there is a statistical excess of rare, slowly-translated codons at the 5’ ends of genes. Although the size of this effect is small, statistical significance is extremely high, and the existence of this enrichment is not in any doubt. At first sight, this appears to be a strong indication of a preference for slow initial translation. In our opinion, our main contribution is to show that there is an alternative explanation for this initial enrichment of rare, slow codons—that they are a spandrel, a consequence of sequence plasticity at the 5’ (and 3’) ends of genes. The reviewers seem to generally agree with this, and we are not aware that any other work has provided an explanation for the 5’ enrichment of rare codons.
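
      To make the first category concrete, the following is a minimal illustrative sketch, not the analysis used in the manuscript or by Tuller et al., of how a 5’ excess of rare codons could be tallied across a set of coding sequences. The function names, the default window of 40 codons, and the choice to pass in a precomputed set of "rare" codons are all assumptions made for illustration; a real analysis would define rarity from tRNA abundance or codon usage and would test for statistical significance.

```python
def codons(cds):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def rare_fraction(codon_list, rare):
    """Fraction of codons in codon_list that belong to the rare set."""
    if not codon_list:
        return 0.0
    return sum(c in rare for c in codon_list) / len(codon_list)


def five_prime_enrichment(cds_list, rare, window=40):
    """Compare rare-codon usage in the first `window` codons with the rest.

    The start codon of each gene is skipped. `rare` is a hypothetical,
    user-supplied set of codons treated as rare and slow.
    Returns (fraction in the 5' window, fraction in the remainder).
    """
    head, body = [], []
    for cds in cds_list:
        cs = codons(cds)[1:]  # skip the start codon
        head.extend(cs[:window])
        body.extend(cs[window:])
    return rare_fraction(head, rare), rare_fraction(body, rare)


if __name__ == "__main__":
    # Toy sequences only; these are not real genes.
    genes = ["ATG" + "CGG" * 2 + "GCT" * 6, "ATG" + "CGG" + "GCT" * 7]
    head, body = five_prime_enrichment(genes, rare={"CGG"}, window=2)
    print(f"rare codons, 5' window: {head:.2f}; remainder: {body:.2f}")
```

      An "encoded slow codon" enrichment in the sense used here corresponds to the first returned fraction exceeding the second when pooled over many genes.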

      The second category of observations pertaining to slow initial translation is “increased ribosome density”. Early ribosome profiling studies used cycloheximide to arrest cell growth, and these studies showed a higher density of ribosomes near the 5’ end of genes than elsewhere. This high initial ribosome density helped motivate the paper of Tuller et al., though their finding of “encoded slow codons” could explain only a very small part of the increased ribosome density. More modern ribosome profiling studies do not use cycloheximide as the first step in arresting translation, and in these studies, the density of ribosomes near the 5’ end of genes is greatly reduced. And yet, there remains, even in the absence of cycloheximide at the first step, a significantly increased density of ribosomes near the 5’ end (e.g., Weinberg et al., 2016). (However, most or all of these studies do use cycloheximide at a later step in the protocol, and the possibility of a cycloheximide artefact is difficult to exclude.) Some of the reviewer’s concerns are that we do not explain the increased 5’ ribosome density seen by ribosome profiling. We agree; but we feel it is not the main point of our manuscript. In revision, we more extensively discuss other work on increased ribosome density, and more explicitly point out the limitations of our manuscript in this regard. We also note, though, that increased ribosome density is not a direct measure of translation speed—it can have other causes.

      Specific Responses.

      Reviewer 1 was concerned that we did not more fully discuss other work on possible reasons for slow initial translation. We discuss such work more extensively in our revision. However, as far as we know, none of this work proposes a reason for the 5’ enrichment of rare, slow codons, and this is the main point of our paper. Furthermore, it is not completely clear that there is any slow initial translation. The increase in ribosome density seen in flash-freeze ribosome profiling could be an artefact of the use of cycloheximide at the thaw step of the protocols; or it could be a real measure of high ribosome density that occurs for some other reason than slow translation (e.g., ribosomes might have low processivity at the 5’ end).

      Reviewer 1 was also concerned about confounding effects in our reporter gene analysis of the effects of different codons on efficiency of translation. We have two comments. First, it is important to remember that although we changed codons in our reporters, we did not change any amino acids. We changed codons only to synonymous codons. Thus at least one of the reviewer’s possible confounding effects—interactions of the nascent peptide chain with the exit channel of the ribosome—does not apply. However, of course, the mRNA nucleotide sequence is altered, and this would cause a change in mRNA structure or abundance, which could matter. We agree this is a limitation to our approach. However, to fully address it, we feel it would be necessary to examine a really large number of quite different sequences, which is beyond the scope of this work. Furthermore, mRNAs with low secondary structure at the 5’ end probably have relatively high rates of initiation, and also relatively high rates of elongation, and it might be quite difficult to disentangle these. But in neither case is there an argument that slow initial translation is efficient. Accurate measurement of mRNA levels would be helpful, but would not disentangle rates of initiation from rates of elongation as causes of changes in expression.

      Reviewer 2 was concerned that the conservation scores for the 5’ 40 amino acids, and the 3’ 40 amino acids were similar, but slow translation was only statistically significant for the 5’ 40 amino acids. As we say in the manuscript, we are also puzzled by this. We note that 3’ translation is statistically slow, if one looks over the last 100 amino acids. Our best effort at an explanation is a sort of reverse-Tuller explanation: that in the last 40 amino acids, the new slow codons created by genome plasticity are fairly quickly removed by purifying selection, but that in the first 40 amino acids, for genes that need to be expressed at low levels, purifying selection against slow codons is reduced, because poor translation is actually advantageous for these genes. To expand on this a bit, we feel that the 5000 or so proteins of the proteome have to be expressed in the correct stoichiometric ratios, and that poor translation can be a useful tool to help achieve this. In this explanation, slow translation at the 5’ end is bad for translation (in agreement with our reporter experiments), but can be good for the organism, when it occurs in front of a gene that needs to be expressed poorly. In Tuller, by contrast, slow translation at the 5’ end is good for translation.

      Reviewer 2 wondered whether the N-terminal fusion peptide affects GFP fluorescence in our reporter. This specific reporter, with this N-terminus, has been characterized by Dean and Grayhack (2012), and by Gamble et al. (2016), and the idea that a super-folder GFP reporter is not greatly affected by N-terminal fusions is based on the work of Pedelacq (2006). None of these papers show whether this N-terminal fusion might have some effect, but together, they provide good reason to think that any effect would be small. These citations have been added.

    1. Author response:

      Reviewer #1 (Public Review):

      Abbasi et al. assess in this MEG study the directed connectivity of both cortical and subcortical regions during continuous speech production and perception. The authors observed bidirectional connectivity patterns between speech-related cortical areas as well as subcortical areas in production and perception. Interestingly, they found in speaking low-frequency connectivity from subcortical (the right cerebellum) to cortical (left superior temporal) areas, while connectivity from the cortical to subcortical areas was in the high frequencies. In listening a similar cortico-subcortical connectivity pattern was observed for the low frequencies, but the reversed connectivity in the higher frequencies was absent.

      The work by Abbasi and colleagues addresses a relevant, novel topic, namely understanding the brain dynamics between speaking and listening. This is important because traditionally production and perception of speech and language are investigated in a modality-specific manner. To have a more complete understanding of the neurobiology underlying these different speech behaviors, it is key to also understand their similarities and differences. Furthermore, to do so, the authors utilize state-of-the-art directed connectivity analyses on MEG measurements, providing a quite detailed profile of cortical and subcortical interactions for the production and perception of speech. Importantly, and perhaps most interesting in my opinion, is that the authors find evidence for frequency-specific directed connectivity, which is (partially) different between speaking and listening. This could suggest that both speech behaviors rely (to some extent) on similar cortico-cortical and cortico-subcortical networks, but different frequency-specific dynamics.

      These elements mentioned above (investigation of both production and perception, both cortico-cortical and cortico-subcortical connectivity is considered, and observing frequency-specific connectivity profiles within and between speech behaviors), make for important novel contributions to the field. Notwithstanding these strengths, I find that they are especially centered on methodology and functional anatomical description, but that precise theoretical contributions for neurobiological and cognitive models of speech are less transparent. This is in part because the study compares speech production and perception in general, but no psychophysical or psycholinguistic manipulations are considered. I also have some critical questions about the design which may pose some confounds in interpreting the data, especially with regard to comparing production and perception.

      (1) While the cortico-cortical and cortico-subcortical connectivity profiles highlighted in this study and the depth of the analyses are impressive, what these data mean for models of speech processing remains on the surface. This is in part due, I believe, to the fact that the authors have decided to explore speaking and listening in general, without targeting specific manipulations that help elucidate which aspects of speech processing are relevant for the particular connectivity profiles they have uncovered. For example, the frequency-specific directed connectivity is it driven by low-level psychophysical attributes of the speech or by more cognitive linguistic properties? Does it relate to the monitoring of speech, timing information, and updating of sensory predictions? Without manipulations trying to target one or several of these components, as some of the referenced work has done (e.g., Floegel et al., 2020; Stockert et al., 2021; Todorović et al., 2023), it is difficult to draw concrete conclusions as to which representations and/or processes of speech are reflected by the connectivity profiles. An additional disadvantage of not having manipulations within each speech behavior is that it makes the comparison between listening and speaking harder. That is, speaking and listening have marked input-output differences which likely will dominate any comparison between them. These physically driven differences (or similarities for that matter; see below) can be strongly reduced by instead exploring the same manipulations/variables between speaking and listening. If possible (if not to consider for future work), it may be interesting to score psychophysical (e.g., acoustic properties) or psycholinguistic (e.g., lexical frequency) information of the speech and see whether and how the frequency-specific connectivity profiles are affected by it.

      We thank the reviewer for pointing this out. The current study is indeed part of a larger project investigating the role of the internal forward model in speech perception and production. In the original, more comprehensive study, we also included a masked condition where participants produced speech as usual, but their auditory perception was masked. This allowed us to examine how the internal forward model behaves when it doesn't receive the expected sensory consequences of generated speech. However, for the current study, we focused solely on data from the speaking and listening conditions due to its specific research question. We agree that further manipulations would be interesting. However, for this study our focus was on natural speech and we avoided other manipulations (beyond masked speech) so that we could have sufficiently long recording times for the main speaking and listening conditions.

      (2) Recent studies comparing the production and perception of language may be relevant to the current study and add some theoretical weight since their data and interpretations for the comparisons between production and perception fit quite well with the observations in the current work. These studies highlight that language processes between production and perception, specifically lexical and phonetic processing (Fairs et al., 2021), and syntactic processing (Giglio et al., 2024), may rely on the same neural representations, but are differentiated in their (temporal) dynamics upon those shared representations. This is relevant because it dispenses with the classical notion in neurobiological models of language where production and perception rely on (partially) dissociable networks (e.g., Price, 2010). Rather those data suggest shared networks where different language behaviors are dissociated in their dynamics. The speech results in this study nicely fit and extend those studies and their theoretical implications.

      We thank the reviewer for the suggestion and we will include these references and the points made by the reviewer in our revised manuscript.

      (3) The authors align the frequency-selective connectivity between the right cerebellum and left temporal speech areas with recent studies demonstrating a role for the right cerebellum for the internal modelling in speech production and monitoring (e.g., Stockert et al., 2021; Todorović et al., 2023). This link is indeed interesting, but it does seem relevant to point out that at a more specific scale, it does not concern the exact same regions between those studies and the current study. That is, in the current study the frequency-specific connectivity with temporal regions concerns lobule VI in the right cerebellum, while in the referenced work it concerns Crus I/II. The distinction seems relevant since Crus I/II has been linked to the internal modelling of more cognitive behavior, while lobule VI seems more motor-related and/or contextual-related (e.g., D'Mello et al., 2020; Runnqvist et al., 2021; Runnqvist, 2023).

      We thank the reviewer for their insightful comment. The reference was intended to provide evidence for the role of the cerebellum in internal modelling in speech. We do not claim that we have the spatial resolution with MEG to reliably spatially resolve specific parts of the cerebellum.

      (4) On the methodological side, my main concern is that for the listening condition, the authors have chosen to play back the speech produced by the participants in the production condition. Both the fixed order as well as hearing one's own speech as listening condition may produce confounds in data interpretation, especially with regard to the comparison between speech production and perception. Could order effects impact the observed connectivity profiles, and how would this impact the comparison between speaking and listening? In particular, I am thinking of repetition effects present in the listening condition as well as prediction, which will be much more elevated for the listening condition than the speaking condition. The fact that it also concerns their own voice furthermore adds to the possible predictability confound (e.g., Heinks-Maldonado et al., 2005). In addition, listening to one's speech which just before has been articulated may, potentially strategically even, enhance inner speech and "mouthing" in the participants, hereby thus engaging the production mechanism. Similarly, during production, the participants already hear their own voice (which serves as input in the subsequent listening condition). Taken together, both similarities or differences between speaking and listening connectivity may have been due to or influenced by these order effects, and the fact that the different speech behaviors are to some extent present in both conditions.

      This is a valid point raised by the reviewer. By listening to their own previously produced speech, our participants might have anticipated and predicted the sentences more easily. However, when designing our experiment, we tried to lower the chance of this anticipation in several ways. First, participants were measured in separate sessions for the speech production and perception tasks. There was always an interval of several days between these two conditions. Second, our questions were mainly about common, general topics. Consequently, participants may not have remembered their answers completely.

      Importantly, using the same stimulus material for speaking and listening guaranteed that there was no difference in the low-level features of the material for both conditions that could have affected the results of our statistical comparison.

      Due to bone conduction, hearing one’s unaltered own speech from a recording may seem foreign and could lead to unwanted emotional reactions, e.g. embarrassment, so participants were asked whether they had already heard their own voice in a recording (e.g. from a self-recorded voice message in WhatsApp), which most of them confirmed. Participants were also informed that they were going to hear themselves during the measurement to further reduce unwanted psychophysiological responses.

      (5) The ability of the authors to analyze the spatiotemporal dynamics during continuous speech is a potentially important feat of this study, given that one of the reasons that speech production is much less investigated compared to perception concerns motor and movement artifacts due to articulation (e.g., Strijkers et al., 2010). Two questions did spring to mind when reading the authors' articulation artifact correction procedure: If I understood correctly, the approach comes from Abbasi et al. (2021) and is based on signal space projection (SSP) as used for eye movement corrections, which the authors successfully applied to speech production. However, in that study, it concerned the repeated production of three syllables, while here it concerns continuous speech of full words embedded in discourse. The articulation and muscular variance will be much higher in the current study compared to three syllables (or compared to eye movements which produce much more stable movement potentials compared to an entire discourse). Given this, I can imagine that corrections of the signal in the speaking condition were likely substantial and one may wonder (1) how much signal relevant to speech production behavior is lost?; (2) similar corrections are not necessary for perception, so how would this marked difference in signal processing affect the comparability between the modalities?

      One of the results of our previous study (Abbasi et al., 2021) was that the artefact correction was not specific to individual syllables but generalised across syllables. Also, the repeated production of syllables was associated with substantial movements of the articulators, mimicking those observed during naturalistic speaking. We therefore believe that the artefact rejection is effective during speaking. We also checked this by investigating speech-related coherence in brain parcels in spatial proximity to the articulators. In our previous study we also showed that the correction method retains neural activity to a very large degree. We are therefore confident that the speaking and listening conditions can be compared and that the loss of true signal from correcting the speaking data will be minor.

      References:

      • Abbasi, O., Steingräber, N., & Gross, J. (2021). Correcting MEG artifacts caused by overt speech. Frontiers in Neuroscience, 15, 682419.

      • D'Mello, A. M., Gabrieli, J. D., & Nee, D. E. (2020). Evidence for hierarchical cognitive control in the human cerebellum. Current Biology, 30(10), 1881-1892.

      • Fairs, A., Michelas, A., Dufour, S., & Strijkers, K. (2021). The same ultra-rapid parallel brain dynamics underpin the production and perception of speech. Cerebral Cortex Communications, 2(3), tgab040.

      • Floegel, M., Fuchs, S., & Kell, C. A. (2020). Differential contributions of the two cerebral hemispheres to temporal and spectral speech feedback control. Nature Communications, 11(1), 2839.

      • Giglio, L., Ostarek, M., Sharoh, D., & Hagoort, P. (2024). Diverging neural dynamics for syntactic structure building in naturalistic speaking and listening. Proceedings of the National Academy of Sciences, 121(11), e2310766121.

      • Heinks‐Maldonado, T. H., Mathalon, D. H., Gray, M., & Ford, J. M. (2005). Fine‐tuning of auditory cortex during speech production. Psychophysiology, 42(2), 180-190.

      • Price, C. J. (2010). The anatomy of language: a review of 100 fMRI studies published in 2009. Annals of the New York Academy of Sciences, 1191(1), 62-88.

      • Runnqvist, E., Chanoine, V., Strijkers, K., Pattamadilok, C., Bonnard, M., Nazarian, B., ... & Alario, F. X. (2021). Cerebellar and cortical correlates of internal and external speech error monitoring. Cerebral Cortex Communications, 2(2), tgab038.

      • Runnqvist, E. (2023). Self-monitoring: The neurocognitive basis of error monitoring in language production. In Language production (pp. 168-190). Routledge.

      • Stockert, A., Schwartze, M., Poeppel, D., Anwander, A., & Kotz, S. A. (2021). Temporo-cerebellar connectivity underlies timing constraints in audition. eLife, 10, e67303.

      • Strijkers, K., Costa, A., & Thierry, G. (2010). Tracking lexical access in speech production: electrophysiological correlates of word frequency and cognate effects. Cerebral Cortex, 20(4), 912-928.

      • Todorović, S., Anton, J. L., Sein, J., Nazarian, B., Chanoine, V., Rauchbauer, B., ... & Runnqvist, E. (2023). Cortico-cerebellar monitoring of speech sequence production. Neurobiology of Language, 1-21.

      Reviewer #2 (Public Review):

      Summary:

      The authors re-analyse MEG data from a speech production and perception study and extend their previous Granger causality analysis to a larger number of cortical-cortical and in particular cortical-subcortical connections. Regions of interest were defined by means of a meta-analysis using Neurosynth.org and connectivity patterns were determined by calculating directed influence asymmetry indices from the Granger causality analysis results for each pair of brain regions. Abbasi et al. report feedforward signals communicated via fast rhythms and feedback signals via slow rhythms below 40 Hz, particularly during speaking. The authors highlight one of these connections between the right cerebellum lobule VI and auditory association area A5, where in addition the connection strength correlates negatively with the strength of speech tracking in the theta band during speaking (significant before multiple comparison correction). Results are interpreted within a framework of active inference by minimising prediction errors.

      While I find investigating the role of cortical-subcortical connections in speech production and perception interesting and relevant to the field, I am not yet convinced that the methods employed are fully suitable to this endeavour or that the results provide sufficient evidence to make the strong claim of dissociation of bottom-up and top-down information flow during speaking in distinct frequency bands.

      Strengths:

      The investigation of electrophysiological cortical-subcortical connections in speech production and perception is interesting and relevant to the field. The authors analyse a valuable dataset, where they spent a considerable amount of effort to correct for speech production-related artefacts. Overall, the manuscript is well-written and clearly structured.

      Weaknesses:

      The description of the multivariate Granger causality analysis did not allow me to fully grasp how the analysis was performed and I hence struggled to evaluate its appropriateness. Knowing that (1) filtered Granger causality is prone to false positives and (2) recent work demonstrates that significant Granger causality can simply arise from frequency-specific activity being present in the source but not the target area without functional relevance for communication (Schneider et al. 2021) raises doubts about the validity of the results, in particular with respect to their frequency specificity. These doubts are reinforced by what I perceive as an overemphasis on results that support the assumption of specific frequencies for feedforward and top-down connections, while findings not aligning with this hypothesis appear to be underreported. Furthermore, the authors report some main findings that I found difficult to reconcile with the data presented in the figures. Overall, I feel the conclusions with respect to frequency-specific bottom-up and top-down information flow need to be moderated and that some of the reported findings need to be checked and if necessary corrected.

      Major points

      (1) I think more details on the multivariate GC approach are needed. I found the reference to Schaum et al., 2021 not sufficient to understand what has been done in this paper. Some questions that remained for me are:

      (i) Does multivariate here refer to the use of the authors' three components per parcel or to the conditioning on the remaining twelve sources? I think the latter is implied when citing Schaum et al., but I'm not sure this is what was done here?

      If it was not: how can we account for spurious results based on indirect effects?

      Yes, multivariate refers to the three components.

      (ii) Did the authors check whether the GC of the source-target pairs was reliably above the bias level (as Schaum et al. did for each condition separately)? If not, can they argue why they think that their results would still be valid? Does it make sense to compute DAIs on connections that were below the bias level? Should the data be re-analysed to take this concern into account?

      We performed statistics on DAI and believe that this is a valid approach. We argue that random GC effects would not survive our cluster-corrected statistics.
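
      To make the quantity under discussion concrete, the following is a minimal sketch of a directed asymmetry index for one pair of regions. It assumes the commonly used normalised-difference definition, DAI = (GC(A→B) − GC(B→A)) / (GC(A→B) + GC(B→A)), computed per frequency bin; the function name, the `eps` threshold, and the example spectra are hypothetical and are not taken from the authors' analysis scripts.

```python
import numpy as np

def directed_asymmetry_index(gc_ab, gc_ba, eps=1e-12):
    """Per-frequency DAI under the assumed normalised-difference definition.

    gc_ab, gc_ba: Granger causality spectra for A->B and B->A (same shape).
    Returns values in [-1, 1]; NaN where both directions are effectively zero.
    """
    gc_ab = np.asarray(gc_ab, dtype=float)
    gc_ba = np.asarray(gc_ba, dtype=float)
    total = gc_ab + gc_ba
    defined = total > eps
    # Replace undefined denominators by 1 so the division never warns,
    # then mask those bins with NaN in the result.
    safe_total = np.where(defined, total, 1.0)
    return np.where(defined, (gc_ab - gc_ba) / safe_total, np.nan)

# Hypothetical spectra over three frequency bins (values invented for illustration).
gc_a_to_b = np.array([0.30, 0.10, 0.00])
gc_b_to_a = np.array([0.10, 0.30, 0.00])
dai = directed_asymmetry_index(gc_a_to_b, gc_b_to_a)
```

      A positive value indicates a stronger A→B than B→A influence in that bin, and a negative value the reverse. When both directions are close to zero the ratio is dominated by noise, which is one way to read the reviewer's question about connections below the bias level.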

      (iii) You may consider citing the paper that introduced the non-parametric GC analysis (which Schaum et al. then went on to apply): Dhamala M, Rangarajan G, Ding M. Analyzing Information Flow in Brain Networks with Nonparametric Granger Causality. Neuroimage. 2008; 41(2):354-362. https://doi.org/10.1016/j.neuroimage.2008.02.020

      Thanks, we will add this reference in the revised version.

      (2) GC has been discouraged for filtered data as it gives rise to false positives due to phase distortions and the ineffectiveness of filtering in the information-theoretic setting as reducing the power of a signal does not reduce the information contained in it (Florin et al., 2010; Barnett and Seth, 2011; Weber et al. 2017; Pinzuti et al., 2020 - who also suggest an approach that would circumvent those filter-related issues). With this in mind, I am wondering whether the strong frequency-specific claims in this work still hold.

      This must be a misunderstanding. We are aware of the problem with GC on filtered data. But GC was here computed on broadband data and not in individual frequency bands.

      (3) I found it difficult to reconcile some statements in the manuscript with the data presented in the figures:

      (i) Most notably, the considerable number of feedforward connections from A5 and STS that project to areas further up the hierarchy at slower rhythms (e.g. L-A5 to R-PEF, R-Crus2, L-CB6, L-Tha, L-FOP and L-STS to R-PEF, L-FOP, L-TOPJ or R-A5 as well as R-STS both to R-Crus2, L-CB6, L-Th) contradict the authors' main message that 'feedback signals were communicated via slow rhythms below 40 Hz, whereas feedforward signals were communicated via faster rhythms'. I struggled to recognise a principled approach that determined which connections were highlighted and reported and which ones were not.

      (ii) "Our analysis also revealed robust connectivity between the right cerebellum and the left parietal cortex, evident in both speaking and listening conditions, with stronger connectivity observed during speaking. Notably, Figure 4 depicts a prominent frequency peak in the alpha band, illustrating the specific frequency range through which information flows from the cerebellum to the parietal areas." There are two peaks discernible in Figure 4, one notably lower than the alpha band (rather theta or even delta), the other at around 30 Hz. Nevertheless, the authors report and discuss a peak in the alpha band.

      (iii) In the abstract: "Notably, high-frequency connectivity was absent during the listening condition." and p.9 "In contrast with what we reported for the speaking condition, during listening, there is only a significant connectivity in low frequency to the left temporal area but not a reverse connection in the high frequencies."

      While Fig. 4 shows significant connectivity from R-CB6 to A5 in the gamma frequency range for the speaking, but not for the listening condition, interpreting comparisons between two effects without directly comparing them is a common statistical mistake (Makin and Orban de Xivry). The spectrally-resolved connectivity in the two conditions actually look remarkably similar and I would thus refrain from highlighting this statement and indicate clearly that there were no significant differences between the two conditions.

      (iv) "This result indicates that in low frequencies, the sensory-motor area and cerebellum predominantly transmit information, while in higher frequencies, they are more involved in receiving it."

      I don't think that this statement holds in its generality: L-CB6 and R-3b both show strong output at high frequencies, particularly in the speaking condition. While they seem to transmit information mainly to areas outside A5 and STS these effects are strong and should be discussed.

      We appreciate the reviewer's thoughtful comments. We acknowledge that not all connectivity patterns strictly adhere to the initial observation regarding feedback and feedforward communication. It's true that our primary focus was on interactions between brain regions known to be crucial for speech prediction, including auditory, somatosensory, and cerebellar areas. However, we also presented connectivity patterns across other regions to provide a more comprehensive picture of the speech network. We believe this broader perspective can be valuable for future research directions.

      Regarding the reviewer's observation about the alpha band peak in Figure 4, we agree that a closer examination reveals the connectivity from right cerebellum to the left parietal is in a wider low frequency range. We will refrain from solely emphasizing the alpha band and acknowledge the potential contribution of lower frequencies to cerebellar-parietal communication.

      We also appreciate the reviewer highlighting the need for a more nuanced interpretation of the listening condition connectivity compared to the speaking condition. The reviewer is correct in pointing out that while Figure 4 suggests a high-frequency connectivity from L-A5 to R-CB only in the speaking condition, a direct statistical comparison between conditions might not reveal a significant difference. We will revise the manuscript to clarify this point.
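
      The kind of direct comparison the reviewer refers to can be illustrated with a paired sign-flip permutation test on a single per-participant connectivity value. This is only a sketch under stated assumptions: the function name and the example values are hypothetical, and it tests one scalar per participant rather than the cluster-based statistics across frequencies and parcels described in the manuscript.

```python
import numpy as np

def paired_sign_flip_test(cond_a, cond_b, n_perm=5000, seed=0):
    """Two-sided paired permutation test on the mean within-participant difference.

    Under the null hypothesis of no condition difference, the sign of each
    participant's difference is exchangeable, so signs are flipped at random.
    """
    diff = np.asarray(cond_a, dtype=float) - np.asarray(cond_b, dtype=float)
    observed = diff.mean()
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))
    null = (signs * diff).mean(axis=1)
    # The +1 terms count the observed statistic as one permutation,
    # which keeps the p-value strictly above zero.
    p_value = (np.sum(np.abs(null) >= np.abs(observed)) + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical per-participant connectivity values (invented for illustration).
speaking = [0.12, 0.08, 0.15, 0.10, 0.09, 0.11]
listening = [0.11, 0.09, 0.13, 0.10, 0.10, 0.10]
mean_diff, p = paired_sign_flip_test(speaking, listening)
```

      An effect that is significant in one condition and not in the other supports a claim about a condition difference only if a test of this kind, applied to the difference itself, is also significant.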

      Finally, a closer examination of Figure 3 revealed that the light purple and dark green edges in the speaking condition for R-CB6 and L-3b suggest outgoing connections at low frequencies, while other colored edges indicate information reception at high frequencies. We acknowledge that exceptions to this directional pattern might exist and warrant further investigation in future studies.

      (4) "However, definitive conclusions should be drawn with caution given recent studies raising concerns about the notion that top-down and bottom-up signals can only be transmitted via separate frequency channels (Ferro et al., 2021; Schneider et al., 2021; Vinck et al., 2023)."

      I appreciate this note of caution and think it would be useful if it were spelled out to the reader why this is the case so that they would be better able to grasp the main concerns here. For example, Schneider et al. make a strong point that we expect to find Granger-causality with a peak in a specific frequency band for areas that are anatomically connected when the sending area shows stronger activity in that band than the receiving one, simply because of the coherence of a signal with its own linear projection onto the other area. The direction of a Granger causal connection would in that case only indicate that one area shows stronger activity than the other in the given frequency band. I am wondering to what degree the reported connectivity pattern can be traced back to regional differences in frequency-specific source strength or to differences in source strength across the two conditions.

      This is indeed an important point. That is why we are discussing our results with great caution and specifically point the reader to the relevant literature. We are indeed thinking about a future study where we investigate this connectivity using other connectivity metrics and a detailed consideration of power.

      Reviewer #3 (Public Review):

      In the current paper, Abbasi et al. aimed to characterize and compare the patterns of functional connectivity across frequency bands (1 Hz - 90 Hz) between regions of a speech network derived from an online meta-analysis tool (Neurosynth.org) during speech production and perception. The authors present evidence for complex neural dynamics from which they highlight directional connectivity from the right cerebellum to left superior temporal areas in lower frequency bands (up to beta) and between the same regions in the opposite direction in the (lower) high gamma range (60-90 Hz). Abbasi et al. interpret their findings within the predictive coding framework, with the cerebellum and other "higher-order" (motor) regions transmitting top-down sensory predictions to "lower-order" (sensory) regions in the lower frequencies and prediction errors flowing in the opposite direction (i.e., bottom-up) from those sensory regions in the gamma band. They also report a negative correlation between the strength of this top-down functional connectivity and the alignment of superior temporal regions to the syllable rate of one's speech.

      Strengths:

      (1) The comprehensive characterization of functional connectivity during speaking and listening to speech may be valuable as a first step toward understanding the neural dynamics involved.

      (2) The inclusion of subcortical regions and connectivity profiles up to 90Hz using MEG is interesting and relatively novel.

      (3) The analysis pipeline is generally adequate for the exploratory nature of the work.

      Weaknesses:

      (1) The work is framed as a test of the predictive coding theory as it applies to speech production and perception, but the methodological approach is not suited to this endeavor.

      We agree that we cannot provide definite evidence for predictive coding in speech production and perception and we believe that we do not make that claim in the manuscript. However, our results are largely consistent with what can be expected based on predictive coding theory.

      (2) Because of their theoretical framework, the authors readily attribute roles or hierarchy to brain regions (e.g., higher- vs lower-order) and cognitive functions to observed connectivity patterns (e.g., feedforward vs feedback, predictions vs prediction errors) that cannot be determined from the data. Thus, many of the authors' claims are unsupported.

      We will revise the manuscript to more clearly differentiate our results (e.g. directed Granger-Causality from A to B) from their interpretation (potentially indicating feedforward or feedback signals).

      (3) The authors' theoretical stance seems to influence the presentation of the results, which may inadvertently misrepresent the (otherwise perfectly valid; cf. Abbasi et al., 2023) exploratory nature of the study. Thus, results about specific regions are often highlighted in figures (e.g., Figure 2 top row) and text without clear reasons.

      Our connectograms reveal a multitude of results that we hope will be of interest to the community. At the same time, the wealth of findings makes them difficult to describe. We did not see a better way than to highlight specific connections of interest.

      (4) Some of the key findings (e.g., connectivity in opposite directions in distinct frequency bands) feature in a previous publication and are, therefore, interesting but not novel.

      We actually see this as a strength of the current manuscript. The computation of connectivity is here extended to a much larger sample of brain areas. It is reassuring to see that the previously reported results generalise to other brain areas.

      (5) The quantitative comparison between speech production and perception is interesting but insufficiently motivated.

      We thank the reviewer for this comment. We have addressed it in detail in our response to points 1 and 4 of Reviewer #1.

      (6) Details about the Neurosynth meta-analysis and subsequent selection of brain regions for the functional connectivity analyses are incomplete. Moreover, the use of the term 'Speech' in Neurosynth seems inappropriate (i.e., includes irrelevant works, yielding questionable results). The approach of using separate meta-analyses for 'Speech production' and 'Speech perception' taken by Abbasi et al. (2023) seems more principled. This approach would result, for example, in the inclusion of brain areas such as M1 and the BG that are relevant for speech production.

      We agree that there are inherent limitations in automated meta-analysis tools such as Neurosynth. Papers are used in the meta-analysis that might not be directly relevant. However, Neurosynth has proven its usefulness over many years and has been used in many studies. We also agree that our selection of brain areas is not complete. But Granger Causality analysis of every pair of ROIs leads to complex results and we had to limit our selection of areas.

      (7) The results involving subcortical regions are central to the paper, but no steps are taken to address the challenges involved in the analysis of subcortical activity using MEG. Additional methodological detail and analyses would be required to make these results more compelling. For example, it would be important to know what the coverage of the MEG system is, what head model was used for the source localization of cerebellar activity, and if specific preprocessing or additional analyses were performed to ensure that the localized subcortical activity (in particular) is valid.

      There is a large body of evidence demonstrating that MEG can record signals from deep brain areas such as the thalamus and cerebellum (e.g., Attal & Schwarz, 2013; Andersen et al., Neuroimage 2020; Piastra et al., 2020; Schnitzler et al., 2009). These and other studies provide evidence that state-of-the-art recording (with multichannel SQUID systems) and analysis are sufficient to allow reconstruction of activity in subcortical areas. However, spatial resolution is clearly reduced for these deep areas. We will add a statement in the revised manuscript to acknowledge this limitation.

      (8) The results and methods are often detailed with important omissions (a speech-brain coupling analysis section is missing) and imprecisions (e.g., re: Figure 5; the Connectivity Analysis section is copy-pasted from their previous work), which makes it difficult to understand what is being examined and how. (It is also not good practice to refer the reader to previous publications for basic methodological details, for example, about the experimental paradigm and key analyses.) Conversely, some methodological details are given, e.g., the acquisition of EMG data, without further explanation of how those data were used in the current paper.

      We will revise the relevant sections of the manuscript.

      (9) The examination of gamma functional connectivity in the 60 - 90 Hz range could be better motivated. Although some citations involving short-range connectivity in these frequencies are given (e.g., within the visual system), a more compelling argument for looking at this frequency range for longer-range connectivity may be required.

      Given previous evidence of connectivity in the gamma band we think that it would be a weakness to exclude this frequency band from analysis.

      (10) The choice of source localization method (linearly constrained minimum variance) could be explained, particularly given that other methods (e.g. dynamic imaging of coherent sources) were specifically designed and might potentially be a better alternative for the types of analyses performed in the study.

      Both LCMV and DICS are beamforming methods. We used LCMV because we wanted to use Granger causality, which requires broadband signals. DICS would only provide frequency-specific, band-limited signals.

      (11) The mGC analysis needs to be more comprehensively detailed for the reader to be able to assess what is being reported and the strength of the evidence. Relatedly, first-level statistics (e.g., via estimation of the noise level) would make the mGC and DAI results more compelling.

      We perform group-level cluster-based statistics on mGC while correcting for multiple comparisons across frequency bands and brain parcels, and report only significant results. This is an established approach that is routinely used in this type of study.

      (12) Considering the exploratory nature of the study, it is essential for other researchers to continue investigating and validating the results presented in the current manuscript. Thus, it is concerning that data and scripts are not fully and openly available. Data need not be in its raw state to be shared and useful, which circumvents the stated data privacy concerns.

      We acknowledge the reviewer's concern regarding the full availability of the dataset. Due to privacy limitations on the collected data, we are unable to share it publicly at this time. However, to promote transparency and enable further exploration, we have provided the script used for data analysis and an example dataset. This example dataset should provide a clear understanding of the data structure and variables used in the analysis. Additionally, we are happy to share the complete dataset upon request from research teams interested in performing in-depth secondary analyses.

    1. Author response:

      We would like to thank all reviewers for their time, critical evaluation, recognition, and constructive comments on the manuscript. We will revise the manuscript accordingly. Below is our point-by-point response to the comments.

      From Reviewer #1:

      “…several previous studies have identified co-expression of vomeronasal receptors by vomeronasal sensory neurons, and the expression of non-vomeronasal receptors, and this was not adequately addressed in the manuscript as presented.”

      We plan to add context and citations to the Introduction and Results sections relating to recent studies on the co-expression of vomeronasal receptors and the expression of non-vomeronasal receptors in VSNs.

      “The data resulting from the use of the Resolve Biosciences spatial transcriptomics platform are somewhat difficult to interpret, and the methods are somewhat opaque.”

      Unfortunately, detailed Molecular Cartography protocols remain proprietary at Resolve Biosciences and were not disclosed. We acknowledge this limitation. Our role in the acquisition and processing of data for this experiment is included in the current Methods section. We will clarify this in the revised manuscript. Additional figures produced by the Molecular Cartography analysis will also be added (See response to Reviewer #2, below) to the supplemental materials to help clarify interpretation of the results.

      From Reviewer #2:

      “…the authors present a biased report of previously published work, largely including only those results that do not overlap with their own findings, but ignoring results that would question the novelty of the data presented here.”

      We had no intention of misleading the readers. In fact, we have discussed discrepancies between our results with other studies. However, we inadvertently left out a critical publication in preparing the manuscript. We plan to add context and citations (where missing) relating to recent studies that use single cell RNA sequencing in the vomeronasal organ, studies relating to the co-expression of vomeronasal receptors, and studies discussing V1R/V2R lineage determination.

      “Did the authors perform any cell selectivity, or any directed dissection, to obtain mainly neuronal cells? Previous studies reported a greater proportion of non-neuronal cells. For example, while Katreddi and co-workers (ref 89) found that the most populated clusters are identified as basal cells, macrophages, pericytes, and vascular smooth muscle, Hills Jr. et al. in this work did not report such types of cells. Did the authors check for the expression of marker genes listed in Ref 89 for such cell types?”

      For VNO dissections, we removed bones and blood vessels from VNO tissue and kept only the sensory epithelium. This procedure removed vascular smooth muscle cells, pericytes, and other non-neuronal cell types, which explains the differences in cell proportions between our study and previous studies. We used a DAPI/Draq5 assay to sort live, nucleated cells for sequencing, and no specific markers were used for cell selection. All cells in the experiment were successfully annotated using the cell-type markers shown in Fig. 1B, except for cells from the sVSN cluster, which were novel and required further analysis to characterize.

      “The authors should report the marker genes used for cell annotation.”

      Marker genes used for cell annotation are shown in figure 1B. A full list of all marker genes used in the cell annotation process will be provided.

      “The authors reported no differences between juvenile and adult samples, and between male and female samples. It is not clear how they evaluate statistically significant differences, which statistical test was used, or what parameters were evaluated.”

      The claims made about male/female mice and P14/P56 mice directly pertain to the distribution of clusters and cells in UMAP space as seen in Figure 1 C & D. We have indeed performed differential gene expression analysis for male/female and P14/P56 comparisons using the FindMarkers function from the Seurat R package. Although we have found significant differential expression between male and female, and between P14 and P56 animals, the genes in this list do not appear to be influential for the neuronal lineage and cell type specification or related to cell adhesion molecules, which are the main focuses of this study. Nevertheless, we plan to add these results to the supplemental materials in a revised manuscript.

      “‘Based on our transcriptomic analysis, we conclude that neurogenic activity is restricted to the marginal zone.’ This conclusion is quite a strong statement, given that this study was not directed to carefully study neurogenesis distribution, and when neurogenesis in the basal zone has been proposed by other works, as stated by the authors.”

      Eighteen slides from whole VNO sections were used in the Molecular Cartography analysis, while one representative slide was used to present the findings. Across all slides, GBCs, INPs, and iVSNs show a pattern of proximity to the marginal zone (MZ), with GBCs located nearest to the MZ and iVSNs furthest from it. We believe that the full scope of our results justifies our claim that neurogenesis is restricted to the MZ. This claim is also supported by the 2021 study by Katreddi & Forni. We will provide additional figures to further support this claim.

      “The authors report at least two new types of sensory neurons in the mouse VNO, a finding of huge importance that could have a substantial impact on the field of sensory physiology. However, the evidence for such new cell types is based solely on this transcriptomic dataset and, as such, is quite weak, since many crucial morphological and physiological aspects would be missing to clearly identify them as novel cell types. As stated before, many control and confirmatory experiments, and a careful evaluation of the results presented in this work must be performed to confirm such a novel and interesting discovery. The reported "novel classes of sensory neurons" in this work could represent previously undescribed types of sensory neurons, but also previously reported cells (see below) or simply possible single-cell sequencing artefacts.”

      The reviewer is correct that detailed morphological and physiological studies are needed to further understand these cells. This is an opinion we share. Our paper is primarily intended as a resource paper that provides access to a large-scale single-cell RNA-sequencing dataset, along with discoveries based on the transcriptomic data that can support and inspire ongoing and future experiments in the field. Nonetheless, we are confident that neither of the novel cell clusters is the result of sequencing artefacts. We applied a robust quality-control protocol, including count correction for ambient RNA with the R package SoupX, multiplet detection and removal with the Python module Scrublet, and a strict 5% mitochondrial gene expression cut-off. Furthermore, the cell clusters in question show no signs of being sequencing artefacts, as they are physically connected in a reasonable orientation to the rest of the neuronal lineage in modular clusters in 2D and 3D UMAP space. The OSN and sVSN (S1H) cell clusters each show distinct and self-consistent gene expression. Gene ontology (GO) analysis reveals significant GO term enrichment for both the sVSN (Fig. 2G) and mOSN clusters when compared to mature V1R and V2R VSNs, indicating functional differences. Additional figures for mOSN differential gene expression and gene ontology analysis results will be added to the supplemental figures.
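
      As an illustration of the last quality-control step mentioned above, the sketch below computes the per-cell fraction of counts from mitochondrial genes and keeps cells at or below a 5% threshold. It is a sketch under stated assumptions, not the authors' pipeline: the function names, the use of the `mt-` prefix to identify mouse mitochondrial genes, the choice of an inclusive threshold, and the count matrix are all hypothetical.

```python
import numpy as np

def mito_fraction(counts, gene_names, prefix="mt-"):
    """Fraction of each cell's counts that come from mitochondrial genes.

    counts: cells x genes matrix of raw counts.
    gene_names: gene symbols in column order; mitochondrial genes are assumed
    to be identified by a case-insensitive 'mt-' prefix.
    """
    counts = np.asarray(counts, dtype=float)
    is_mito = np.array([name.lower().startswith(prefix) for name in gene_names])
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Cells with zero total counts get a fraction of 0 instead of a division error.
    return np.divide(mito, total, out=np.zeros_like(total), where=total > 0)

def cells_passing_qc(counts, gene_names, max_fraction=0.05):
    """Boolean mask of cells whose mitochondrial fraction is within the cut-off."""
    return mito_fraction(counts, gene_names) <= max_fraction

# Hypothetical count matrix: three cells, four genes (values invented).
genes = ["Gnal", "Cnga2", "mt-Co1", "mt-Nd1"]
counts = np.array([
    [90, 8, 1, 1],    # 2% mitochondrial
    [50, 30, 15, 5],  # 20% mitochondrial
    [40, 56, 2, 2],   # 4% mitochondrial
])
fractions = mito_fraction(counts, genes)
keep = cells_passing_qc(counts, genes)
```

      In this invented example the second cell would be discarded, since a high mitochondrial fraction is usually taken as a sign of a damaged or dying cell.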

      “The authors report the co-expression of V2R and Gnai2 transcripts based on sequencing data. That could dramatically change classical classifications of basal and apical VSNs. However, did the authors find support for this co-expression in spatial molecular imaging experiments?” 

      Genes with extremely high expression levels overwhelm signals from other genes, and therefore had to be removed from the experiment. This is a limitation of the Molecular Cartography platform. Unfortunately, Gnai2 was determined to be one of these genes and was not evaluated for this purpose.

      “Canonical OSNs: The authors report a cluster of cells expressing neuronal markers and ORs and call them canonical OSN. However, VSNs expressing ORs have already been reported in a detailed study showing their morphology and location inside the sensory epithelium (References 82, 83). Such cells are not canonical OSNs since they do not show ciliary processes, they express TRPC2 channels and do not express Golf. Are the "canonical OSNs" reported in this study and the OR-expressing VSNs (ref 82, 83) different? Which parameters, other than Gnal and Cnga2 expression, support the authors' bold claim that these are "canonical OSNs"? What is the morphology of these neurons? In addition, the mapping of these "canonical OSNs" shown in Figure 2D paints a picture of the negligible expression/role of these cells (see their prediction confidence).” 

      We observe OR expression in VSNs in our data; these cells cluster with VSNs. The putative mOSN cluster exhibits its own trajectory, distinct from the VSN clusters. These cells express Gnal (Golf), which is not expressed in VSNs expressing ORs, nor in any other cell type in the data. When we performed differential gene expression analysis on the putative mOSN cluster, comparing it with V1R and V2R VSNs independently, GO analysis returned ‘olfactory receptor activity’ as the top significantly enriched molecular function and ‘cilium’ as the top significantly enriched cellular component. Because we were limited to a list of 100 genes in the Molecular Cartography probe panel, we prioritized the detection of canonical VNO cell types, vomeronasal receptor co-expression, and the putative sVSNs, and were not able to include a robust analysis of the putative OSNs.

      “Secretory VSN: The authors report another novel type of sensory neurons in the VNO and call them "secretory VSNs". Here, the authors performed an analysis of differentially expressed genes for neuronal cells (dataset 2) and found several differentially expressed genes in the sVSN cluster. However, it would be interesting to perform a gene expression analysis using the whole dataset including neuronal and non-neuronal cells. Could the authors find any marker gene that unequivocally identifies this new cell type?”

      We did not find unequivocal marker genes for sVSNs. We performed differential expression analysis of the sVSN cluster against the whole VNO data, against the neuronal subset, and against specific cell types, but could not find a single gene that was perfectly exclusive to sVSNs. We therefore used a combinatorial marker-gene approach to predict sVSNs in the Molecular Cartography data, which required dedicating a larger subset of our 100-gene panel to genes for detecting sVSNs.
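      The general idea of a combinatorial marker-gene call can be sketched as follows. This is only an illustration under stated assumptions, not the scoring rule used in the study: the gene names, the detection threshold, and the minimum number of markers required are all hypothetical.

```python
def count_marker_hits(expression, markers, min_count=1):
    """Number of marker genes detected at or above min_count transcripts."""
    return sum(1 for gene in markers if expression.get(gene, 0) >= min_count)


def is_candidate(expression, markers, min_hits):
    """Call a cell a candidate when enough markers of the signature are detected."""
    return count_marker_hits(expression, markers) >= min_hits


# Hypothetical signature and per-cell transcript counts.
SIGNATURE = ["GeneA", "GeneB", "GeneC", "GeneD"]
cell_1 = {"GeneA": 3, "GeneB": 1, "GeneC": 2, "Other": 5}
cell_2 = {"GeneA": 4, "Other": 9}

call_1 = is_candidate(cell_1, SIGNATURE, min_hits=3)
call_2 = is_candidate(cell_2, SIGNATURE, min_hits=3)
```

      Requiring several markers at once trades sensitivity for specificity, which is why no single gene needs to be exclusive to the cell type.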

      “When the authors evaluated the distribution of sVSN using the Molecular Cartography technique, they found expression of sVSN in both sensory and non-sensory epithelia. How do the authors explain such unexpected expression of sensory neurons in the non-sensory epithelium?” 

      In our scRNA-Seq experiment, blood vessels were removed, limiting the power to distinguish between certain cell types. Because of the limited number of genes that we can probe using Molecular Cartography, some of the genes associated with sVSNs may also be expressed in the non-sensory epithelium. This could lead to the identification of cells in the non-neuronal epithelium that may or may not be identical to the sVSNs. Further studies will need to be conducted to determine the specificity of these cells.

      “The low total genes count and low total reads count, combined with an "expression of marker genes for several cell types" could indicate low-quality beads (contamination) that were not excluded with the initial parameter setting. It looks like cells in this cluster express a bit of everything V1R, V2R, OR, secretory proteins...”

      We are confident that the putative sVSN cell cluster is not the result of low-quality cells. We performed a robust quality-control protocol, including count correction for ambient RNA with the R package SoupX, multiplet detection and removal with the Python module Scrublet, and a strict 5% mitochondrial gene expression cut-off. Furthermore, the cell clusters in question show no signs of being sequencing artefacts, as they are connected in a reasonable orientation to the rest of the neuronal lineage in modular clusters in 2D and 3D UMAP space. The OSN and sVSN cell clusters each show distinct and self-consistent gene expression (Fig. S1H). Gene ontology (GO) analysis reveals significant GO term enrichment for both the sVSN (Fig. 2G) and mOSN clusters when compared to mature V1R and V2R VSNs, indicating functional differences. Moreover, while some genes were expressed at a lower level than in the canonical VSNs, others were expressed at higher levels, which argues against an overall loss of gene counts as the cause of the discrepancy.

      “The authors wrote ‘...the transcriptomic landscape that specifies the lineages is not known...’. This statement is not completely true, or at least misleading. There are still many undiscovered aspects of the transcriptomics landscape and lineage determination in VSNs. However, authors cannot ignore previously reported data showing the landscape of neuronal lineages in VSNs (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259). Expression of most of the transcription factors reported by this study (Ascl1, Sox2, Neurog1, Neurod1...) were already reported, and for some of them, their role was investigated, during early developmental stages of VSNs (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259). In summary, the authors should fully include the findings from previous works (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259), clearly state what has been already reported, what is contradictory and what is new when compared with the results from this work.”

      This is a difference in opinion about the terminology. Transcriptomic landscape in our paper refers to the genome-wide expression by individual cells, not just individual genes. The reviewer is correct that many of the genetic specifiers have been identified, which we cited and discussed. We consider these studies as providing a “genetic” underpinning, rather than the “transcriptomic landscape” in lineage progression. We will clarify this point in the revised manuscript. 

      “…the co-expression of specific V2Rs with specific transcription factors does not imply a direct implication in receptor selection. Directed experiments to evaluate the VR expression dependent on a specific transcription factor must be performed.” 

      The reviewer is correct, and we did not claim that the co-expression of specific transcription factors indicates a direct relationship with receptor selection. We agree that further directed experiments are required to investigate this question.

      “This study reports that transcription factors, such as Pou2f1, Atf5, Egr1, or c-Fos could be associated with receptor choice in VSNs. However, no further evidence is shown to support this interaction. Based on these purely correlative data, it is rather bold to propose cascade model(s) of lineage consolidation.”

      The reviewer is correct. As any transcriptomic study can only be correlative, additional studies will be needed to unequivocally determine the mechanistic link between the transcription factors and receptor choice. Our model provides a basis for these studies.

      “The authors use spatial molecular imaging to evaluate the co-expression of many chemosensory receptors in single VNO cells. […] However, it is difficult to evaluate and interpret the results due to the lack of cell borders in spatial molecular imaging. The inclusion of cell border delimitation in the reported images (membrane-stained or computer-based) could be tremendously beneficial for the interpretation of the results.”

      The most common practice for cell segmentation of spatial transcriptomics data is to determine cell borders based on nuclear staining with expansion. We have tested multiple algorithms based on recent studies, but each has its own caveat. We will clarify this point in the revised manuscript.

      “It is surprising that the authors reported a new cell type expressing OR, however, they did not report the expression of ORs in Molecular Cartography technique. Did the authors evaluate the expression of OR using the cartography technique?” 

      We were limited to a 100-gene probe panel and included only one OR; its expression was not high enough for us to substantiate any claims.

      From Reviewer #3:

      “(1) The authors claim that they have identified two new classes of sensory neurons, one being a class of canonical olfactory sensory neurons (OSNs) within the VNO. This classification as canonical OSNs is based on expression data of neurons lacking the V1R or V2R markers but instead expressing ORs and signal transduction molecules, such as Gnal and Cnga2. Since OR-expressing neurons in the VNO have been previously described in many studies, it remains unclear to me why these OR-expressing cells are considered here a "new class of OSNs." Moreover, morphological features, including the presence of cilia, and functional data demonstrating the recognition of chemosignals by these neurons, are still lacking to classify these cells as OSNs akin to those present in the MOE. While these cells do express canonical markers of OSNs, they also appear to express other VSN-typical markers, such as Gnao1 and Gnai2 (Figure 2B), which are less commonly expressed by OSNs in the MOE. Therefore, it would be more precise to characterize this population as atypical VSNs that express ORs, rather than canonical OSNs.”

      We observe OR expression in VSNs in our data; these cells cluster with VSNs. The putative mOSN cluster exhibits its own trajectory, distinct from the VSN clusters. These cells express Gnal (Golf), which is not expressed in VSNs expressing ORs, nor in any other cell type in the data. We have performed differential gene expression analysis on the putative mOSN cluster to compare it with V1R and V2R VSNs. In the GO analysis, the top significantly enriched GO terms include “olfactory receptor activity” and “cilium”, further supporting that these are OSNs. Because we were limited to a list of 100 genes in the Molecular Cartography probe panel, we prioritized the detection of canonical VNO cell types, vomeronasal receptor co-expression, and the putative sVSNs, and were not able to include a robust analysis of the putative OSNs. With regard to Gnai2 and Gnao1 (Go) expression, we have examined our data from OSNs dissociated from the olfactory epithelium and detected substantial expression of both. This new analysis provides additional support for our claim. We will update the information in a revised manuscript.

      “(2) The second new class of sensory neurons identified corresponds to a group of VSNs expressing prototypical VSN markers (including V1Rs, V2Rs, and ORs), but exhibiting lower ribosomal gene expression. Clustering analysis reveals that this cell group is relatively isolated from V1R- and V2R-expressing clusters, particularly those comprising immature VSNs. The question then arises: where do these cells originate? Considering their fewer overall genes and lower total counts compared to mature VSNs, I wonder if these cells might represent regular VSNs in a later developmental stage, i.e., senescent VSNs. While the secretory cell hypothesis is compelling and supported by solid data, it could also align with a late developmental stage scenario. Further data supporting or excluding these hypotheses would aid in understanding the nature of this new cell cluster, with a comparison between juvenile and adult subjects appearing particularly relevant in this context.” 

      We wholeheartedly agree with this assessment. Our initial thought was that these were senescent VSNs, but the trajectory analysis did not support this scenario, leading us to propose that these are putative secretory cells. Our analysis also shows that, overall, 46% of the putative sVSNs were from the P14 sample and 54% from P56. These cells comprise roughly 6.4% of all P14 cells and 8.5% of P56 cells. In comparison, 28.4% of all cells are mature V1R VSNs at P14, but the percentage rises to 46.7% at P56. The significant presence of sVSNs at P14, and the fact that their share increases proportionally less than that of mature V1R VSNs, indicate that these are unlikely to be late developmental stage or senescent cells, although we cannot exclude these possibilities. We plan to clarify these points in the revised manuscript.
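      The comparison of proportions quoted above can be made explicit with a short calculation. The percentages are those stated in this response; expressing them as a P56/P14 fold change is our illustrative choice and not an analysis from the manuscript.

```python
# Percentage of all cells in each sample, as quoted in the response.
svsn = {"P14": 6.4, "P56": 8.5}         # putative sVSNs
mature_v1r = {"P14": 28.4, "P56": 46.7}  # mature V1R VSNs


def fold_change(percentages):
    """Ratio of the P56 share to the P14 share."""
    return percentages["P56"] / percentages["P14"]


svsn_fc = fold_change(svsn)        # about 1.33-fold
v1r_fc = fold_change(mature_v1r)   # about 1.64-fold
```

      On these numbers the sVSN share grows by about a third between P14 and P56, whereas the mature V1R share grows by roughly two thirds.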

      We did not include sVSNs in the trajectory inference analysis because of inherent uncertainty about their developmental origins. However, PCA embeddings were the basis of the pseudotime analysis, and those embeddings that do include the sVSN cluster show that it is distributed evenly between the mature V1R and V2R clusters, with all mature clusters equidistant from GBC and INP clusters, indicating that they may indeed originate from the same stem cell populations. We plan to include trajectory analysis based on this assumption in the revised manuscript.

      (3) The authors' decision not to segregate the samples according to sex is understandable, especially considering previous bulk transcriptomic and functional studies supporting this approach. However, many of the highly expressed VR genes identified have been implicated in detecting sex-specific pheromones and triggering dimorphic behavior. It would be intriguing to investigate whether this lack of sex differences in VR expression persists at the single-cell level. Regardless of the outcome, understanding the presence or absence of major dimorphic changes would hold broad interest in the chemosensory field, offering insights into the regulation of dimorphic pheromone-induced behavior. Additionally, it could provide further support for proposed mechanisms of VR receptor choice in VSNs. 

      The reviewer raised a good point. We did not observe differences between male and female mice, or between P14 and P56 mice, in the distribution of clusters and cells in UMAP space. However, our differential expression analysis did reveal significantly differentially expressed genes in both comparisons. Because these genes have not been implicated in lineage or cell-type determination, we decided not to include the analysis in the current version. In the revised manuscript, we plan to include the results.

      “(4) The expression analysis of VRs and ORs seems to have been restricted to the cell clusters associated with the neuronal lineage. Are VRs/ORs expressed in other cell types, i.e. sustentacular, HBC, or other cells?” 

      Sparse, low counts of VR and OR genes were observed in non-neuronal cell types. When their expression is considered as a percentage of cell-level gene counts, however, it is negligible compared to that in the neurons. The observed expression may be explained by stochastic base-level expression, or it may be the result of remnant ambient RNA that passed filtering. We will clarify this point in the revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Weaknesses to be addressed: 

      (1) More detail is required to understand the effects of genetic and drug manipulations on heart rate as these are important experiments. At the very least, a discussion on the limitations of these manipulations is needed. 

      - For example, how does one separate the pulsatile versus nutritive effects of blood flow/heartrate reduction? 

      - The conclusion that arterial SMC differentiation is driven by pulsatile blood flow needs to be toned down. Indeed, this conclusion is mainly supported by in vitro cell co-cultures exposed to laminar versus pulsatile flow. In vivo, reducing Tnnt2a expression affects cardiac contractility and blood flow does not selectively affect pulsatility. To make this conclusion, the authors would need an experimental means to selectively dampen the pulsatility of blood flow.

      We understand this concern and have toned down the statements about pulsatile flow in our conclusions by using 'flow' instead of 'pulsatile flow' throughout the text, except in the part on in vitro co-cultures. We also added a paragraph to discuss the limited capability to qualitatively reduce blood flow in vivo, and acknowledge that the effects of nutrient reduction and flow reduction could not be uncoupled in live zebrafish embryos. We propose that, in the future, in vitro 3D vascular culture models may be combined with microfluidics to precisely calibrate nutrient composition in the culture media, flow velocity, and pulse; these methods would help address these questions more thoroughly. See pages 11-12, lines 312-322.

      (2) Since mural cells are sensitive to transmural pressure, could the authors elaborate on the potential role of raised intravascular pressure in SMC differentiation? This would better parallel rodents and humans. 

      We thank you for this suggestion. We added a paragraph to discuss the potential role of raised intravascular pressure in VSMC differentiation in the discussion section (see page 11 line 296-311).

      (3) The authors use nifedipine to reduce blood flow. Nifedipine is a specific and potent inhibitor of voltage-dependent calcium channels (VDCC) which are expressed in SMCs. Prior studies (PMID: 35588738) showed that VDCC blockers increased rather than inhibited SMC differentiation. Nifedipine is also likely to act upon VSMC calcium handling in the circle of Willis, which may in turn affect cell maturation. Could the authors comment on this seeming discrepancy?

      It is possible that off-target or indirect effects of Nifedipine decrease smooth muscle cell proliferation, or that altered cardiac contractility fundamentally alters aspects of vascular development other than blood flow. 

      - Additionally, it would be helpful to report the quantitative heart rate reduction achieved with Nifedipine. This would clear up concerns that the heart rate reduction is too large for normal vascular development to occur, and thus decrease proliferation rate independent of changes in blood flow pulsatility. 

      We concur with these comments, which is why our experiments with Nifedipine are reinforced by an alternative, non-pharmacological strategy to inhibit blood flow: a morpholino against the tnnt2a gene. The results with either Nifedipine or the tnnt2a morpholino support the lack of VSMC maturation. In addition, we now provide the quantitative heart rate reduction achieved with Nifedipine in new Figure S2A-S2C, which shows that the drug decreases the heart rate without completely halting it. Moreover, we note that zebrafish embryos can survive and develop a normal blood vascular system without any heartbeat. Hence, we exclude the possibility that the effect on VSMC maturation is linked to non-specific effects caused by the loss of heartbeat. Nevertheless, we now acknowledge in our discussion the limitation of Nifedipine, as it may affect VSMCs through VDCCs (page 12, lines 323-334).

      We also added a paragraph in the discussion section to compare nifedipine, an L-type VDCC blocker, with ML218, a T-type VDCC selective inhibitor used in the previous study (Ando et al., 2022). We noted that in this previous study, the increase in VSMC differentiation only occurs on anterior metencephalic central arteries (AMCtAs) that are more than 40 mm away from the BCA; these AMCtAs are much smaller than CoW arteries and have a different geometry, and hence possibly different kinetics of VSMC maturation (Ando et al., 2022), as the findings in our manuscript would suggest.

      (4) The authors should provide more information on how blood flow velocity and wall shear stress are calculated from the Circle of Willis vascular structure. It is presumed that these values are dependent upon the 3-D morphology of the vessel network, as labeled by intravenous dextran dye, but this is not clear. (a second reviewer similarly comments: I was unclear how flow velocity values were obtained in Fig. 3E. Are they based on computational simulation, or are they experimentally calculated following the dextran injection?) Small local differences in vessel diameter and shape will influence blood flow velocity, but these morphological changes are not clearly articulated. Further, it is unclear how flow input levels to the CaDI and basilar arteries are decided across time points. For instance, is it possible to measure the blood flow speed empirically with line-scanning or high-speed tracking of labeled blood cells or particles? This would provide validation of the modeling results. 

      The computational fluid dynamic simulation was performed according to a previous study from our lab (Barak et al., 2021). Blood flow velocity and wall shear stress depend on the 3D morphology of the vessel network labeled by intravascular dextran. Details on how the computational fluid dynamic simulation was performed have been added to the Methods section (page 17, lines 433-449).
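      To illustrate why vessel geometry matters for wall shear stress, a back-of-envelope estimate is sketched below. It is not the CFD method described above: it assumes steady Poiseuille flow of a Newtonian fluid in a straight cylindrical vessel, for which the wall shear stress is 4 × viscosity × mean velocity / radius. The viscosity, velocity, and radius values are hypothetical.

```python
def poiseuille_wss(viscosity_pa_s, mean_velocity_m_s, radius_m):
    """Wall shear stress (Pa) for steady Poiseuille flow in a straight tube.

    tau_w = 4 * mu * v_mean / r, which is equivalent to 8 * mu * v_mean / d.
    """
    if radius_m <= 0:
        raise ValueError("radius must be positive")
    return 4.0 * viscosity_pa_s * mean_velocity_m_s / radius_m


# Hypothetical inputs, chosen only to show the arithmetic.
tau = poiseuille_wss(viscosity_pa_s=0.003, mean_velocity_m_s=0.001, radius_m=5e-6)

# Halving the radius at the same mean velocity doubles the wall shear stress.
tau_narrow = poiseuille_wss(0.003, 0.001, 2.5e-6)
```

      Real vessels are curved, branching, and pulsatile, which is the reason a geometry-based CFD simulation is used rather than this closed-form estimate.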

      Moreover, to address this concern, we now provide new experimental measurements of blood flow, obtained as red blood cell (RBC) velocity via axial line scanning microscopy in Tg(kdrl:gfp;gata1:DsRed)zn1/sd2 zebrafish embryos at 54 hpf, 3 dpf, and 4 dpf. Using the experimental RBC velocity, we re-ran the computational fluid dynamic simulation. The new findings align with our conclusion and are further elaborated upon in our response to point 6 below. Details on how RBC velocity was calculated have been added to the Methods section (page 16, lines 414-431).

      (5) Does the cardiac injection of dextran itself affect the diameter of the arteries, given the invasiveness of the procedure? This could be examined in fish with a transgenic endothelial label with and without dextran. 

      Here, we performed an experiment on wildtype zebrafish at 5 days post-fertilization (dpf) with and without Dextran injection, examining the effects of Dextran injection on vessel diameters. As shown in the representative image below, the XZ panel clearly illustrates a Dextran-filled PCS vessel with no alteration in vessel size. Dextran microangiography, a technique employed to obtain vessel geometry with fluorescent microsphere, has been well established in zebrafish (Kamei et al., 2010). Our findings, demonstrating that Dextran does not affect vessel size, are consistent with previous studies utilizing Dextran microangiography.

      Author response image 1.

      (6) The data from the microangiography experiment in Figure 3 does not fully support the stated results. The authors report that the CaDI had the highest blood flow speed starting from 54 hpf, but it does not appear to be higher than the other arteries at this time point. Additionally, there is not sufficient evidence that wall shear stress coincides with smooth muscle cell differentiation in the CaDI. Wall shear stress appears to be similar between 54 hpf and 3 dpf in the CaDI, only increasing between 3 dpf and 4 dpf, while differentiation is shown to begin at 3 dpf. The authors need to address this and/or soften conclusions. 

      First, in response to this specific concern, we measured red blood cell (RBC) velocity using axial line scanning microscopy in Tg(kdrl:gfp;gata1:DsRed)zn1/sd2 zebrafish embryos (the detailed method has been added to the Methods section of the manuscript). We replaced the computationally simulated blood flow velocity with RBC velocity in new Figure 3E-3G, and re-ran the computational simulation of wall shear stress (WSS) using the RBC velocity in new Figure 3I-3K. We compared RBC velocity and WSS among different vessels at each time point. We confirmed that CaDI has the highest RBC velocity from 54 hpf to 4 dpf (new Figure 3A-3C and 3E-3G) and found an overall increase in average WSS from 54 hpf to 4 dpf (new Figure 3A-3C and 3H). Further, WSS in CaDI was significantly higher than in BCA and PCS at 54 hpf, 3 dpf, and 4 dpf (new Figure 3A-3C and 3I-3K). Altogether, the CFD simulation suggests that CoW arteries experience different hemodynamic WSS that is associated with the spatiotemporal pattern of VSMC differentiation on CoW arteries (page 6, lines 153-162).

      Second, to assess the correlation between WSS and VSMC differentiation in CaDI, we performed Pearson correlation analysis. In the image provided here, we plotted a linear regression of the normalized number of acta2+ cells in CaDI and WSS across developmental stages (54 hpf, 3 dpf, and 4 dpf), and computed the Pearson correlation coefficient using GraphPad Prism 10.0.3. The correlation coefficient was r = 0.595, suggesting that the two variables (acta2+ cells and WSS) tend to increase together across developmental stages.
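      For readers who wish to reproduce this kind of analysis outside GraphPad Prism, the Pearson coefficient can be computed directly from its definition. The sketch below is illustrative only; the function name and the toy values are hypothetical and are not the data behind the reported r = 0.595.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values: a perfectly linear and an inverted relationship.
wss = [1.0, 2.0, 3.0, 4.0]
cells_up = [2.0, 4.0, 6.0, 8.0]
cells_down = [8.0, 6.0, 4.0, 2.0]

r_up = pearson_r(wss, cells_up)
r_down = pearson_r(wss, cells_down)
```

      A coefficient near 1 or -1 indicates a strongly linear relationship; intermediate values such as the one reported indicate a moderate positive association.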

      Author response image 2.

      Third, we softened our conclusion as the RBC velocity across CoW arteries was differentially distributed while VSMC differentiation occurred in these vessels.

      (7) It is unclear if acta2 expression is conferring vascular tone, as would be expected if the cells are behaving as mature VSMCs. Does arterial diameter decrease with an increase in acta2 expression? Are acta2-positive mural cells associated with more dynamic changes in arteriole diameter under basal or stimulated conditions? 

      Thank you for this interesting question. VSMC maturation and vasoactivity could be further investigated in the future. Our study focused on the early stage of VSMC differentiation, in which pdgfrb+ progenitors start to express the VSMC marker acta2. We discussed the onset of transgelin expression and the loss of abcc9 expression as markers of VSMC maturation. In addition, a previous study found that VSMC-covered vessels in the zebrafish brain dilate as early as 4 dpf and constrict at 6 dpf (Bahrami & Childs, 2020). Future studies may focus on the association between the expression of different VSMC markers and VSMC functional maturation (page 10, lines 272-279).

      (8) The authors argue that CoW vessels transition from venous to arterial identity (Fig. 1). However, kdrl is not an ideal arterial marker for this experiment as it is expressed in both arteries and veins. While it is true that many arterial beds have stronger kdrl expression than the veins, its expression in both arteries and veins changes with developmental stage, and its expression level may vary depending on the type of vessel. Therefore, showing that kdrl increases from 32 hpf - 4 dpf in CoW vessels is not convincing because its expression may increase in both venous or arterial vasculature as the vessels mature. In addition, flt4 expression is not exclusively venous; for example, it has noticeable expression in the dorsal aorta at 24-32 hpf stages. It would be helpful to confirm this transition by analyzing additional arterial and venous markers. 

      We acknowledge this and we added a paragraph to discuss the limitation. We combined loss of flt4 and increase in kdrl to establish the temporal sequence of circle of Willis morphogenesis, arterial specification, and VSMC differentiation. We acknowledge that additional arterial and venous markers need to be analyzed for a more thorough characterization of arterial specification in vertebrate brain vascular development. See page 12 line 335-341.

      (9) The authors show that acta2+ VSMCs are absent in tnnt2a MO embryos, concluding that blood flow is required for their differentiation from pericytes. However, there is no data showing that pericytes are still present in tnnt2a MO embryos. Although this has been previously shown by Ando et al 2016, it would be beneficial to confirm in the current study as this is a critical piece of evidence needed for this conclusion. 

      To determine whether blood flow is dispensable for pdgfrb+ progenitor recruitment, we performed tnnt2a MO (0.35 ng/embryo) injection in Tg(pdgfrb:egfp, kdrl:ras-mcherry) ncv22/s896. Loss of blood flow did not affect pdgfrb+ progenitor emergence around the CoW (new Figure S2G-S2H) at 3 days post fertilization (dpf). This is consistent with the previous observation in Ando et al 2016 Figure S2C (Ando et al., 2016).

      (10) The authors show that klf2a MO injected embryos have a reduced number of VSMCs at 3 dpf but a normal number at 4 dpf (Fig. 6), concluding that klf2a is only important to initiate CaDI muscularization. If this is true, it would raise important questions about how VSMCs differentiate at a later stage in the absence of klf2a. For instance, is blood flow not required to differentiate at a later stage, or is there another factor that compensates in the absence of klf2a? The alternative explanation/ caveat is that klf2a MO loses efficacy with development, leading to the recovery of VSMCs at this stage. Therefore, it would be important to confirm this result using a genetic klf2a mutant. 

      Thank you for pointing this out. We note that, based on the klf2a reporter line, klf2a activity in CoW arterial endothelial cells is highly correlated with the number of acta2+ VSMCs in CaDI, BCA and PCS at 3 dpf (r = 0.974, new Figure S5J). Interestingly, however, klf2a activity remained stable from 3 dpf to 4 dpf, well beyond the initiation of VSMC differentiation. Thus, we speculate that sustained klf2a expression may support further maturation of VSMCs, as acta2+ VSMCs showed distinct morphology at 4 dpf compared with 3 dpf (page 10, lines 268-272). As for the observation that klf2a morphants have a normal number of VSMCs at 4 dpf, we think that, in addition to the temporary effect of the morpholino, a proximal explanation is compensation by the paralogous klf2b in zebrafish. We acknowledge that further characterization of CoW VSMC development in klf2a and klf2b double genetic mutants (Rasouli et al., 2018; Steed et al., 2016) may help determine whether klf2b compensates for klf2a in CoW VSMC differentiation beyond 4 dpf. See pages 10-11, lines 292-295.

      (11) A large part of the discussion focuses on Notch and Wnt signaling, as downstream Klf2 effectors. While these are reasonable hypotheses to propose, there is no data on the involvement of these pathways in the current study. It seems excessive to speculate on detailed mechanisms of how Klf2 activates Notch and Wnt signaling in the absence of data showing that these pathways are affected in CoW vessels. Therefore, the discussion could be shortened here unless additional data can be obtained to demonstrate the involvement of these pathways in VSMCs in CoW.

      We concur and have condensed the discussion on Notch and Wnt signaling as downstream klf2 effectors.

      Minor comments: 

      (1) Line 138 "CaDI is the only vessels in the CoW receiving pulsatile arterial blood low ... ". Adding a reference to support this statement would be useful. 

      We agree and revised this sentence into ‘CaDI receive proximal arterial feed through lateral dorsal aorta from cardiac outflow tract (Isogai et al., 2001)’. It was also based on our general observation of zebrafish vascular anatomy and blood flow under a confocal microscope.

      (2) The image insets in Figs. 1A, 2A, 4E-L, 5A, 6A are quite small. Please make them larger to help the reader interpret the findings. 

      We agree. We maximized the image size to help the reader interpret the findings and to show confocal images and schematics side by side.

      (3) The schematics in Figs. 1-2, and 4-6 are helpful, but the different cell types are difficult to see because they are small and their colors/shapes are not very distinct. 

      We agree. We increased the size and color contrast of the schematics in the new Figures 1-2 and 4-6 to make the cell types easier to distinguish.

      (4) It is stated that there are no diameter differences between different arteries, but statistics are not reported. 

      The statistics in Figure 3D were computed by ordinary two-way ANOVA followed by Tukey’s multiple comparisons test, with a single pooled variance. We have now added pairwise comparisons among vessels in the CoW. Where no significance is indicated, the differences are non-significant.

      (5) Figure 3F would be better visualized on a log scale, as it is difficult to see the differences between each post-fertilization timepoint. 

      We agree. In the new Figure 3H, the average wall shear stress (WSS) in CoW arteries is plotted on a logarithmic y axis so that the differences between post-fertilization timepoints are visible.

      (6) Please provide more background and validation on the pericyte cell line, and their use for the questions in this study. 

      Thank you for the question. TgBAC(pdgfrb:egfp)ncv22 was generated and described by Ando et al. (2016) to clarify mural cell coverage of vascular endothelium in zebrafish. We added a description to the Methods section to provide background and validation for this pericyte line (see page 13, lines 368-372).

      (7) Flow velocity and WSS changes are shown in each vessel in Figs. 3E,G. However, the comparison should be made between different types of vessels to see if there is a statistical difference and PCS, for example, which would explain differences in VSMC coverage. 

      We agree. We compared the arteries in the CoW with one another at each developmental timepoint using ordinary one-way ANOVA with Tukey’s multiple comparisons test. Figure 3E is replaced by new Figure 3E-G, and Figure 3G is replaced by new Figure 3I-K.
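
      To illustrate the between-vessel comparison described above, the following is a minimal sketch of the F statistic of an ordinary one-way ANOVA. It is not the authors' analysis code: the function name `one_way_anova_f` and the example values are hypothetical, and the sketch stops at the F statistic (no p-value and no Tukey post hoc test).

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for an ordinary one-way ANOVA.

    `groups` is a list of lists, one list of measurements per vessel type.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements for three vessel types at one timepoint.
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(example_groups)
```

      A large F indicates that the vessel types differ more from one another than the embryos within each vessel type do; pairwise differences would still need a post hoc test such as Tukey’s.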

      (8) Similarly, between CaDI, the number of klf2a cells in Fig. 5B should be compared between different vessels, not between different stages of the same vessel. 

      We agree. In new Figure 5B-E, the number of klf2a+ cells per 100 μm vessel length is compared among different vessels at each developmental stage and analyzed by ordinary one-way ANOVA with Tukey’s multiple comparisons test.
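
      The length normalization mentioned above can be sketched as follows. This is only an illustration under assumed inputs; the function name `cells_per_100_um` and the example numbers are hypothetical and are not taken from the manuscript.

```python
def cells_per_100_um(cell_count, vessel_length_um):
    """Normalize a cell count to cells per 100 micrometers of vessel length."""
    if vessel_length_um <= 0:
        raise ValueError("vessel length must be positive")
    return 100.0 * cell_count / vessel_length_um

# Hypothetical example: 6 klf2a+ cells counted along a 150 um vessel segment.
density = cells_per_100_um(6, 150.0)
```

      Normalizing by length lets vessels of different sizes be compared on the same scale before the ANOVA is applied.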

      (9) When quantifying klf2+ cells in Fig. 5, it would be helpful to quantify klf2 expression level between cells in different vessels. This could be done by quantifying GFP expression in existing images. The difference in expression level may explain the variation between CaDI and PCS more accurately than just the difference in cell number. 

      The GFP signal reflects the stability of the GFP protein and labels discrete nuclei with active klf2a expression. Hence, quantifying the GFP level might not give an accurate readout of klf2a expression per se but rather of its activity. For this reason, we do not think that this experiment would add an accurate measurement of klf2a expression.

      (10) Do data points in Figure 4D correspond to different cells in the same chamber experiment? If so, they cannot be treated as independent replicates. Each data point should correspond to an independent replicate experiment. 

      We agree. Now in the figure legend, we report the number of cells analyzed.
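
      The reviewer's point about independent replicates can be illustrated with a minimal sketch that collapses per-cell values to one mean per experiment. This is not the analysis performed in the manuscript; the function name `replicate_means`, the experiment labels, and the values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def replicate_means(measurements):
    """Collapse per-cell values to one mean per independent experiment.

    `measurements` is an iterable of (experiment_id, value) pairs.
    Returns a dict mapping experiment_id to the mean of its cells.
    """
    by_experiment = defaultdict(list)
    for experiment_id, value in measurements:
        by_experiment[experiment_id].append(value)
    return {exp: mean(values) for exp, values in by_experiment.items()}

# Hypothetical per-cell readings from three separate chamber experiments.
cells = [
    ("exp1", 1.0), ("exp1", 3.0),
    ("exp2", 2.0), ("exp2", 4.0), ("exp2", 6.0),
    ("exp3", 5.0),
]
per_experiment = replicate_means(cells)
n_independent = len(per_experiment)  # number of independent data points
```

      With this aggregation the six cells contribute three data points, one per experiment, which is the unit the reviewer asks to be treated as a replicate.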

      (11) Graph placement is confusing in Figs. 4I, M. An adjacent Fig. 4G shows Nifedipine treated embryos, while the graph next to (Fig. 4I) shows acta+ cell number from tnnt2a 4 dpf experiment. Similarly, the bottom Fig. 4K tnn2a 4 dpf MO experiment has an adjacent graph Fig. 4M, which shows nifedipine treatment quantification, which makes it very confusing. 

      We agree. We rearranged the panels so that Figure 4E shows representative images of control embryos at 3 dpf and 4 dpf, Figure 4F shows tnnt2a MO embryos at 3 dpf and 4 dpf, and Figure 4G shows nifedipine-treated embryos at 3 dpf and 4 dpf.

      References:

      Ando, K., Fukuhara, S., Izumi, N., Nakajima, H., Fukui, H., Kelsh, R. N., & Mochizuki, N. (2016). Clarification of mural cell coverage of vascular endothelial cells by live imaging of zebrafish. Development, 143(8), 1328-1339. https://doi.org/10.1242/dev.132654

      Ando, K., Tong, L., Peng, D., Vazquez-Liebanas, E., Chiyoda, H., He, L., Liu, J., Kawakami, K., Mochizuki, N., Fukuhara, S., Grutzendler, J., & Betsholtz, C. (2022). KCNJ8/ABCC9-containing K-ATP channel modulates brain vascular smooth muscle development and neurovascular coupling. Dev Cell, 57(11), 1383-1399 e1387. https://doi.org/10.1016/j.devcel.2022.04.019

      Bahrami, N., & Childs, S. J. (2020). Development of vascular regulation in the zebrafish embryo. Development, 147(10). https://doi.org/10.1242/dev.183061

      Barak, T., Ristori, E., Ercan-Sencicek, A. G., Miyagishima, D. F., Nelson-Williams, C., Dong, W., Jin, S. C., Prendergast, A., Armero, W., Henegariu, O., Erson-Omay, E. Z., Harmanci, A. S., Guy, M., Gultekin, B., Kilic, D., Rai, D. K., Goc, N., Aguilera, S. M., Gulez, B., . . . Gunel, M. (2021). PPIL4 is essential for brain angiogenesis and implicated in intracranial aneurysms in humans. Nat Med, 27(12), 2165-2175. https://doi.org/10.1038/s41591-021-01572-7

      Isogai, S., Horiguchi, M., & Weinstein, B. M. (2001). The vascular anatomy of the developing zebrafish: an atlas of embryonic and early larval development. Dev Biol, 230(2), 278-301. https://doi.org/10.1006/dbio.2000.9995

      Kamei, M., Isogai, S., Pan, W., & Weinstein, B. M. (2010). Imaging blood vessels in the zebrafish. In Methods in cell biology (Vol. 100, pp. 27-54). Elsevier.

      Rasouli, S. J., El-Brolosy, M., Tsedeke, A. T., Bensimon-Brito, A., Ghanbari, P., Maischein, H. M., Kuenne, C., & Stainier, D. Y. (2018). The flow responsive transcription factor Klf2 is required for myocardial wall integrity by modulating Fgf signaling. Elife, 7. https://doi.org/10.7554/eLife.38889

      Steed, E., Faggianelli, N., Roth, S., Ramspacher, C., Concordet, J. P., & Vermot, J. (2016). klf2a couples mechanotransduction and zebrafish valve morphogenesis through fibronectin synthesis. Nat Commun, 7, 11646. https://doi.org/10.1038/ncomms11646

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are thankful for the handling of our manuscript. The following is a summary of our response and what we have done:

      (1) We are most thankful for the very thorough evaluation of our manuscript.

      (2) We were a bit shocked by the very negative commentary of referee 2.

      (3) We think what put referee 2 off so much is that we were overconfident in the strength of our conclusions. We consider such overconfidence a big mistake. We have revised the manuscript to fix this problem.

      (4) We respond in great depth to all criticism and also go into technicalities.

      (5) We consider the possibility of a mistake. Yet, we carefully weighed the evidence advanced by referee 2 and by us and found that a systematic review supports our conclusions. Hence, we also resist the various attempts to crush our paper.

      (6) We added evidence (peripherin-antibody staining; our novel Figure 2) that suggests we correctly identified the inferior olive.

      (7) The eLife format – in which critical commentary is published along with the paper – is a fantastic venue to publish what appears to be a surprisingly controversial issue.

      eLife assessment

      This potentially valuable study uses classic neuroanatomical techniques and synchrotron X-ray tomography to investigate the mapping of the trunk within the brainstem nuclei of the elephant brain. Given its unique specializations, understanding the somatosensory projections from the elephant trunk would be of general interest to evolutionary neurobiologists, comparative neuroscientists, and animal behavior scientists. However, the anatomical analysis is inadequate to support the authors' conclusion that they have identified the elephant trigeminal sensory nuclei rather than a different brain region, specifically the inferior olive.

      Comment: We are happy that our paper is considered to be potentially valuable. Also, the editors highlight the potential interest of our work for evolutionary neurobiologists, comparative neuroscientists, and animal behavior scientists. The editors are more negative when it comes to our evidence on the identification of the trigeminal nucleus vs the inferior olive. We have five comments on this assessment. (i) We think this assessment is heavily biased by the comments of referee 2. We show that the referee’s comments are more about us than about our paper. Hence, the referee failed to do their job (refereeing our paper) and should not have succeeded in leveling our paper. (ii) We have no ad hoc knock-out experiments to distinguish the trigeminal nucleus vs the inferior olive. Such experiments (extracellular recording and electrolytic lesions, viral tracing) could be done in a week in mice, but they cannot and should not be done in elephants. (iii) We have extraordinary evidence. Nobody has ever described a similarly astonishing match of body (trunk folds) and myeloarchitecture in the brain before. (iv) We show that our assignment of the trigeminal nucleus vs the inferior olive is more plausible than the current hypothesis about the assignment of the trigeminal nucleus vs the inferior olive as defended by referee 2. We think this is why it is important to publish our paper. (v) We think eLife is the perfect place for our publication because the deviating views of referee 2 are published along with it.

      Change: We performed additional peripherin-antibody staining to differentiate the inferior olive and trigeminal nucleus. Peripherin is a cytoskeletal protein that is found in peripheral nerves and climbing fibers. Specifically, climbing fibers of various species (mouse, rabbit, pig, cow, and human; Errante et al., 1998) are stained intensely with peripherin-antibodies. What is tricky for our purposes is that there is also some peripherin-antibody reactivity in the trigeminal nuclei (Errante et al., 1998). Such peripherin-antibody reactivity is weaker, however, and lacks the distinct axonal bundle signature that stems from the strong climbing fiber peripherin-reactivity as seen in the inferior olive (Errante et al., 1998). As can be seen in our novel Figure 2, we observe peripherin-reactivity in axonal bundles (i.e. in putative climbing fibers), in what we think is the inferior olive. We also observe weak peripherin-reactivity, in what we think is the trigeminal nucleus, but not the distinct and strong labeling of axonal bundles. These observations are in line with our ideas but are difficult to reconcile with the views of the referee. Specifically, the lack of peripherin-reactive axon bundles suggests that there are no climbing fibers in what the referee thinks is the inferior olive.

      Errante, L., Tang, D., Gardon, M., Sekerkova, G., Mugnaini, E., & Shaw, G. (1998). The intermediate filament protein peripherin is a marker for cerebellar climbing fibres. Journal of neurocytology, 27, 69-84.

      Reviewer #1:

      Summary:

      This fundamental study provides compelling neuroanatomical evidence underscoring the sensory function of the trunk in African and Asian elephants. Whereas myelinated tracts are classically appreciated as mediating neuronal connections, the authors speculate that myelinated bundles provide functional separation of trunk folds and display elaboration related to the "finger" projections. The authors avail themselves of many classical neuroanatomical techniques (including cytochrome oxidase stains, Golgi stains, and myelin stains) along with modern synchrotron X-ray tomography. This work will be of interest to evolutionary neurobiologists, comparative neuroscientists, and the general public, with its fascinating exploration of the brainstem of an iconic sensory specialist.

      Comment: We are incredibly grateful for this positive assessment.

      Changes: None.

      Strengths: 

      - The authors made excellent use of the precious sample materials from 9 captive elephants. 

      - The authors adopt a battery of neuroanatomical techniques to comprehensively characterize the structure of the trigeminal subnuclei and properly re-examine the "inferior olive".

      - Based on their exceptional histological preparation, the authors reveal broadly segregated patterns of metabolic activity, similar to the classical "barrel" organization related to rodent whiskers. 

      Comment: The referee provides a concise summary of our findings.

      Changes: None.

      Weaknesses: 

      - As the authors acknowledge, somewhat limited functional description can be provided using histological analysis (compared to more invasive techniques). 

      - The correlation between myelinated stripes and trunk fold patterns is intriguing, and Figure 4 presents this idea beautifully. I wonder - is the number of stripes consistent with the number of trunk folds? Does this hold for both species? 

      Comment: We agree with the referee’s assessment. We note that cytochrome-oxidase staining is an at least partially functional stain, as it reveals constitutive metabolic activity. A significant problem of the work in elephants is that our recording possibilities are limited, which in turn limits functional analysis. As indicated in Figure 5 (our former Figure 4) for the African elephant Indra, there was an excellent match of trunk folds and myelin stripes. Asian elephants have more, and less conspicuous trunk folds than African elephants. As illustrated in Figure 7, Asian elephants have more, and less conspicuous myelin stripes. Thus, species differences in myelin stripes correlate with species differences in trunk folds.

      Changes: We clarify the relation of myelin stripe and trunk fold patterns in our description of Figure 7.

      Reviewer #2 (Public Review): 

      The authors describe what they assert to be a very unusual trigeminal nuclear complex in the brainstem of elephants, and based on this, follow with many speculations about how the trigeminal nuclear complex, as identified by them, might be organized in terms of the sensory capacity of the elephant trunk.

      Comment: We agree with the referee’s assessment that the putative trigeminal nucleus described in our paper is highly unusual in size, position, vascularization, and myeloarchitecture. This is why we wrote this paper. We think these unusual features reflect the unique facial specializations of elephants, i.e. their highly derived trunk. Because we have no access to recordings from the elephant brainstem, we cannot back up all our functional interpretations with electrophysiological evidence; it is therefore fair to call them speculative.

      Changes: None.

      The identification of the trigeminal nuclear complex/inferior olivary nuclear complex in the elephant brainstem is the central pillar of this manuscript from which everything else follows, and if this is incorrect, then the entire manuscript fails, and all the associated speculations become completely unsupported. 

      Comment: We agree.

      Changes: None.

      The authors note that what they identify as the trigeminal nuclear complex has been identified as the inferior olivary nuclear complex by other authors, citing Shoshani et al. (2006; 10.1016/j.brainresbull.2006.03.016) and Maseko et al (2013; 10.1159/000352004), but fail to cite either Verhaart and Kramer (1958; PMID 13841799) or Verhaart (1962; 10.1515/9783112519882-001). These four studies are in agreement, but the current study differs.

      Comment & Change: We were not aware of the papers of Verhaart and have included them in the revised manuscript.

      Let's assume for the moment that the four previous studies are all incorrect and the current study is correct. This would mean that the entire architecture and organization of the elephant brainstem is significantly rearranged in comparison to ALL other mammals, including humans, previously studied (e.g. Kappers et al. 1965, The Comparative Anatomy of the Nervous System of Vertebrates, Including Man, Volume 1 pp. 668-695) and the closely related manatee (10.1002/ar.20573). This rearrangement necessitates that the trigeminal nuclei would have had to "migrate" and shorten rostrocaudally, specifically and only, from the lateral aspect of the brainstem where these nuclei extend from the pons through to the cervical spinal cord (e.g. the Paxinos and Watson rat brain atlases), to the spatially restricted ventromedial region of specifically and only the rostral medulla oblongata. According to the current paper, the inferior olivary complex of the elephant is very small and located lateral to their trigeminal nuclear complex, and the region from where the trigeminal nuclei are located by others appears to be just "lateral nuclei" with no suggestion of what might be there instead.

      Comment: We have three comments here:

      (1) The referee correctly notes that we argue the elephant brainstem underwent fairly major rearrangements. In particular, we argue that the elephant inferior olive was displaced laterally, by a very large cell mass, which we argue is an unusually large trigeminal nucleus. To our knowledge, such a large compact cell mass is not seen in the ventral brain stem of any other mammal.

      (2) The referee makes it sound as if it is our private idea that the elephant brainstem underwent major rearrangements and that the rest of the evidence points to a conventional ‘rodent-like’ architecture. This is far from the truth, however. Already from the outside appearance (see our Figure 1B and Figure 7A) it is clear that the elephant brainstem has huge ventral bumps not seen in any other mammal. An extraordinary architecture also holds at the organizational level of nuclei. Specifically, the facial nucleus – the most carefully investigated nucleus in the elephant brainstem – has an appearance distinct from that of the facial nuclei of all other mammals (Maseko et al., 2013; Kaufmann et al., 2022). If both the overall shape and the constituting nuclei of the brainstem are very different from other mammals, it is very unlikely if not impossible that the elephant brainstem follows in all regards a conventional ‘rodent-like’ architecture.

      (3) The inferior olive is an impressive nucleus in the partitioning scheme we propose (Figure 2). In fact – together with the putative trigeminal nucleus we describe – it’s the most distinctive nucleus in the elephant brainstem. We have not done volumetric measurements and cell counts here, but think this is an important direction for future work. What has informed our work is that the inferior olive nucleus we describe has the serrated organization seen in the inferior olive of all mammals. We will discuss these matters in depth below.

      Changes: None.

      Such an extraordinary rearrangement of brainstem nuclei would require a major transformation in the manner in which the mutations, patterning, and expression of genes and associated molecules during development occur. Such a major change is likely to lead to lethal phenotypes, making such a transformation extremely unlikely. Variations in mammalian brainstem anatomy are most commonly associated with quantitative changes rather than qualitative changes (10.1016/B978-0-12-804042-3.00045-2). 

      Comment: We have two comments here:

      (1) The referee claims that it is impossible that the elephant brainstem differs from a conventional brainstem architecture because this would lead to lethal phenotypes etc. Following our previous response, this argument does not hold. It is out of the question that the elephant brainstem looks very different from the brainstem of other mammals. Yet, it is also evident that elephants live. The debate we need to have is not if the elephant brainstem differs from other mammals, but how it differs from other mammals.

      (2) In principle we agree with the referee’s thinking that the model of the elephant brainstem that is most likely to be correct is the one that requires the least amount of rearrangements to other mammals. We therefore prepared a comparison of the model the referee is proposing (Maseko et al., 2013; see Referee Table 1 below) with our proposition. We scored these models on their similarity to other mammals. We find that the referee’s ideas (Maseko et al., 2013) require more rearrangements relative to other mammals than our suggestion.

      Changes: Inclusion of Referee Table 1, which we discuss in depth below.

      The impetus for the identification of the unusual brainstem trigeminal nuclei in the current study rests upon a previous study from the same laboratory (10.1016/j.cub.2021.12.051) that estimated that the number of axons contained in the infraorbital branch of the trigeminal nerve that innervate the sensory surfaces of the trunk is approximately 400 000. Is this number unusual? In a much smaller mammal with a highly specialized trigeminal system, the platypus, the number of axons innervating the sensory surface of the platypus bill skin comes to 1 344 000 (10.1159. Yet, there is no complex rearrangement of the brainstem trigeminal nuclei in the brain of the developing or adult platypus (Ashwell, 2013, Neurobiology of Monotremes), despite the brainstem trigeminal nuclei being very large in the platypus (10.1159/000067195). Even in other large-brained mammals, such as large whales that do not have a trunk, the number of axons in the trigeminal nerve ranges between 400,000 and 500,000 (10.1007. The lack of comparative support for the argument forwarded in the previous and current study from this laboratory, and that the comparative data indicates that the brainstem nuclei do not change in the manner suggested in the elephant, argues against the identification of the trigeminal nuclei as outlined in the current study. Moreover, the comparative studies undermine the prior claim of the authors, informing the current study, that "the elephant trigeminal ganglion ... point to a high degree of tactile specialization in elephants" (10.1016/j.cub.2021.12.051). While clearly, the elephant has tactile sensitivity in the trunk, it is questionable as to whether what has been observed in elephants is indeed "truly extraordinary".

      Comment: These comments made us think that the referee is not talking about the paper we submitted, but that the referee is talking about us and our work in general. Specifically, the referee refers to the platypus and other animals dismissing our earlier work, which argued for a high degree of tactile specialization in elephants. We think the referee’s intuitions are wrong and our earlier work is valid.

      Changes: We prepared Author response image 1 (below), which puts the platypus brain, a monkey brain, and the elephant trigeminal ganglion (which contains a large part of the trunk-innervating cells) in perspective.

      Author response image 1.

      The elephant trigeminal ganglion is comparatively large. Platypus brain, monkey brain, and elephant ganglion. The elephant has two trigeminal ganglia, which contain the first-order somatosensory neurons. They serve mainly for tactile processing and are large compared to a platypus brain (from the comparative brain collection) and are similar in size to a monkey brain. The idea that elephants might be highly specialized for trunk touch is also supported by the analysis of the sensory nerves of these animals (Purkart et al., 2022). Specifically, we find that the infraorbital nerve (which innervates the trunk) is much thicker than the optic nerve (which mediates vision) and the vestibulocochlear nerve (which mediates hearing). Thus, not everything is large about elephants; instead, the data argue that these animals are heavily specialized for trunk touch.

      But let's look more specifically at the justification outlined in the current study to support their identification of the unusually located trigeminal sensory nuclei of the brainstem. 

      (1) Intense cytochrome oxidase reactivity.

      (2) Large size of the putative trunk module.

      (3) Elongation of the putative trunk module.

      (4) The arrangement of these putative modules corresponds to elephant head anatomy.

      (5) Myelin stripes within the putative trunk module that apparently match trunk folds.

      (6) Location apparently matches other mammals.

      (7) Repetitive modular organization apparently similar to other mammals.

      (8) The inferior olive described by other authors lacks the lamellated appearance of this structure in other mammals.

      Comment: We agree those are key issues.

      Changes: None.

      Let's examine these justifications more closely.

      (1) Cytochrome oxidase histochemistry is typically used as an indicative marker of neuronal energy metabolism. The authors indicate, based on the "truly extraordinary" somatosensory capacities of the elephant trunk, that any nuclei processing this tactile information should be highly metabolically active, and thus should react intensely when stained for cytochrome oxidase. We are told in the methods section that the protocols used are described by Purkart et al (2022) and Kaufmann et al (2022). In neither of these cited papers is there any description, nor mention, of the cytochrome oxidase histochemistry methodology, thus we have no idea of how this histochemical staining was done. To obtain the best results for cytochrome oxidase histochemistry, the tissue is either processed very rapidly after buffer perfusion to remove blood or in recently perfusion-fixed tissue (e.g., 10.1016/0165-0270(93)90122-8). Given: (1) the presumably long post-mortem interval between death and fixation - "it often takes days to dissect elephants"; (2) subsequent fixation of the brains in 4% paraformaldehyde for "several weeks"; (3) The intense cytochrome oxidase reactivity in the inferior olivary complex of the laboratory rat (Gonzalez-Lima, 1998, Cytochrome oxidase in neuronal metabolism and Alzheimer's diseases); and (4) The lack of any comparative images from other stained portions of the elephant brainstem; it is difficult to support the justification as forwarded by the authors. The histochemical staining observed is likely background reactivity from the use of diaminobenzidine in the staining protocol. Thus, this first justification is unsupported. 

      Comment: The referee correctly notes that the description of our cytochrome-oxidase reactivity staining was lacking. This is a serious mistake of ours for which we apologize very much. The referee then makes it sound as if we messed up our cytochrome-oxidase staining, which is not the case. All successful (n = 3; please see our technical comments in the recommendation section) cytochrome-oxidase stainings were done with elephants with short post-mortem times (≤ 2 days) to brain removal/cooling and only brief immersion fixation (≤ 1 day). Cytochrome-oxidase reactivity in elephant brains appears to be more sensitive to quenching by fixation than is the case for rodent brains. We think it is a good idea to include a cytochrome-oxidase staining overview picture because we understood from the referee’s comments that we need to compare our partitioning scheme of the brainstem with that of other authors. To this end, we add a cytochrome-oxidase staining overview picture (Author response image 2) along with an alternative interpretation from Maseko et al., 2013.

      Changes: (1) We added details on our cytochrome-oxidase reactivity staining protocol and the cytochrome-oxidase reactivity in the elephant brain in the manuscript and in our response to the general recommendations.

      (2) We provide a detailed discussion of the technicalities of cytochrome-oxidase staining below in the recommendation section, where the referee raised further criticisms.

      (3) We include a cytochrome-oxidase staining overview picture (Author response image 2) along with an alternative interpretation from Maseko et al., 2013.

      Author response image 2.

      Cytochrome-oxidase staining overview. Coronal cytochrome-oxidase staining overview from African elephant cow Indra; the section is taken a few millimeters posterior to the facial nucleus. Brown is putatively neural cytochrome-reactivity, and white is the background. Black is myelin diffraction and (seen at higher resolution, when you zoom in) erythrocyte cytochrome-reactivity in blood vessels (see our Figure 1E-G); such blood vessel cytochrome-reactivity is seen, because we could not perfuse the animal. There appears to be a minimal outside-in-fixation artifact (i.e. a more whitish/non-brownish appearance of the section toward the borders of the brain). This artifact is not seen in sections from Indra that we processed earlier or in other elephant brains processed at shorter post-mortem/fixation delays (see our Figure 1C).

      The same structures can be recognized in Author response image 2 and Supplementary figure 36 of Maseko et al. (2013). The section is taken at an anterior-posterior level where we encounter the trigeminal nuclei in pretty much all mammals. Note that the neural cytochrome reactivity is very high in what we refer to as the trigeminal-nuclei-trunk-module and what Maseko et al. refer to as the inferior olive. Myelin stripes can be recognized here as white omissions.

      At the same time, the cytochrome-oxidase-reactivity is very low in what Maseko et al. refer to as trigeminal nuclei. The indistinct appearance and low cytochrome-oxidase-reactivity of the trigeminal nuclei in the scheme of Maseko et al. (2013) is unexpected because trigeminal nuclei stain intensely for cytochrome-oxidase-reactivity in most mammals and because the trigeminal nuclei represent the elephant’s most important body part, the trunk. Staining patterns of the trigeminal nuclei as identified by Maseko et al. (2013) are very different at more posterior levels; we will discuss this matter below.

      Justifications (2), (3), and (4) are sequelae from justification (1). In this sense, they do not count as justifications, but rather unsupported extensions. 

      Comment: These are key points of our paper that the referee does not discuss.

      Changes: None.

      (4) and (5) These are interesting justifications, as the paper has clear internal contradictions, and (5) is a sequelae of (4). The reader is led to the concept that the myelin tracts divide the nuclei into sub-modules that match the folding of the skin on the elephant trunk. One would then readily presume that these myelin tracts are in the incoming sensory axons from the trigeminal nerve. However, the authors note that this is not the case: "Our observations on trunk module myelin stripes are at odds with this view of myelin. Specifically, myelin stripes show no tapering (which we would expect if axons divert off into the tissue). More than that, there is no correlation between myelin stripe thickness (which presumably correlates with axon numbers) and trigeminal module neuron numbers. Thus, there are numerous myelinated axons, where we observe few or no trigeminal neurons. These observations are incompatible with the idea that myelin stripes form an axonal 'supply' system or that their prime function is to connect neurons. What do myelin stripe axons do, if they do not connect neurons? We suggest that myelin stripes serve to separate rather than connect neurons." So, we are left with the observation that the myelin stripes do not pass afferent trigeminal sensory information from the "truly extraordinary" trunk skin somatic sensory system, and rather function as units that separate neurons - but to what end? It appears that the myelin stripes are more likely to be efferent axonal bundles leaving the nuclei (to form the olivocerebellar tract). This justification is unsupported.

      Comment: The referee cites some of our observations on myelin stripes, which we find unusual. We stand by the observations and comments. The referee does not discuss the most crucial finding we report on myelin stripes, namely that they correspond remarkably well to trunk folds.

      Changes: None.

      (6) The authors indicate that the location of these nuclei matches that of the trigeminal nuclei in other mammals. This is not supported in any way. In ALL other mammals in which the trigeminal nuclei of the brainstem have been reported they are found in the lateral aspect of the brainstem, bordered laterally by the spinal trigeminal tract. This is most readily seen and accessible in the Paxinos and Watson rat brain atlases. The authors indicate that the trigeminal nuclei are medial to the facial nerve nucleus, but in every other species, the trigeminal sensory nuclei are found lateral to the facial nerve nucleus. This is most salient when examining a close relative, the manatee (10.1002/ar.20573), where the location of the inferior olive and the trigeminal nuclei matches that described by Maseko et al (2013) for the African elephant. This justification is not supported. 

      Comment: The referee notes that we incorrectly state that the position of the trigeminal nuclei matches that of other mammals. We think this criticism is justified.

      Changes: We prepared a comparison of the Maseko et al. (2013) scheme of the elephant brainstem with our scheme of the elephant brainstem (see Referee Table 1 below). Here we acknowledge the referee’s argument, and we have changed the manuscript accordingly.

      (7) The dual to quadruple repetition of rostrocaudal modules within the putative trigeminal nucleus as identified by the authors relies on the fact that in the neurotypical mammal, there are several trigeminal sensory nuclei arranged in a column running from the pons to the cervical spinal cord, these include (nomenclature from Paxinos and Watson in roughly rostral to caudal order) the Pr5VL, Pr5DM, Sp5O, Sp5I, and Sp5C. However, these nuclei are all located far from the midline and lateral to the facial nerve nucleus, unlike what the authors describe in the elephants. These rostrocaudal modules are expanded upon in Figure 2, and it is apparent from what is shown that the authors are attributing other brainstem nuclei to the putative trigeminal nuclei to confirm their conclusion. For example, what they identify as the inferior olive in Figure 2D is likely the lateral reticular nucleus as identified by Maseko et al (2013). This justification is not supported.

      Comment: The referee again compares our findings to the scheme of Maseko et al. (2013) and rejects our conclusions on those grounds. We agree that a comparison of our scheme with theirs is indeed needed.

      Changes: We prepared a comparison of the Maseko et al. (2013) scheme of the elephant brainstem with our scheme of the elephant brainstem (see Referee Table 1 below).

      (8) In primates and related species, there is a distinct banded appearance of the inferior olive, but what has been termed the inferior olive in the elephant by other authors does not have this appearance, rather, and specifically, the largest nuclear mass in the region (termed the principal nucleus of the inferior olive by Maseko et al, 2013, but Pr5, the principal trigeminal nucleus in the current paper) overshadows the partial banded appearance of the remaining nuclei in the region (but also drawn by the authors of the current paper). Thus, what is at debate here is whether the principal nucleus of the inferior olive can take on a nuclear shape rather than evince a banded appearance. The authors of this paper use this variance as justification that this cluster of nuclei could not possibly be the inferior olive. Such a "semi-nuclear/banded" arrangement of the inferior olive is seen in, for example, giraffe (10.1016/j.jchemneu.2007.05.003), domestic dog, polar bear, and most specifically the manatee (a close relative of the elephant) (brainmuseum.org; 10.1002/ar.20573). This justification is not supported. 

      Comment: We carefully looked at the brain sections referred to by the referee in the brainmuseum.org collection. We found, contrary to the referee’s claims, that the inferior olive of dogs, polar bears, and manatees has a perfectly serrated appearance (a cellular arrangement in curved bands). Accordingly, we think the referee is not reporting the comparative evidence fairly and we wonder why this is the case.

      Changes: None.

      Thus, all the justifications forwarded by the authors are unsupported. Based on methodological concerns, prior comparative mammalian neuroanatomy, and prior studies in the elephant and closely related species, the authors fail to support their notion that what was previously termed the inferior olive in the elephant is actually the trigeminal sensory nuclei. Given this failure, the justifications provided above that are sequelae also fail. In this sense, the entire manuscript and all the sequelae are not supported.

      Comment: We disagree. To summarize:

      (1) Our description of the cytochrome oxidase staining lacked methodological detail, which we have now added; the cytochrome oxidase reactivity data are great and support our conclusions.

      (2)–(5) The referee does not really discuss our evidence on these points.

      (6) We were wrong and have now fixed this mistake.

      (7) The referee asks for a comparison to the Maseko et al. (2013) scheme (agreed, see Referee Table 1).

      (8) The referee bends the comparative evidence against us.

      Changes: None.

      A comparison of the elephant brainstem partitioning schemes put forward by Maseko et al 2013 and by Reveyaz et al.

      To start with, we would like to express our admiration for the work of Maseko et al. (2013). These authors did pioneering work on obtaining high-quality histology samples from elephants. Moreover, they made a heroic neuroanatomical effort, in which they assigned 147 brain structures to putative anatomical entities. Most of their data appear to refer to staining in a single elephant and one coronal sectioning plane. The data quality and the illustration of results are excellent.

      We studied mainly two large nuclei in six (now seven) elephants in three (coronal, parasagittal, and horizontal) sectioning planes. The two nuclei in question are the two most distinct nuclei in the elephant brainstem, namely an anterior ventromedial nucleus (the trigeminal trunk module in our terminology; the inferior olive in the terminology of Maseko et al., 2013) and a more posterior lateral nucleus (the inferior olive in our terminology; the posterior part of the trigeminal nuclei in the terminology of Maseko et al., 2013).

      Author response image 3 gives an overview of the two partitioning schemes for inferior olive/trigeminal nuclei along with the rodent organization (see below).

      Author response image 3.

      Overview of the brainstem organization in rodents & elephants

      The strength of the Maseko et al. (2013) scheme is the excellent match of the position of elephant nuclei to the position of nuclei in the rodent (Author response image 3). We think this positional match reflects the fact that Maseko et al. (2013) mapped a rodent partitioning scheme on the elephant brainstem. To us, this is a perfectly reasonable mapping approach. As the referee correctly points out, the positional similarity of both elephant inferior olive and trigeminal nuclei to the rodent strongly argues in favor of the Maseko et al. (2013) scheme, because brainstem nuclei are positionally very conservative.

      Other features of the Maseko et al. (2013) scheme are less favorable. The scheme marries two cyto-architectonically very distinct divisions (an indistinct anterior part and a super-distinct serrated posterior part) to be the trigeminal nuclei. We think merging entirely distinct subdivisions into one nucleus is a byproduct of mapping a rodent partitioning scheme on the elephant brainstem. Neither of the two subdivisions resembles the trigeminal nuclei of other mammals. The cytochrome oxidase staining patterns differ markedly across the anterior indistinct part (see our Author response image 3) and the posterior part of the trigeminal nuclei and do not match the intense cytochrome oxidase reactivity of other mammalian trigeminal nuclei (Author response image 2). Our anti-peripherin staining (the novel Figure 2 of our manuscript) indicates that there are probably no climbing fibers in what Maseko et al. think is the inferior olive; this is a potentially fatal problem for the hypothesis. The posterior part of the Maseko et al. (2013) trigeminal nuclei has a distinct serrated appearance that is characteristic of the inferior olive in other mammals. Moreover, the inferior olive of Maseko et al. (2013) lacks the serrated appearance of the inferior olive seen in pretty much all mammals; this is a serious problem.

      The partitioning scheme of Reveyaz et al. comes with poor positional similarity but avoids the other problems of the Maseko et al. (2013) scheme. Our explanation for the positionally deviating location of trigeminal nuclei is that the elephant grew one of the largest, if not the largest, trigeminal systems of all mammals. As a result, the trigeminal nuclei grew through the floor of the brainstem. We understand this is a post hoc just-so explanation, but at least it is an explanation.

      The scheme of Reveyaz et al. was derived in an entirely different way from the Maseko model. Specifically, we were convinced that the elephant trigeminal nuclei ought to be very special because of the gigantic trigeminal ganglia (Purkart et al., 2022). Cytochrome-oxidase staining revealed a large distinct nucleus with an elongated shape. Initially, we were freaked out by the position of the nucleus and the fact that it was referred to as inferior olive by other authors. When we found an inferior-olive-like nucleus at a nearby (although at an admittedly unusual) location, we were less worried. We then optimized the visualization of myelin stripes (brightfield imaging etc.) and were able to collect an entire elephant trunk along with the brain (African elephant cow Indra). When we made the one-to-one match of Indra’s trunk folds and myelin stripes (former Figure 4, now Figure 5) we were certain that we had identified the trunk module of the trigeminal nuclei. We already noted at the outset of our rebuttal that we now consider such certainty a fallacy of overconfidence. In light of the comments of Referee 2, we feel that a further discussion of our ideas is warranted.

      A strength of the Reveyaz model is that nuclei look like single anatomical entities. The trigeminal nuclei look like trigeminal nuclei of other mammals, the trunk module has a striking resemblance to the trunk and the inferior olive looks like the inferior olive of other mammals.

      We evaluated the fit of the two models in the form of a table (Author response table 1; below). Unsurprisingly, Author response table 1 aligns with our views of elephant brainstem partitioning.

      Author response table 1

      Qualitative evaluation of elephant brainstem partitioning schemes

      ++ = Very attractive; + = attractive; - = unattractive; -- = very unattractive

      We scored features that are clear and shared by all mammals – as far as we know them – as very attractive.

      We scored features that are clear and are not shared by all mammals – as far as we know them – as very unattractive.

      Attractive features are either less clear or less well-shared features.

      Unattractive features are either less clear or less clearly not shared features.

      Author response table 1 suggests two conclusions to us. (i) The Reveyaz et al. model has mainly favorable properties. The Maseko et al. (2013) model has mainly unfavorable properties. Hence, the Reveyaz et al. model is more likely to be true. (ii) The outcome is not black and white, i.e., both models have favorable and unfavorable properties. Accordingly, we overstated our case in our initial submission and toned down our claims in the revised manuscript.

      What the authors have not done is to trace the pathway of the large trigeminal nerve in the elephant brainstem, as was done by Maseko et al (2013), which clearly shows the internal pathways of this nerve, from the branch that leads to the fifth mesencephalic nucleus adjacent to the periventricular grey matter, through to the spinal trigeminal tract that extends from the pons to the spinal cord in a manner very similar to all other mammals. Nor have they shown how the supposed trigeminal information reaches the putative trigeminal nuclei in the ventromedial rostral medulla oblongata. These are but two examples of many specific lines of evidence that would be required to support their conclusions. Clearly, tract tracing methods, such as cholera toxin tracing of peripheral nerves cannot be done in elephants, thus the neuroanatomy must be done properly and with attention to detail to support the major changes indicated by the authors. 

      Comment: The referee claims that Maseko et al. (2013) showed by ‘tract tracing’ that the structures they refer to as trigeminal nuclei receive trigeminal input. This statement is at least slightly misleading. There is nothing that amounts to proper ‘tract tracing’ in the Maseko et al. (2013) paper, i.e. tracing of tracts with post-mortem tracers. We tried proper post-mortem tracing but failed (no tracer transport), probably as a result of the limitations of our elephant material. What Maseko et al. (2013) actually did is look a bit for putative trigeminal fibers and where they might go. We also used this approach. In our hands, such ‘pseudo tract tracing’ works best in unstained material under bright field illumination, because myelin is very well visualized. In such material, we find: (i) massive fiber tracts descending dorsoventrally roughly from where both Maseko et al. 2013 and we think the trigeminal tract runs. (ii) These fiber tracts run dorsoventrally and approach what we think are the trigeminal nuclei from the lateral side.

      Changes: For the ad hoc tract tracing, see above.

      So what are these "bumps" in the elephant brainstem? 

      Four previous authors indicate that these bumps are the inferior olivary nuclear complex. Can this be supported?

      The inferior olivary nuclear complex acts "as a relay station between the spinal cord (n.b. trigeminal input does reach the spinal cord via the spinal trigeminal tract) and the cerebellum, integrating motor and sensory information to provide feedback and training to cerebellar neurons" (https://www.ncbi.nlm.nih.gov/books/NBK542242/). The inferior olivary nuclear complex is located dorsal and medial to the pyramidal tracts (which were not labeled in the current study by the authors but are clearly present in Fig. 1C and 2A) in the ventromedial aspect of the rostral medulla oblongata. This is precisely where previous authors have identified the inferior olivary nuclear complex and what the current authors assign to their putative trigeminal nuclei. The neurons of the inferior olivary nuclei project, via the olivocerebellar tract to the cerebellum to terminate in the climbing fibres of the cerebellar cortex.

      Comment: We agree with the referee that in the Maseko et al. (2013) scheme the inferior olive is exactly where we expect it from pretty much all other mammals. Hence, this is a strong argument in favor of the Maseko et al. (2013) scheme and a strong argument against the partitioning scheme suggested by us.

      Changes: Please see our discussion above.

      Elephants have the largest (relative and absolute) cerebellum of all mammals (10.1002/ar.22425), this cerebellum contains 257 × 10⁹ neurons (10.3389/fnana.2014.00046; three times more than the entire human brain, 10.3389/neuro.09.031.2009). Each of these neurons appears to be more structurally complex than the homologous neurons in other mammals (10.1159/000345565; 10.1007/s00429-010-0288-3). In the African elephant, the neurons of the inferior olivary nuclear complex are described by Maseko et al (2013) as being both calbindin and calretinin immunoreactive. Climbing fibres in the cerebellar cortex of the African elephant are clearly calretinin immunopositive and also are likely to contain calbindin (10.1159/000345565). Given this, would it be surprising that the inferior olivary nuclear complex of the elephant is enlarged enough to create a very distinct bump in exactly the same place where these nuclei are identified in other mammals? 

      Comment: We agree with the referee that it is possible and even expected from other mammals that there is an enlargement of the inferior olive in elephants. Hence, a priori one might expect the ventral brainstem bumps to be the inferior olive; this is perfectly reasonable and is what was done by previous authors. The referee also refers to calbindin and calretinin antibody reactivity. Such antibody reactivity is indeed in line with the referee’s ideas and we considered these findings in our Referee Table 1. The problem is, however, that neither calbindin nor calretinin antibody reactivity is highly specific and indeed both nuclei in discussion (trigeminal nuclei and inferior olive) show such reactivity. Unlike the peripherin-antibody staining advanced by us, neither calbindin nor calretinin antibody reactivity can distinguish the two hypotheses debated.

      Changes: Please see our discussion above.

      What about the myelin stripes? These are most likely to be the origin of the olivocerebellar tract and probably only have a coincidental relationship with the trunk. Thus, given what we know, the inferior olivary nuclear complex as described in other studies, and the putative trigeminal nuclear complex as described in the current study, is the elephant inferior olivary nuclear complex. It is not what the authors believe it to be, and they do not provide any evidence that discounts the previous studies. The authors are quite simply put, wrong. All the speculations that flow from this major neuroanatomical error are therefore science fiction rather than useful additions to the scientific literature. 

      Comment: It is unlikely that the myelin stripes are the origin of the olivocerebellar tract as suggested by the referee. Specifically, the lack of peripherin-reactivity indicates that these fibers are not climbing fibers (our novel Figure 2). In general, we feel the referee does not want to discuss the myelin stripes and obviously thinks we made up the strange correspondence of myelin stripes and trunk folds.

      Changes: Please see our discussion above.

      What do the authors actually have? 

      The authors have interesting data, based on their Golgi staining and analysis, of the inferior olivary nuclear complex in the elephant.

      Comment: The referee reiterates their views.

      Changes: None.

      Reviewer #3 (Public Review):

      Summary: 

      The study claims to investigate trunk representations in elephant trigeminal nuclei located in the brainstem. The researchers identified large protrusions visible from the ventral surface of the brainstem, which they examined using a range of histological methods. However, this ventral location is usually where the inferior olivary complex is found, which challenges the authors' assertions about the nucleus under analysis. They find that this brainstem nucleus of elephants contains repeating modules, with a focus on the anterior and largest unit which they define as the putative nucleus principalis trunk module of the trigeminal. The nucleus exhibits low neuron density, with glia outnumbering neurons significantly. The study also utilizes synchrotron X-ray phase contrast tomography to suggest that myelin-stripe-axons traverse this module. The analysis maps myelin-rich stripes in several specimens and concludes that based on their number and patterning they likely correspond with trunk folds; however, this conclusion is not well supported if the nucleus has been misidentified.

      Comment: The referee gives a concise summary of our findings. The referee acknowledges the depth of our analysis and also notes our cellular results. The referee – in line with the comments of Referee 2 – also points out that a misidentification of the nucleus under study is potentially fatal for our analysis. We thank the referee for this fair assessment.

      Changes: We feel that we need to alert the reader more broadly to the misidentification concern. We think the critical comments of Referee 2, which will be published along with our manuscript, will go a long way in doing so. We think the eLife publishing format is fantastic in this regard. We will also include pointers to these concerns in the revised manuscript.

      Strengths: 

      The strength of this research lies in its comprehensive use of various anatomical methods, including Nissl staining, myelin staining, Golgi staining, cytochrome oxidase labeling, and synchrotron X-ray phase contrast tomography. The inclusion of quantitative data on cell numbers and sizes, dendritic orientation and morphology, and blood vessel density across the nucleus adds a quantitative dimension. Furthermore, the research is commendable for its high-quality and abundant images and figures, effectively illustrating the anatomy under investigation.

      Comment: Again, a very fair and balanced set of comments. We are thankful for these comments.

      Changes: None.

      Weaknesses: 

      While the research provides potentially valuable insights if revised to focus on the structure that appears to be the inferior olivary nucleus, there are certain additional weaknesses that warrant further consideration. First, the suggestion that myelin stripes solely serve to separate sensory or motor modules rather than functioning as an "axonal supply system" lacks substantial support due to the absence of information about the neuronal origins and the termination targets of the axons. Postmortem fixed brain tissue limits the ability to trace full axon projections. While the study acknowledges these limitations, it is important to exercise caution in drawing conclusions about the precise role of myelin stripes without a more comprehensive understanding of their neural connections.

      Comment: The referee points out a significant weakness of our study, namely our limited understanding of the origin and targets of the axons constituting the myelin stripes. We are very much aware of this problem and this is also why we directed high-powered methodology like synchrotron X-ray tomograms to elucidate the structure of myelin stripes. Such analysis led to advances, i.e., we now think that what look like stripes are bundles, and we understand that the constituent axons tend to traverse the module. Such advances are insufficient, however, to provide a clear picture of myelin stripe connectivity.

      Changes: We think solving the problems raised by the referee will require long-term methodological advances and hence we will not be able to solve these problems in the current revision. Our long-term plans for confronting these issues are the following: (i) Improving our understanding of long-range connectivity by post-mortem tracing and MR-based techniques such as Diffusion-Tensor-Imaging. (ii) Improving our understanding of mid- and short-range connectivity by applying even larger synchrotron X-ray tomograms and possibly serial EM.

      Second, the quantification presented in the study lacks comparison to other species or other relevant variables within the elephant specimens (i.e., whole brain or brainstem volume). The absence of comparative data for different species limits the ability to fully evaluate the significance of the findings. Comparative analyses could provide a broader context for understanding whether the observed features are unique to elephants or more common across species. This limitation in comparative data hinders a more comprehensive assessment of the implications of the research within the broader field of neuroanatomy. Furthermore, the quantitative comparisons between African and Asian elephant specimens should include some measure of overall brain size as a covariate in the analyses. Addressing these weaknesses would enable a richer interpretation of the study's findings.

      Comment: The referee suggests another series of topics, which include the analysis of brain parts volumes or overall brain size. We agree these are important issues, but we also think such questions are beyond the scope of our study.

      Changes: We hope to publish comparative data on elephant brain size and shape later this year.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I realize that elephant brains are a limiting resource in this project, along with the ability to perform functional investigations. However, I believe that Prof. Jon Kaas (Vanderbilt University) has one or more series of Nissl-stained brainstems from elephants. These might be of potential interest, as they were previously used to explore general patterns of trigeminal brainstem organization in a comparative manner (see Sawyer and Sarko, 2017, "Comparative Anatomy and Evolution of the Somatosensory Brain Stem" in the Evolution of Nervous System series) and might shed light on the positioning of the trigeminal complex and IO, with parts of the trigeminal nerve itself still attached to these sections.

      Comment: The referee suggests adding data from more elephants and we think this is a great suggestion because our ns are small. We followed this advice. We agree we need more comparative neuroanatomy of elephants and the urgency of this matter is palpable in the heated debate we have with Referee 2. Specifically, we need more long-range and short-range analysis of elephant brains.

      Changes: We plan to include data in the revised manuscript about cytoarchitectonics (Nissl), cytochrome-oxidase reactivity, and possibly also antibody reactivity from an additional animal, i.e., from the African elephant cow Bibi. The quality of this specimen is excellent and the post-mortem time to brain extraction was very short.

      We also have further plans for connectivity analysis (see our response above), but such data will not become available fast enough for the revision.

      Other recommendations: 

      - A general schematic showing input from trunk to PrV to the trigeminal subnuclei (as well as possibly ascending connections) might be informative to the reader, in terms of showing which neural relay is being examined.

      Comment: We think this is a very good suggestion in principle, but we were not satisfied with the schematics we came up with.

      Changes: None.

      - Perhaps a few more sentences describing the significance of synchrotron tomography for those who may be unfamiliar.

      Comment & Change: We agree and have implemented this suggestion.

      - "Belly-shaped" trunk module description is unclear on page 9. 

      Comment & Change: We clarified this matter.

      - Typo on the last sentence of page 9. 

      Comment & Change: We fixed this mistake.

      Reviewer #2 (Recommendations For The Authors): 

      The data is only appropriate for a specialized journal and is limited to the Golgi analysis of neurons within the inferior olivary complex of the elephant. This reviewer considers that the remainder of the work is speculation and that the paper in its current version is not salvageable.

      Comment: Rather than suggesting changes, the referee makes it clear that the referee does not want to see our paper published. We think this desire to reject is not rooted in a lack of quality of our work. In fact, we did an immense amount of work: detailed cytoarchitectonic analysis of six (now seven) elephant brainstems (rather than one, as in the case of our predecessors), cell counts, and X-ray tomography. Instead, we think the problem is rooted in the fact that we contradict the referee. To us, such suppression of diverging opinions – provided they are backed up with data – is a scientifically deeply unhealthy attitude. Science thrives on debate and this is why we did not exclude any referees even though we knew that our results do not align with the views of all of the few actors in the field.

      Changes: We think the novel eLife publishing scheme was developed to prevent such abuse. We look forward to having our data published along with the harsh comments of the referee. The readers and subsequent scientific work will determine who’s right and who’s wrong.

      In order to convince readers of the grand changes to the organization of the brainstem in a species suggested by the authors the data presented needs to be supported. It is not. 

      Comment: Again, this looks to us like more of the ‘total-rejection-commentary’ than like an actual recommendation.

      Changes: None.

      The protocol for the cytochrome oxidase histochemistry is not available in the locations indicated by the authors, and it is very necessary to provide this, as I fully believe that the staining obtained is not real, given the state of the tissue used. 

      Comment: We apologize again for not including the necessary details on our cytochrome-oxidase staining.

      From these comments (and the initial comments above) it appears that the referee is uncertain about the validity of cytochrome-oxidase staining. We (M.B., the senior author) have been doing this particular stain for approximately three decades. The referee being unfamiliar with cytochrome-oxidase staining is fine, but we can’t comprehend how the referee then comes to the ‘full belief’ that our staining patterns are ‘not real’ when the visual evidence indicates the opposite. We feel the referee does not want to believe our data.

      From hundreds of permutations, we can assure the referee that cytochrome-oxidase staining can go wrong in many ways. The most common failure outcome in elephants is a uniform light brown stain after hours or days of the cytochrome-oxidase reaction. This outcome is closely associated with long ≥2 days post-mortem/fixation times and reflects the quenching of cytochrome-oxidases by fixation. Interestingly, cytochrome-oxidase staining in elephant brains is distinctly more sensitive to quenching by fixation than cytochrome-oxidase staining in rodent brains. Another, more rare failure of cytochrome-oxidase staining comes as entirely white or barely colored sections; this outcome is usually associated with a bad reagent (most commonly old DAB, but occasionally also old or bad catalase, in case you are using a staining protocol with catalase). Another nasty cytochrome-oxidase staining outcome is smeary all-black sections. In this case, a black precipitate sticks to sections and screws up the staining (filtering and more gradual heating of the staining solution usually solve this problem). Thus, you can get uniformly white, uniformly light brown, and smeary black sections as cytochrome-oxidase staining failures. What you never get from cytochrome-oxidase staining as an artifact are sections with a strong brown to lighter brown differential contrast. All sections with strong brown to lighter brown differential contrast (staining successes) show one and the same staining pattern in a given brain area, i.e., brownish barrels in the rodent cortex, brownish barrelettes (trigeminal nuclei) in the rodent brainstem, brownish putative trunk modules/inferior olives (if we believe the referee) in the elephant brainstem. Cytochrome-oxidase reactivity is in this regard remarkably different from antibody staining. In antibody staining you can get all kinds of interesting differential contrast staining patterns, which mean nothing. 
Such differential contrast artifacts in antibody staining arise from insufficient primary antibody specificity, non-specific binding of the secondary antibody, and any number of other causes. The reason that the brown differential contrast of the cytochrome-oxidase reaction is pretty much fool-proof relates to the histochemical staining mechanism, which is based on the supply of specific substrates to a universal mitochondrial enzyme. The ability to reveal mitochondrial metabolism and the universal and ‘fool-proof’ staining qualities make cytochrome-oxidase reactivity a fantastic tool for comparative neuroscience, where you always struggle with insufficient information about antigen reactivity.

      We also note that the contrast of cytochrome-oxidase reactivity seen in the elephant brainstem is spectacular. As the Referee can see in our Figure 1C, we observe a dark brown color in the putative trunk module, with the rest of the brain being close to white. Such striking cytochrome-oxidase reactivity contrast has been observed only very rarely in neuroanatomy: (i) In the rest of the elephant brain (brainstem, thalamus, cortex) we did not observe contrast as striking as in the putative trunk module (the inferior olive according to the referee). (ii) In decades of work with rodents, we have rarely seen such differential activity. For example, cortical whisker-barrels (a classic CO-staining target) in rodents usually come out as dark brown against a light brown background.

      What all of this commentary means is that patterns revealed by differential cytochrome-oxidase staining in the elephant brain stem are real.

      Changes: We added details on our cytochrome-oxidase reactivity staining protocol and commented on cytochrome-oxidase reactivity in the elephant brain in general.

      The authors need to recognize that the work done in Africa on elephant brains is of high quality and should not be blithely dismissed by the authors - this stinks of past colonial "glory", especially as the primary author on these papers is an African female.

      Comment: The referee notes that we unfairly dismiss the work of African scientists and that our paper reflects a continuation of our horrific colonial past because we contradict the work of an African woman. We think such commentary is meant to be insulting and prefer to return to the scientific discourse. We are staunch supporters of diversity in science. It is simply untrue that we do not acknowledge African scientists or the excellent work done in Africa on elephant brains. For example, we cite no fewer than four papers from the Manger group. We refer countless times in the manuscript to these papers, because these papers are highly relevant to our work. We indeed disagree with two anatomical assignments made by Maseko et al., 2013. Such differences should not be overrated, however. As we noted before, such differences relate to only 2 out of 147 anatomical assignments made by these authors. More generally, discussing and even contradicting papers is the appropriate way to acknowledge scientists. We have already expressed that we greatly admire the pioneering work of the Manger group. In our view, the perfusion of elephants in the field is a landmark experiment in comparative neuroanatomy. We work closely with colleagues in Africa and find them fantastic collaborators. When the referee accuses us of contradicting the work of an African woman, the referee is unfairly and wrongly accusing us of attacking a scientist’s identity. More generally, we feel the discussion should focus on the data presented.

      Changes: None.

      In addition, perfusing elephants in the field with paraformaldehyde shortly after death is not a problem "partially solved" when it comes to collecting elephant tissue (n.b., with the right tools the brain of the elephant can be removed in under 2 hours). It means the problem IS solved. This is evidenced by the quality of the basic anatomical, immuno-, and Golgi-staining of the elephant tissue collected in Africa.

      Comment: This is not a recommendation. We repeat: In our view, the perfusion of elephants in the field by the Manger group is a landmark experiment in comparative neuroanatomy. Apart from that, we think the referee got our ‘partially solved’ comment the wrong way. It is perhaps worthwhile to recall the context of this quote. We first describe the numerous limitations of our elephant material; admitting these limitations is about honesty. Then, we wanted to acknowledge previous authors who either paved the way for elephant neuroanatomy (Shoshani) or did a better job than we did (Manger; see the above landmark experiment). These citations were meant as an appreciation of our predecessors’ work and were by no means meant to diminish their work. Why did we say that the problems of dealing with elephant material are only partially solved? Because elephant neuroanatomy is hard and the problems associated with it are by no means solved. Many previous studies rely on single specimens, and our possibilities of accessing, removing, processing, and preserving elephant brains are limited and inferior to the conditions elsewhere. Doing a mouse brain is orders of magnitude easier than doing an elephant brain (because the problems of doing mouse anatomy are largely solved), yet it is hard to publish a paper with six elephant brains because the referees expect evidence at least half as good as what you get in mice.

      Changes: We replaced the ‘partially solved’ sentence.

      The authors need to give credit where credit is due - the elephant cerebellum is clearly at the core of controlling trunk movement, and as much as primary sensory and final stage motor processing is important, the complexity required for the neural programs needed to move the trunk either voluntarily or in response to stimuli, is being achieved by the cerebellum. The inferior olive is part of this circuit and is accordingly larger than one would expect.

      Comment: We think it is very much possible that the elephant cerebellum is important in trunk control.

      Changes: We added a reference to the elephant cerebellum in the introduction of our manuscript.

    1. Author response:

      Thank you for organising the review and providing us with the reviewers' feedback. These comments are very useful, and we would like to express our gratitude to the reviewers for their efforts.

      The reviewers all point out a number of related improvements, relating to: 1) describing various processing steps more clearly, in the online documentation but also in the manuscript itself (e.g. for particle picking), 2) describing more clearly what features Ais offers, how these compare to those of other programmes, and how they might be interfaced with in third-party programmes (e.g. the expected format of models), and 3) a degree of subjectivity in discussion of the results presented in the manuscript (e.g. our statement that Pix2pix performed better in some cases than did other architectures).

      We will address these points, as well as the various other suggestions, in the upcoming revised manuscript and updates to Ais.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Perampalam et al. describe novel methods for genome-wide CRISPR screening to identify and validate genes essential for HGSOC spheroid viability. In this study, they report that Netrin signaling is essential for maintaining disseminated cancer spheroid survival, wherein overexpression of Netrin pathway genes increases tumor burden in a xenograft model of ovarian cancer. They also show that high netrin expression correlates with poor survival outcomes in ovarian cancer patients. The study provides insights into the biology of netrin signaling in DTC cluster survival and warrants development of therapies to block netrin signaling for treating serous ovarian cancer.

      Strengths:

      - The study identifies Netrin signaling to be important in disseminated cancer spheroid survival

      - A Novel GO-CRISPR methodology was used to find key genes and pathways essential for disseminated cancer cell survival

      Thanks for the endorsement of our work and its importance to metastasis in ovarian cancer.

      Weaknesses:

      - The term dormancy is not fully validated and requires additional confirmation to claim the importance of Netrin signaling in "dormant" cancer survival.

      - Findings shown in the study largely relate to cancer dissemination and DTS survival rather than cancer dormancy.

      Much of the validation of dormancy and cell cycle arrest in HGSOC spheroids, as well as of the culture model, has been published previously and hence was not repeated here. I think this reviewer will appreciate the updated citations and explanations that better illustrate the state of knowledge. We have also added new experiments that further emphasize the dormant state of spheroid cells in culture and xenografts, as well as of the patient derived spheroids used in this study.

      Reviewer #1 (Recommendations for Authors):

      (1) It is unclear what spheroid/adherent enrichment ratio is and how it ties into genes affecting cell viability. Why is an ER below 1 the criteria for selecting survival genes?

      Our screen uses the ‘guide only’ comparison in each culture condition to establish a gene score under that specific condition.  A low adherent score captures genes that are essential under standard culture conditions where cells are proliferating and this can include genes needed for proliferation or other basic functions in cell physiology.  A low spheroid score identifies the genes that are most depleted in suspension when cells are growth arrested and this is an indication of cell death in this condition.  Since gene knock outs are first established in adherent proliferating conditions, essential genes under these conditions will already start to become depleted from the population before suspension culture.  By selecting genes with a ratio of <1 we can identify those that are most relevant to dormant suspension culture conditions.  Ultimately, the lowest enrichment ratio scores represent genes whose loss of function is dispensable in the initial adherent condition, but critical for survival in suspension and this is what we aimed to identify. We’ve updated Figure 1B to illustrate this and we’ve updated the explanation of the enrichment ratio on page 6, lines 144 to 147 of the results.
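
      The selection logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: the function names, the example gene labels and scores, and the definition of the enrichment ratio as the spheroid score divided by the adherent score are all hypothetical and chosen only to make the reasoning concrete.

```python
# Hypothetical sketch: rank genes by a spheroid/adherent enrichment ratio (ER).
# Assumption: each gene has a positive score per condition, where a lower
# score means stronger depletion of that gene's guides in that condition.

def enrichment_ratio(adherent_score, spheroid_score):
    """Return spheroid score divided by adherent score (assumed ER definition)."""
    if adherent_score <= 0:
        raise ValueError("adherent score must be positive")
    return spheroid_score / adherent_score


def select_survival_genes(scores, threshold=1.0):
    """Return genes with ER below the threshold, lowest ER first.

    `scores` maps a gene label to an (adherent_score, spheroid_score) pair.
    """
    ratios = {
        gene: enrichment_ratio(adherent, spheroid)
        for gene, (adherent, spheroid) in scores.items()
    }
    hits = [gene for gene, ratio in ratios.items() if ratio < threshold]
    return sorted(hits, key=ratios.get)


# Made-up scores for illustration only.
example_scores = {
    "GENE_A": (1.0, 0.2),  # dispensable when adherent, depleted in suspension
    "GENE_B": (0.3, 0.3),  # equally depleted in both conditions, ER = 1
    "GENE_C": (0.9, 1.1),  # not depleted in suspension, ER > 1
}
survival_hits = select_survival_genes(example_scores)
```

      In this toy input only GENE_A passes the ER < 1 filter. GENE_B is essential in both conditions, so its ratio of 1 excludes it, which mirrors the point that genes already depleted in adherent culture are not specific to suspension survival.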

      (2) The WB for phospho-p38 in figure 1A for OVCAR8 line does not show increased phosphorylation in the spheroid relative to the adherent. If anything, phospho-p38 appears to be reduced in the spheroid. Can the authors provide a better western blot?

      We’ve updated this blot with a longer exposure, see Figure 1A.  Phosphorylation levels of p38 are essentially unchanged in OVCAR8 cells in suspension culture, although the overall levels of p38 may be slightly reduced in dormant culture conditions.

      (3) How did the authors confirm dormancy apart from western blot for phospho-ERK vs phospho-p38? Authors should add EdU/BrdU staining and/or Ki67 staining to confirm dormancy.

      Previous publications that appear as citations 7, 10, and 33 in the reference list established the growth arrest state of these cells in suspension culture. This included measuring other known markers of dormancy and quiescence such as p27, p130, reduced cyclin/cdk activity, and reduced 3H-thymidine incorporation. In addition, other associated characteristics of dormancy such as EMT and catabolic metabolism have been demonstrated in these culture conditions (see citation 11 and Rafehi et al. Endocr. Relat. Cancer 23;147-59). We’ve added these additional citations to our descriptions of dormant spheroid culture to better clarify the status of these cells in our experiments (see page 6, lines 126-28). To ensure that cells are growth arrested in the experiments shown in this paper, we have updated Figure 1A to include blots of p130 and Ki67 to further emphasize that spheroid cells are not proliferating, as the quiescence marker (p130) is high and the proliferative marker (Ki67) is lost in suspension culture.

      (4) Can the authors report spheroid volume over time in culture? How was viability measured?

      We’ve updated the methods (see page 27, line 574) to better highlight the description of cell survival that answers both of these questions. At the ends of experimental time points in both the screen and viability assays we captured live cells by replating on adherent plasticware. We fixed and stained with crystal violet and photographed plates to illustrate the sizes of spheroids (shown in Fig. 2 Supplement 1E, Fig. 6C, and 7D). We subsequently extracted the dye and quantitated it spectrophotometrically to compare biomass of viable cells between experiments irrespective of the relatively random shapes of spheroids. We found reattachment and staining in this manner to match traditional viability assays such as CellTiter-Glo in a previous paper (10). Furthermore, biomass never increases in culture and diminishes gradually over time, consistent with the non-proliferative state of these experiments. Cross-checks of this equivalency between viability and reattached biomass measurements, together with a demonstration that biomass is lost over time, are shown in Fig. 2 Supplement 1E, which compares reattached crystal violet staining measurements with CellTiter-Glo for DYRK1A knock out cells over time in culture. In addition, we include a comparison of crystal violet staining of reattached spheroids with trypan blue dye exclusion in Fig. 5G and H. In both cases reattachment and more direct viability assays support the same conclusion that Netrin signaling supports viability in dormant culture.

      (5) Please show survival significance of Netrin signaling genes in recurrence/relapse free survival to claim importance in cancer dormancy.

      See Fig. 7 Supplement 1C, where we include the recurrence free survival data. Netrin-1 and -3 high expressors also have a numerically shorter progression free survival, but the difference is not statistically significant. Netrin-1 overexpression alone is also shown, and it shows shorter survival with a P-value of 0.0735. Elevated survival of dormant cells in a residual disease state is expected to increase the chance of relapse and shorten this interval. Thus, this data is consistent with our model, but lacks statistical significance.

      There are many alternative ways to interpret what shorter progression free survival, or overall survival, may mean biologically. Since survival of dormant cells is but one of them, we also added new data to experimentally investigate the role of endogenous Netrin signaling in dormant residual disease in Fig. 6 and described on page 12, lines 266-87.  We used xenograft experiments to show OVCAR8 spheroids form and withdraw from the cell cycle equivalently to suspension culture following intraperitoneal injection.  Furthermore, loss of Netrin signaling due to receptor deletions compromises survival during this early window before disseminated lesions form.  This argues that Netrin signaling contributes to survival during this window of dormancy.  In addition, mice engrafted with mutant cells experience prolonged survival when Netrin signaling is blocked.  Together, these experiments further argue that Netrin signaling supports survival in the dormant, non-proliferative phase, and leads to reduced survival of mice.

      (6) The authors show IHC staining of patient ascites derived HGSOC spheroids. However, no marker for dormancy is shown in these spheroids. Adding Ki67 staining or phospho-ERK vs phospho-p38 would be necessary to confirm cancer dormancy.

      We have added new staining for Ki67 and p130 that compares these markers in HGSOC tumors where Ki67 is high and p130 is low with ascites derived spheroids where staining is the opposite. Importantly, expression of p130 is linked to cellular quiescence and is not found to accumulate in the nucleus of cells that are just transiting through G1.  This confirms that the ascites derived spheroids are dormant.  See Fig. 4A-E and described on page 9, lines 201-7.

      (7) Overall, the findings are interesting in the context of cancer dissemination. There is not enough evidence for cancer dormancy and the importance of Netrin signaling in the survival of cancer dormancy. Overexpression of Netrin increases phosphorylation of ERK, leading one to expect an increase in proliferation. This suggests that Netrin breaks cancer cells out of dormancy, into a proliferative state.

      We have found that the discovery of Netrin activation of MEK-ERK in growth arrested cells is counterintuitive to many cancer researchers. However, this axis exists in other paradigms of Netrin signaling in axon outgrowth that are not proliferation related (see citation 26, Forcet et al. Nature 417; 443-7 as an example). We have added Fig. 5D and descriptions on page 11, lines 244-52 to better clarify that Netrins CAN’T induce cell proliferation through ERK. Addition of recombinant Netrin-1 can only induce ERK phosphorylation in suspension culture conditions and not in quiescent adherent conditions. The small magnitude of ERK phosphorylation induced by Netrin-1 in suspension compared to treating adherent, quiescent cells with the same concentration of mitogenic EGF further emphasizes that this is not a proliferative signal. Lastly, the new xenograft experiment in Fig. 6A-D (described on page 12, lines 266-81) demonstrates the growth arrested context in which Netrin signaling in dormant spheroids supports viability.

      (8) If authors wish to claim cancer dormancy as the premise of their study, additional confirmatory experiments are required to support their claims. Alternatively, based on the current findings of the study, it would be best to change the premise of the article to Netrin signaling in cancer dissemination and survival of disseminated cancer spheroids rather than cancer dormancy.

      I expect that this reviewer will agree that we have added more than sufficient explanations of background work on HGSOC spheroid dormancy from the literature, as well as new experiments that address their questions about dormancy in our experiments.

      Reviewer #2 (Public Review):

      Summary:

      In this article, the authors employed modified CRISPR screens ["guide-only (GO)-CRISPR"] in the attempt to identify the genes which may mediate cancer cell dormancy in the high grade serous ovarian cancer (HGSOC) spheroid culture models. Using this approach, they observed that abrogation of several of the components of the netrin (e.g., DCC, UNC5Hs) and MAPK pathways compromise the survival of non-proliferative ovarian cancer cells. This strategy was complemented by the RNAseq approach which revealed that a number of the components of the netrin pathway are upregulated in non-proliferative ovarian cancer cells and that their overexpression is lost upon disruption of DYRK1A kinase that has been previously demonstrated to play a major role in survival of these cells. Perampalam et al. then employed a battery of cell biology approaches to support the model whereby the Netrin signaling governs the MEK-ERK axis to support survival of non-proliferative ovarian cancer cells. Moreover, the authors show that overexpression of Netrins 1 and 3 bolsters dissemination of ovarian cancer cells in the xenograft mouse model, while also providing evidence that high levels of the aforementioned factors are associated with poor prognosis of HGSOC patients.

      Strengths:

      Overall it was thought that this study is of potentially broad interest in as much as it provides previously unappreciated insights into the potential molecular underpinnings of cancer cell dormancy, which has been associated with therapy resistance, disease dissemination, and relapse as well as poor prognosis. Notwithstanding the potential limitations of cellular models in mimicking cancer cell dormancy, it was thought that the authors provided sufficient support for their model that netrin signaling drives survival of non-proliferating ovarian cancer cells and their dissemination. Collectively, it was thought that these findings hold a promise to significantly contribute to the understanding of the molecular mechanisms of cancer cell dormancy and in the long term may provide a molecular basis to address this emerging major issue in the clinical practice.

      Thanks for the kind words about the importance of our work in the broader challenges of cancer treatment.

      Weaknesses:

      Several issues were observed regarding methodology and data interpretation. The major concerns were related to the reliability of modelling cancer cell dormancy. To this end, it was relatively hard to appreciate how the employed spheroid model allows to distinguish between dormant and e.g., quiescent or even senescent cells. This was in contrast to solid evidence that netrin signaling stimulates abdominal dissemination of ovarian cancer cells in the mouse xenograft and their survival in organoid culture. Moreover, the role of ERK in mediating the effects of netrin signaling in the context of the survival of non-proliferative ovarian cancer cells was found to be somewhat underdeveloped.

      Experiments previously published in citation 7 show that growth arrest in patient ascites derived spheroids is fully reversible and that argued against non-proliferative spheroids being a form of senescence and moved this work into the dormancy field.  We have added extensive new support for our model systems and data to address the counterintuitive aspects of MEK-ERK signaling in survival instead of proliferation. 

      Reviewer #2 (Recommendations for Authors):

      (1) A better characterization of the spheroid model may be warranted, including staining for the markers of quiescence and senescence (including combining these markers with staining for the components of the netrin pathway)

      See Figure 1A and page 6, lines 126-36, where we have added blots for Ki67 and p130 to better emphasize the arrested proliferative state of cells in our screening conditions. We have also added these same controls for patient ascites-derived spheroids in Figure 4, described on page 9, lines 203-7. One realization from this CRISPR screen, and others in our lab, is that it identifies functionally important aspects of cell physiology and not necessarily ones that are easily explored using commercially available antibodies. Netrin-1 and -3 staining of patient derived spheroids in Fig. 4, as well as cell line spheroids stained in Fig. 4 Supplement 1, further support the relevance of this pathway in dormant cancer cells because Netrins are expressed in the right place at the right time. The Netrin-1 stimulation experiments in Fig. 5C were originally carried out to probe HGSOC cells for functionality of Netrin receptors since we couldn’t reliably detect them by blotting or staining with available antibodies. This demonstrates that this pathway is active in the various HGSOC cell lines we’ve used and specifically, using OVCAR8 cells, we show it is only active in suspension culture conditions.

      (2) In figure 1A it appears that total p38 levels are reduced in some cell lines in spheroid vs. adherent culture. The authors should comment on this.

      These blots have been updated to be more clear.  Overall p38 levels may be reduced in some cell lines and when compared with activation levels of phosphorylated p38 it suggests the fraction of activated p38 is higher. OVCAR8 cells may be an exception where the overall activity level remains approximately the same.

      (3) The authors should perhaps provide a clearer rationale for choosing to focus on the netrin signaling vs. e.g., GPCR signaling, and consider more explicit defining of "primary" vs. "tertiary" categories in Reactome gene set analysis.

      We’ve updated Fig. 1E and the text on page 7, lines 161-5 to illustrate which gene categories identified in the screen belong to which tiers of Reactome categories. It better visualizes why we have investigated the Axon guidance pathway that includes Netrin: it is a highly specific signaling pathway that scores similarly to the broader and less specific categories at the very top of the list. As an aside, the GPCR signaling and GPCR downstream signaling have proven to be fairly intractable categories. As best we can tell, the GPCR downstream signaling category is full of MAPK family members and likely represents some redundancy with MAPK further down.

      (4) In figure 3A-C, including factors whose expression did not appear to change between adherent and suspension conditions may be warranted as the internal control. Figure 3D-F may benefit from some sort of quantification.

      The mRNA expression levels are normalized to GAPDH as an internal control. We have updated this figure and re-plotted it as fold change relative to adherent culture cells with statistical comparisons to indicate which are significantly upregulated in suspension culture.
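
      For readers unfamiliar with this kind of normalization, the calculation can be sketched as below. This is a hedged illustration only: it assumes the measurements are RT-qPCR threshold cycle (Ct) values and uses the standard 2^-ΔΔCt formula, neither of which is stated in this response. The function name and the Ct values are hypothetical.

```python
# Hypothetical sketch: fold change of a target gene in suspension relative to
# adherent culture, normalized to a reference gene such as GAPDH.
# Assumption: inputs are qPCR Ct values and amplification efficiency is ~100%.

def fold_change(ct_target_susp, ct_ref_susp, ct_target_adh, ct_ref_adh):
    """Return 2^-ddCt for suspension versus adherent culture."""
    delta_susp = ct_target_susp - ct_ref_susp  # target normalized to reference
    delta_adh = ct_target_adh - ct_ref_adh     # same normalization, adherent
    delta_delta = delta_susp - delta_adh       # suspension relative to adherent
    return 2.0 ** (-delta_delta)


# Made-up Ct values: the target crosses threshold two cycles earlier in
# suspension while the reference gene is unchanged.
example_fold = fold_change(24.0, 18.0, 26.0, 18.0)
```

      With these made-up inputs the target is four-fold higher in suspension than in adherent culture; a value of 1 would indicate no change between conditions.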

      The IHC experiments are now in Fig. 4D-F and show positive staining for Netrin-1 and -3.  Netrin-3 is easiest to see, while Netrin-1 is trickier because the difference with the no primary antibody control isn’t intensity, but the tint of the DAB stain.  We had to counter stain the patient spheroids with Hematoxylin in order for the slide scanner to find the best focal plane and make image registration between sections possible.  This unfortunately makes the Netrin-1 staining rather subtle.  For cell line spheroids in the Fig. 4, Supplement 1 we didn’t need the slide scanner and show negative controls without counter stain that are much more convincing of Netrin-1 detection and reassure us that our staining detects the intended target.  We’ve updated the labels in Fig. 4 and Fig. 4, Supplement 1 for this to be more intuitive.  Unfortunately, relying on the tint of the DAB stain leaves this as a qualitative experiment.

      - In figure 4C-E the authors show that Netrin-1 stimulation induces ERK phosphorylation whereby it is argued that this is a "low-level" stimulation of ERK signaling required for the survival of ovarian cells in the suspension. This is however hard to appreciate, and it was thought that having adherent cells in parallel would be helpful to gauge whether this indeed is a "low level" ERK activity. Moreover, the authors should likely include downstream substrates of ERK (e.g., RSKs) as well as p38 in these experiments. The control experiments for the effects of PD184352 on ERK phosphorylation also appear to be warranted. Finally, performing the experiments with PD184352 in the presence of Netrin-1 stimulation would also be advantageous.

      We have added a new Netrin-1 stimulation experiment in Fig. 4D (described on page 11, lines 244-52) that shows that Netrins can only activate very low levels of ERK phosphorylation in suspension when proliferation is arrested. Netrin-1 stimulation of quiescent adherent cells, where stimulation of proliferation is possible, shows that Netrins are unable to activate ERK phosphorylation in this condition. In contrast, we also stimulate quiescent adherent OVCAR8 cells with an equal concentration of EGF (a known mitogen) to offer high level ERK phosphorylation as a side by side comparison. I think that this offers clear evidence that Netrin signaling is inconsistent with inducing cell proliferation. We’ve also updated citations in the introduction to include citation 26, which offers a previously reported paradigm of Netrin-ERK signaling in axon outgrowth, a non-cancer, non-proliferative context, to remind readers that Netrins utilize MEK-ERK differently.

      We highlight Netrin-MEK-ERK signaling as key to survival for a number of reasons. First, Netrin signaling in this paradigm does not fit the dependence receptor paradigm where loss of Netrin receptors protects against cell death. Fig. 5B rules this out as receptor loss never offers a survival advantage, but clearly receptor deletions compromise survival in suspension culture. Second, positive Netrin signaling is known to support survival by inactivating phosphorylation of DAPK1. We’ve added this experiment as Fig. 5 Supplement 1D and show that loss of Netrin receptors doesn’t reduce DAPK1 phosphorylation in a time course of suspension culture. Consequently, we conclude this isn’t the survival signal either. Since MEK and ERK family members scored in our screen, we investigated their role in survival. We now show two different MEK inhibitors with different inhibitory mechanisms to confirm that MEK inhibition induces cell death. In addition to the PD184352 inhibitor from our first submission, we’ve added Trametinib as well, and this is shown in Fig. 5G. Since it is surprising that MEK inhibition can kill instead of just arresting proliferation, we’ve also added another cell death assay in which we show trypan blue dye exclusion as a second look at survival. This is now Fig. 5H. Lastly, we include Trametinib inhibition of ERK phosphorylation in these assays in Fig. 5I. While we leave open what takes place downstream of ERK, our model in Fig. 5J offers a very detailed look at the components upstream.

      - Does inhibition of ERK prevent the abdominal spread of ovarian cancer cells? The authors may feel that this is out of the scope of the study, which I would agree with, but then the claims regarding ERK being the major mediator of the effects of netrin signaling should be perhaps slightly toned down.

      We agree that loss of function xenograft experiments will enhance our discovery of Netrin’s role in dormancy and metastasis. We have added a new Fig. 6 that uses xenografts with Netrin receptor deficient OVCAR8 cells (UNC5 4KO). It demonstrates that two weeks following IP engraftment we can isolate spheroids from abdominal washes and that cells have entered a state of reduced proliferation, as determined by lowered expression of Ki67 as well as other proliferation inducing genes. In the case of UNC5 4KO cells, there is significant attrition of these cells as determined by recovering spheroids in adherent culture (Fig. 6C) and by Alu PCR to detect human cells in abdominal washes (Fig. 6D). Lastly, xenografts of UNC5 4KO cells cause much less aggressive disease and significantly extend survival of these mice (Fig. 6E,F). This is not exactly the experiment that the reviewer is asking for, but it is a clear indication that Netrin signaling supports survival in a xenograft model of dormancy.

      - Notwithstanding that this could be deduced from figures 6D and F, it would be helpful if the number of mice used in each experimental group is clearly annotated in the corresponding figure legends. Moreover, indicating the precise statistical tests that were used in the figures would be helpful (e.g., specifying whether anova is one-way, two-way, or?)

      We have added labels to what is now Fig. 8B to indicate the number of animals used for each genotype of cells.  We have also updated figure legends to include more details of statistical tests used in each instance.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Roy et al. used the previously published deep transfer learning tool, DEGAS, to map disease associations onto single-cell RNA-seq data from bulk expression data. The authors performed independent runs of DEGAS using T2D or obesity status and identified distinct β-cell subpopulations. β-cells with high obese-DEGAS scores contained two subpopulations derived largely from either non-diabetic or T2D donors. Finally, immunostaining using human pancreas sections from healthy and T2D donors validated the heterogeneous expression and depletion of DLK1 in T2D islets.

      Strengths:

      (1) This is a meta-analysis of previously published scRNA-seq data using a deep transfer learning tool.

      (2) Identification of novel beta cell subclusters.

      (3) Identified a relatively innovative role of DLK1 in T2D disease progression.

      We thank the reviewer for their constructive critiques and positive feedback. We hope to further apply deep transfer learning tools in future scRNA-seq meta-analyses.

      Weaknesses:

      (1) There is little overlap between the DE list of the bulk RNA-seq analysis in Figures 1D and 1E and the DE list of the pseudo-bulk RNA-seq analysis of all cells in Figure S2C.

      We thank the reviewer for this insightful thought and plan to perform additional analyses and comparisons to address this comment.

      (2) The biological meaning of "beta cells had the lowest scores compared to other cell types" is not clear.

      We agree with the reviewer and will clarify this statement in the revised manuscript. In summary, the relatively lower T2D-DEGAS scores for beta cells overall compared to all other cell types (alpha cells, acinar cells, etc.) reflect the fact that in T2D, beta cell-specific genes can be downregulated. This is also possibly due to beta cell loss in T2D and would be reflected in bulk islet RNA-seq data. This affects the DEGAS model, which is reflected in the scores of all cells in the scRNA-seq data (Fig 3A). For this reason, subsetting the beta cells and replotting them on their own (Fig 4B) is an important step to identify relative differences in DEGAS scores between different subsets of beta cells.

      (3) The figures and supplemental figures were not cited following the sequence, which makes the manuscript very difficult to read. Some supplemental figures, such as Figures S1C-S1D, S2B-S2E, S3A-S3B, were not cited or mentioned in the text.

      We apologize and thank the reviewer for pointing out these errors. All of the annotated errors will be amended in the revised manuscript.

      (4) In Figure 7, the current resolution is too low to determine the localization of DLK1.

      We will include the original highest-resolution confocal images in our resubmission. We will also improve the color combination to improve visibility of colocalization of DLK1 with Insulin.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Gitanjali Roy et al. applies deep transfer learning (DEGAS) to assign patient-level disease attributes (metadata) to single cells of T2D and non-diabetic patients, including obese patients. This led to the identification of a singular cluster of T2D-associated β-cells; and two subpopulations of obese- β-cells derived from either non-diabetic or T2D donors. The objective was to identify novel and established genes implicated in T2D and obesity. Their final goal is to validate their findings at the protein level using immunohistochemistry of pancreas tissue from non-diabetic and T2D organ donors.

      Strengths:

      This paper is well-written, and the findings are relevant for β-cell heterogeneity in T2D and obesity.

      We thank the reviewer for their constructive critiques and positive feedback. We believe this study can improve our understanding of β-cell heterogeneity in the context of T2D and obesity.

      Weaknesses:

      The validation they provide is not sufficiently strong: no DLK1 immunohistochemistry is shown of obese patient-derived sections. Additional presumptive relevant candidates from this transcriptomic analysis should be screened for, at the protein level.

      We thank the reviewer for this suggestion. We are planning to perform new immunostaining of DLK1 in human pancreas tissue sections from non-diabetic lean, non-diabetic obese, T2D lean, and T2D obese donors. We also note that Table S6 contains the patient metadata for the pancreas samples we show in the current manuscript. Two of the T2D donors have BMI > 30 (obese). However, the non-diabetic donors have BMI between 26 and 29. Our new planned studies should address the question of differential DLK1 expression / beta cell heterogeneity in the context of both diabetes and obesity.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review): 

      Summary:

      The authors demonstrate that the immunosuppressive environment in pancreatic ductal adenocarcinoma (PDAC) can be mitigated by a combination of ionizing radiation (IR), CCR5 inhibition, and PD1 blockade. This combination therapy increases tissue-resident natural killer (trNK) cells that facilitate CD8 T cell activity, resulting in a reduction of E-cadherin positive tumor cells. They identify a specific "hypofunctional" NK cell population in both mouse and human PDAC that supports CD8 T cell involvement. A trNK signature is found to be associated with better survival outcomes in PDAC and other solid tumors.   

      Strengths: 

      Overall, I think this is an interesting study that combines testing of therapeutic concepts in mice with bioinformatics analysis of single-cell transcriptome data in primary tumors and exploration of clinical outcomes using signature genes in TCGA data. The key finding is that immunoregulatory properties of tumor-infiltrating/resident CD56-bright NK cells (assumed to be non-cytotoxic) are beneficial for outcome through cross-talk with DC and recruitment of CD8 T cells. The latter is specifically induced by irradiation combined with CCR5i and PD1 blockade. 

      "These results collectively support the notion that IR/CCR5i/αPD1 combination treatment alters immune infiltration by reducing Tregs and increasing NK and CD8 T cells, thereby resulting in greater local tumor control." I agree with this conclusion.  

      Weaknesses:  

      There are a few points to discuss that the authors may want to address.

      (1)   "Notably, CCR5i significantly reduced Treg infiltration but had no effect on the infiltration of other immune cells, indicating the active recruitment of CCR5+ Tregs in PDAC (Figure 2B)." 

      CCR5i treatment seems to inhibit infiltration of CD8 T cells and NK cells to a greater extent, in relative terms, compared to Treg, albeit it is not statistically significant. If this visual inspection of the graph does not reflect reality, additional experiments may be needed to verify the selective targeting of Tregs or confirm that CD8 T cells and NK cells are also affected by single-agent CCR5i. The reduced recruitment of Treg, NK cells, and CD8 T cells was completely reversed when combined with irradiation. In the data shown in Figure 3E it seems as if CCR5i induced infiltration of Tregs along with other immune cells. That said, I agree with the conclusion of the authors that this combined treatment leads to an altered immune composition and ratio between Tregs and effector cells (CD8 T cells and NK cells). Could this altered composition be displayed more clearly?

      We would like to thank the reviewer for their comments and agree that there is a trend for reduced NK and T-cell infiltration during CCR5i standalone treatment (as seen in Figure 2B), although it does not reach significance. To reflect this more clearly, we have added n.s. (non-significant) for the NK cells and CD8+ T-cells and adjusted the text to reflect a trend for decreased NK and CD8+ T-cell infiltration (see Lines 162-165). Moreover, to reflect the data accurately, we have taken the Treg data out of the original Figure 2B and present it separately as a percentage of CD45+CD3+ T-cells.

      (2) The definition of active and hypofunctional NK cells based solely on NKG2D expression seems like an oversimplification. I realize it is not trivial to test tumor-infiltrating NK cells from these tumors functionally but perhaps scRNAseq of the tumors would allow for characterization of cytotoxicity scores using KEGG or GO analysis or reversed gene set enrichment in responders/non-responders.

      We agree that scRNA-seq of tumors would add to the overall characterization of the tumor-infiltrating NK cells; however, we are unfortunately not currently in a position to carry out this experiment. We did, however, immunophenotype the tumor-infiltrating NK cell population in more depth by also looking at NKp46 and NKG2D surface expression. These newly added data demonstrate not only increased infiltration of “bona-fide” trNK cells (based on surface expression of CD103+CD49a+) under the triple treatment combination but, more importantly, that these trNK cells have reduced levels of CD69, NKp46, and NKG2D and increased TIM-3 surface expression compared to conventional NK cells, suggesting that these trNK cells could be more hypoactive than conventional NK cells. These data have been added to the manuscript as Figure 4E, F; Figure supplement 4E-G and Lines 244-260 in the revised manuscript. To clarify this difference, we have replaced the word “hypofunctional” with “hypoactive” throughout the manuscript.

      (3) It seems as if the abstract refers to this phenotype incorrectly since the "hyporesponsive" subset is described as NKG2C-negative. 

      We apologize for the typographic confusion and have corrected our abstract and changed the subset to NKG2D-negative (as was intended).

      (4) "The NK_C1 cluster correlates best with the hypofunction NK phenotype observed in mice as similarly displayed reduced activation (reduced NKG7, NKp80, GZMA, and PRF1) with additional expression of tissue residency markers CD103, CD49a and, surprisingly, the adaptive activating receptor NKG2C (KLRC2) (Figure 5B, C)." 

      There is no doubt that NK_C1 represents tumor-infiltrating NK cells with a CD56bright gene signature with a strong tissue resident score. However, the transcriptional expression of KLRC2 on these is not surprising! It is well established that KLRC2 transcripts (but not protein) are highly expressed on conventional CD56bright NK cells. There are several published sources where the authors can find such data for confirmation. Thus, this is not to be confused with adaptive NK cells having an entirely different transcriptional signature and expressing high levels of NKG2C at the cell surface. I strongly recommend reinterpreting the results based on the fact that KLRC2 is expressed at high levels in conventional CD56bright NK cells. If not, it would be important to verify that these tissue-resident NK cells express NKG2C and not NKG2A at the cell surface.

      We agree with the reviewer and have modified the text accordingly in the revised manuscript (Lines 279-283), including references to tissue-resident adaptive-like cells as described previously in literature. 

      (5) NCAM1 transcript alone is not sufficient to deconvolute CD56bright NK cells in TCGA data (Figure 7A). As a single marker, it likely reflects NK cell infiltration without providing further evidence on the contribution of the bright/dim components. Therefore, the use of the bright Tr NK signature described in Table 1 is very important (Figure 7B). Table 1 is not provided. Nor Supplementary Table 1. There is only one supplementary figure in the ppt attached.

      We agree that a high NCAM1/CD56 single gene signature could also represent NK cell infiltration. We have rephrased this in the text accordingly (Lines 354-357). We apologize for the missing tables and Supplementary figures. We have added these now to the manuscript as Supplementary table 1.

      Reviewer #2 (Public Review)  

      Summary: 

      This work elaborates on a combined therapeutic approach comprising ionizing radiation and CCR5i/αPD1 immunotherapy as a promising strategy in pancreatic cancer. Previous research has established that NK cell-derived CCL5 and XCL1 play a crucial role in recruiting cDC1 cells to the tumor microenvironment, contributing to tumor control. In this study, by using a murine pancreatic cancer model, the authors propose that the addition of radiation therapy to CCR5i and αPD1 immunotherapy could upregulate CD8+ T cells and a subgroup of NK cells within the tumor and result in better tumor control. They further analyzed human single-cell sequencing data from pancreatic cancer patients and identified one subgroup of NK cells (NK C1) with tissue-resident features. Subsequent cell-cell contact analysis reveals the NK-cDC1-CD8 cell axis in pancreatic cancer. By analyzing TCGA data, they found that high NK C1 signature levels were associated with better survival in pancreatic cancer patients. Thus, radiotherapy could benefit the outcome of patients bearing low NK C1 signatures. Importantly, the positive correlation between NK C1 score with survival extends beyond pancreatic cancer, showing potential applicability across various solid cancers.  

      Strengths: 

      This study could add new insight into the clinical practice by introducing such novel combined therapy and shed light on the underlying immune cell dynamics. These findings hold potential for more effective and targeted treatment in the future. Mouse experiments nicely confirmed that such combined therapy could significantly reduce tumor volume. The elegant use of single-cell sequencing analysis and human database examination enriches the narrative and strengthens the study's foundation. Additionally, the notion that NK C1 signature correlates with patient survival in various solid cancers is of high interest and relevance.  

      Weaknesses: 

      The role of CCR5i requires further clarification. While the authors demonstrated its capacity to reduce Treg in murine tumors, its impact on other cell populations, including NK cells and CD8+ T cells, was not observed. Nevertheless, the effect of CCR5i on tumor growth in Figure 2B should be shown. If the combination of radiotherapy and αPD1 already can achieve good outcomes as shown in Figure 3A, the necessity to include CCR5i is questioned. Overall, a more comprehensive elucidation of the roles of CCL5 and CCR5i in this context would be good.  

      We would like to thank the reviewer for their comments and agree that standalone CCR5i also shows a trend of reduced infiltrating NK cells and CD8+ T-cells, although this does not reach significance. We have mentioned this trend in the manuscript (see Lines 162-165) and added n.s. to Figure 2B as well. With regard to adding CCR5i: although we observe volumetric control by radiotherapy and anti-PD1, we observe an increase in necrosis induction only in the triple combination compared to radiotherapy combined with anti-PD1, suggesting that CCR5i has an additive effect in our model only as part of a combination modality. We therefore believe that addition of CCR5i to radiotherapy and anti-PD1 has a beneficial effect. The growth curves for CCR5i alone were already presented in Figure 3A, and we have modified our manuscript to refer to this (see Lines 165-167).

      (1) In line with this, spatial plots in Figure 4 did not include the group with only radiotherapy and αPD1. This inclusion would facilitate a clearer comparison and better highlight the essential role of CCR5i. 

      We agree with the reviewer that inclusion of radiotherapy and αPD1 would facilitate a clear comparison of our data. Our experiments did include single controls for radiotherapy and αPD1; unfortunately, however, the tissue slides were of poor quality and therefore not suitable for quantification. In line with this, we have added references to other studies that investigated the effect of immune checkpoint inhibitors in combination with radiotherapy (see Lines 169-172).

      (2) NK C1 cells should also be analyzed in the mouse model. The authors suggest that NKG2D-negative NK cells (NK NKG2D-ve) could be this cell population. Staining of inhibitory markers should be considered, for example, TIGIT and TIM3 as presented in Figure 5B.

      As per the reviewer's suggestion, we have now included additional data on the surface expression of inhibitory markers and activating receptors on tumor-infiltrating NK cells in our model under the triple combination. These additional data demonstrate increased infiltration of trNK cells under the triple combination that seem to be more ‘hypoactive’ than conventional NK cells. These data have been added as Figure 4E in the revised figures.

      (3) While the cell-cell contact analysis generated from single-cell sequencing data is insightful, extending this analysis to the mouse model under therapy would be highly informative. NK and CD8 cells in the tumor increased upon the combined therapy. However, cDC1 was not characterized. Analysis regarding cDC1 would provide more information on the NK/cDC1/CD8 axis. 

      We agree that looking into cDC1 would be highly interesting in our treatment model, and its characterization is currently under investigation. The importance of the interaction between cDC1 and NK cells has been described before by various groups, and we have provided additional references for that in our manuscript (see Lines 449-455).

      (4) Human database analysis showed a positive correlation between NK C1 score and CCL5 in pancreatic cancer. Furthermore, radiotherapy could benefit the outcome of patients bearing low NK C1 scores. It would be interesting to test if radiotherapy could also benefit patients with low CCL5 levels in this cohort. 

      We thank the reviewer for their suggestion; the figure below shows the comparison. CCL5high patients are enriched for NK_C1 (Figure 7D), and CCL5high patients with NK_C1high have significantly increased overall and disease-free survival compared to those with NK_C1low (Figure 7E), whereas those with NK_C1low significantly benefit from radiotherapy (Figure 7B). Accordingly, patients with CCL5high have significantly decreased overall survival compared to CCL5low patients, again confirming CCL5 as a prognostic marker (Figure 1A, Figure R1). When we look at CCL5low patients, however, there is no additional significant benefit of radiotherapy (see the image below; not significant, only significant p-values are shown). These data collectively support the strong correlation between CCL5 levels and NK_C1 enrichment, and imply that radiotherapy alone is insufficient to drive NK_C1 cells in the absence of high CCL5 gradients to improve overall survival. However, given the increased overall survival of CCL5low compared to CCL5high, it is likely that other factors are at play. Future studies will be required to further elucidate the role of CCL5 gradients on NK_C1 cells and the beneficial effect of radiotherapy.

      Author response image 1.

      Overall survival of CCL5high versus CCL5low patients stratified into groups with and without radiotherapy using TCGA-PAAD. Log-rank p-value indicates the significance level across all groups while individual significant comparisons are shown as indicated.

      Reviewer #3 (Public Review):

      Summary

      In the submitted manuscript by Go et al, the authors evaluated the tumor microenvironment in pancreatic ductal adenocarcinoma (PDAC) and made a number of interesting observations, including the following: 1) CCL5 expression within the tumor microenvironment negatively correlated with clinical outcomes in human patients with PDAC; 2) there were both positive and negative correlations between CCL5 expression and the expression of specific genes (e.g. those encoding CD56 and CD16, respectively) included among gene signature lists for Treg, MDSC, TAM, and NK cells; 3) CCR5 inhibition with the inhibitor, maraviroc, reduced Treg infiltration but not that of other immune cell types in an orthotopic murine model of PDAC; 4) CCR5 inhibition augmented anti-PD1 immunotherapy when combined with ionizing radiation (IR) therapy in the murine model; 5) the above therapy resulted in increased infiltration of CD8+ cytotoxic T cells as well as of a subset of NKG2D-negative, tissue-residency (tr) marker-expressing NK cells (deemed Cluster 1 NK in their data sets) that inversely correlated with the number of E-cadherin+ cells (i.e. tumor cells) and showed predicted interactions with cDC1 dendritic cells (including XCL1/XCL2 expressed by the NK and XCR1 expressed by the cDC1); 6) the authors identified a number of putative signals stemming from the trNK (e.g. IL-16, TNFSF14, FASLG, CSF, MIF) as well as incoming from cDC1s to NK (e.g. BAG6-NKp30); 7) these trNK cells positively correlated with good outcomes and with CD8+ T cell infiltrations in human PDAC as well as in many other solid tumor types; and 8) importantly, the benefit of IR therapy was specific to the subset of PDAC patients (represented in the TCGA dataset) that were predicted to have low amounts of trNK cells. The authors used murine experimental models, multiplexed imaging analyses, and a number of publicly available sequencing data sets from human tumor samples to perform their investigations.
Based on their findings, the authors proposed that combining IR with CCR5 inhibition and anti-PD1 immunotherapy is a promising strategy to treat solid cancers.  

      Strengths

      Overall, the collective analyses and conclusions appear to be novel and could be of high and rapid impact on the field, particularly in terms of directing clinical trials to incorporate IR with CCR5 inhibition and immunotherapy. The manuscript is well written; the figures are for the most part clear; and the Discussion is very thoughtful.   

      Weaknesses

      There were a number of minor typographical errors, missing references, or minor issues with the figures. In general, while many of the observations provided strong suggestive evidence of relationships, phenotypes, and functions, the authors often used language to indicate that such things were confirmed, validated, or proven. In fact, there was a paucity of such functional/confirmatory experiments. This does not necessarily detract from the overall significance, excitement for, and potential impact of the study; but the language could likely be adjusted to be more in keeping with the true nature of the findings. The main title and running title are a bit different; consider making them more similar.

      We apologize for the typographical errors, missing references and issues with the figures. We have revised our manuscript, with a major focus on adjusting our language to more carefully reflect our data, and hope to have addressed all the concerns of the reviewer. The slight discrepancy between the main title and running title is intended to convey the contents of this manuscript in a comprehensive way.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      Please make sure all files are made available. Also please check available datasets describing KLRC2 transcripts in CD56brights. This is not to be confused with an adaptive-like signature. 

      We have added the missing table to the supplementary figures and revised the manuscript text with regard to the KLRC2 transcript in our NK_C1 cluster and its implications for an adaptive-like signature in the context of tissue-residency (see Lines 279-283; 465-474).

      Reviewer #2 (Recommendations For The Authors): 

      Additional experiments as mentioned in the 'weakness' section could help to further strengthen this study. Besides these points, I would recommend the following: 

      (1) The description in the figure should be more precise and clear. Especially in Figure 3A, it seems the addition of IR into CCR5i or CCR5i/aPD1 leads to a bigger tumor volume.  

      We have adjusted the figure descriptions to describe the figures more clearly. We apologise for the confusion in Figure 3A; this was a figure legend error and has been corrected in the revised Figures (i.e. closed symbols represent +IR conditions).

      (2) The definition of Tregs in figures should be described, e.g. it is not specified which population is shown in Figure S2c.  

      We have added a definition of Tregs (i.e. Live/CD45+CD3+CD4+FOXP3+) in our revised manuscript (see Lines 162-165). To avoid confusion, we have removed the subsequent gating of CCR5 and PD-1 of Tregs in our revised Supplementary Figures.

      (3) Please add a bar in all histology figures, for example, Figure 2A, S2A, S3E. It seems in Figure S3D, E, the green group is missing.  

      We have added the scale bar to all the indicated figures. As the reviewer correctly points out, the green group (i.e. IR+CCR5i) is missing. We felt that the excessive growth seen with CCR5i alone may have given a false impression of the extent of infiltration; we therefore did not include this group in the original analysis and do not have the data for the Figure.

      (4) Please check through the manuscript, there are some grammar mistakes.  

      We apologise for the grammar mistakes in our original manuscript and have carefully revised the current manuscript to correct them.

      (5) Figure S7B, the left cell lacks a name.  

      We have annotated the left cell accordingly in our revised supplementary figure.

      Reviewer #3 (Recommendations For The Authors): 

      (1) Abbreviations (e.g. PDAC) should be spelled out the first time introduced in the manuscript.

      We have adjusted this in our revised manuscript.

      (2) Referring to the tissue-resident NK cells as "hypofunctional" may not be useful...they seem to be functional, just not in the conventional sense. The authors may want to consider another term, such as non-cytotoxic (given the low expression of cytolytic granules, etc) or immunoregulatory (as they actually refer to them on line 310).

      We agree with the reviewer and have revised the manuscript to refer to them as “immunoregulatory” or “hypoactive” when appropriate. The latter is supported by the additional experiments as shown in Figure 4E.

      (3) Barry et al 2018 Nat Med demonstrated that NK cells in melanoma could support cDC1s and promote positive clinical outcomes in the setting of immunotherapy. It would likely be beneficial to also cite this paper (e.g. on line 425). 

      Thank you for the suggestion, which would work in line with our hypothesis of crosstalk between NK_C1 and cDC1. We have looked for FLT3L in our NK_C1 cluster and did not find any enrichment for FLT3L transcript (see Figure 5E). Nevertheless, we have added the reference in the discussion of our manuscript to further support the importance of crosstalk between cDC1 and NK cells (see Lines 449-455).

      (4) Figure 2B: by eye, it looks like the difference between CD8+ T cells in the two conditions would be significantly different; is this not the case? Same thing for the NK cells...what are the p-values?

      We have added n.s. to our revised Figure 2B. The p-values for CD8+ T-cells and NK cells were 0.14 and 0.19 (two-tailed Student's t-test), respectively.

      (5) The murine data strongly suggest that the combination therapy promotes trNK cell infiltration into the tumors, in turn resulting in cDC1-mediated CD8+ T cell infiltration and/or activation. It could be highly valuable/useful to functionally determine (e.g. by depleting NK cells in this model) if NK cells are required for the effects seen. 

      We agree that depletion of NK cells could further solidify the findings, and it is part of ongoing investigations for future projects. However, it would be imperative to first characterise these NK cells in more depth, as conventional global ablation of NK cells is expected to strongly impact immunosurveillance as well. This is part of current ongoing work.

      (6) Figure 7B: how were "high" and "low" defined (for the NK signature)?

      An enrichment score of the NK_C1 gene signature (see Table supplement 1) was first calculated per patient sample in the TCGA RNA-seq dataset using the Gene Set Variation Analysis (GSVA) method. A cut-off value was then determined using the maximally selected rank statistics method (maxstat R package) to divide patients into “high” and “low”.
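
      The cut-off step described above can be illustrated with a minimal sketch. It is not the authors' code and does not reproduce the R package: it assumes per-patient enrichment scores have already been computed (the GSVA step is not shown), and it simply scans candidate cut-offs and keeps the one with the largest absolute standardized log-rank statistic, without the multiplicity-adjusted p-value that a maximally selected rank statistics procedure would also report. The function names `logrank_z` and `best_cutoff` and the `min_group` parameter are hypothetical.

```python
from math import sqrt


def logrank_z(times, events, group):
    """Standardized log-rank statistic for group == 1 versus group == 0.

    times  : follow-up time per patient
    events : 1 if the event (death) was observed, 0 if censored
    group  : 1 for the "high" group, 0 for the "low" group
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Walk through each distinct time at which at least one event occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(group[i] for i in at_risk)
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and group[i])
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the event count in group 1.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected / sqrt(variance) if variance > 0 else 0.0


def best_cutoff(scores, times, events, min_group=2):
    """Return (cutoff, |z|) for the split with the largest |log-rank z|.

    Patients with score > cutoff are labelled "high". Splits that leave
    fewer than `min_group` patients on either side are skipped (an assumed
    safeguard for this sketch). Returns None if no split is admissible.
    """
    best = None
    for c in sorted(set(scores))[:-1]:
        group = [1 if s > c else 0 for s in scores]
        n_high = sum(group)
        if min(n_high, len(group) - n_high) < min_group:
            continue
        z = abs(logrank_z(times, events, group))
        if best is None or z > best[1]:
            best = (c, z)
    return best
```

      Because the cut-off is chosen to maximize the separation between groups, the unadjusted log-rank p-value at the selected cut-off is optimistic; this is the reason dedicated procedures correct for the scan over candidate cut-offs.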

      (7) Lines 164-165 of the Results: it would be good to include a reference supporting the statement.

      We have rephrased the manuscript and added corresponding references (see Lines 170-173 in revised manuscript).

      (8) There are many conclusions and very speculative language based only on sequencing results, and these have not been validated (e.g. in the Discussion, lines 447-453). As another example, it was concluded that a decrease in NKG2D+ NK cells implied a reduction in overall NK cell cytolytic activity and that NKG2D- NK cells were hypofunctional and did not kill well. This was not tested. Generally, it would be useful for the authors to use language that conveys that the data are primarily suggestive (rather than "confirmatory", line 447) of relationships, phenotypes, and functions at this point. 

      We thank the reviewer for raising these concerns and have carefully adapted the manuscript text to present the findings as suggestive rather than confirmatory.

      (9) On lines 246-247 the authors refer to cluster 3 NK cells, which express CD16, as "immature". The rationale for this designation is not provided, and most human NK cell development models hold that CD16+ NK cells represent the most mature subset(s). 

      We apologize for the typographic error – later on we refer to the NK_C3 cluster as cytotoxic NK cells and we have corrected this in our revised manuscript (see Lines 273-275).

      (10) On line 351, the authors reference supplemental Figure 7C...but I don't see this figure in the accompanying powerpoint file. 

      This should have been Supplementary Figure 7B, and we have corrected it in the revised manuscript (see Lines 374-377).

      (11) On line 417, the authors reference NKp40; this is likely a typographical error. 

      This has been corrected in the revised manuscript to NKp46 (see Lines 439-442).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1 (Public Review):

      He et al. investigate the requirement and function of Blimp1 (encoded by Prdm1) in murine NK cells and ILC1. Employing a conditional knockout mouse model (Prdm1flox x Ncr1cre), the authors describe impaired abundance and maturation of Prdm1-deficient NK cells and ILC1 in different tissues. Blimp1-deficient NK cells have reduced expression of cytotoxic molecules (Gzmb, Prf1) and, in some instances, Ifng production, and Prdm1flox x Ncr1cre mice show impaired tumor control in experimental metastasis models. Using single-cell RNA sequencing analysis, the authors propose that Prdm1 regulates JunB expression and NK cell maturation. Based on in silico analyses, the authors suggest manifold intercellular communication between NK/ILC1 and macrophages. Without following up on any of these potentially interesting suggestions, the authors conclude their study reiterating that Prdm1 regulates IFNg-production of tumor-infiltrating NK cells and ILC1. Many of the reported functions of Blimp1 in NK cells have previously been identified using a mixed-chimera strategy comparing Prdm1 WT and KO NK cells (Kallies et al., Blood 2011). Here, the authors expand on these findings using a conditional model to delete Prdm1 in NK/ILC1 and single-cell sequencing and provide a more refined analysis of the functions of Blimp1 in these cells. Cell-chat analysis suggests close interactions of Blimp-dependent NK/ILC1 subsets with hepatic macrophages, but these suggestions are not followed up by experiments. Potentially interesting differences in the macrophage compartment of Ncr1-Cre x Prdm1-fl/fl mice are suggested by the scRNA-Seq data but are not validated e.g. by FACS. The study falls short in providing new mechanistic insights. Nevertheless, it is an interesting confirmation of "old" suggestions in a more refined setting, and the provided single-cell mRNA-Seq data represents a potentially valuable resource for the community. 
There are some control analyses that are required to support the conclusions of the authors, and I have a few suggestions that would help to improve the manuscript.

      We sincerely appreciate your careful review and insightful feedback on our manuscript. We have carefully considered your comments and present the results of new experiments conducted in response to your suggestions. Please find the detailed responses below.

      Major comments

      Comment 1: The authors do not control for the potential effects of Cre expression. Expression of Cre from within the Ncr1 locus (using the mouse model established by Narni-Mancinelli et al.) has significant effects on NK cells and especially ILC1s (reducing their frequency and absolute numbers and altering their functionality. The authors should characterize the Ncr1cre mice used here (developed by Shanghai Model Organism Center) in this regard and should use proper controls (Ncr1Cre+ Prdm1wt/wt as control for Ncr1Cre+ Prdm1fl/fl, instead of WT littermates) for all of their key data, e.g. those depicted in Fig 1FG, 2ADFH, 7D, S2,3,4.

      Response 1: This is a very insightful question that has posed a challenge for many researchers engaged in conditional knockout studies, including us. Both the expression of Cre and the insertion of loxP sequences have the potential to influence gene expression, because the region where loxP is inserted may contain regulatory sequences for the gene of interest. Ncr1-Cre is a frequently used transgenic mouse model in our laboratory. In our prior research, we also had concerns about the possible impact of Cre on NKp46 expression, which could lead to a decline in NK cell function. Therefore, in our previous studies focused on Smad4 expression in NK cells, we conducted similar experiments. In Figure 6 of our published paper in the Journal of Clinical Investigation (Wang et al., J Clin Invest, 2018), we compared NKp46-iCreTgfbr2fl/flSmad4fl/WT with NKp46-iCreTgfbr2fl/flSmad4fl/fl. Although the primary purpose of that comparison was to establish Smad4's independence from TGF-β, it also allows a comparison between Smad4fl/fl and Smad4fl/WT in the presence of Cre. In the critical phenotype we assessed, NKp46-iCreTgfbr2fl/flSmad4fl/fl (compared with NKp46-iCreTgfbr2fl/flSmad4fl/WT) exhibited the same phenotype as NKp46-iCreSmad4fl/fl (compared with NKp46WTSmad4fl/fl). This suggests that Cre's influence on NK cells may be within a reasonable and controllable range. Furthermore, compared with the decrease in Ncr1 expression caused by Cre, the reduction in the expression of genes targeted by loxP-mediated knockout, such as Prdm1 in this study (Figure 1E), is more significant. Therefore, with the current techniques and research methods, we believe that the data provided in this study can support the role of Prdm1 in NK cells.

      Comment 2: Several of the phenotypic findings on NK cells have been described before by Kallies et al. in 2011 (Ref 29), although using a different genetic Prdm1-ablation model (Prdm1-GFP/GFP knockin/knockout model). This study reported impaired NK cell maturation, reduced Gzmb expression, impaired in vivo cytotoxicity against subcutaneous RMA-S cells, impaired in vitro proliferation, comparable in vitro killing, increase in BM NK cell numbers. The authors should discuss/mention this more prominently in their manuscript, and highlight where they confirm or refine these previous findings, and where they actually provide new information.

      Response 2: We appreciate your valuable suggestions. The article you referred to, published in Blood, is indeed an excellent work. While we had cited this article, our discussion regarding its specific content was limited. Based on your advice, we have made revisions and included the following content in our discussion section (page 24; line 489-493):

      “In a study involving systemic knockout combined with competitive transplantation, it was found that Prdm1 promotes NK cell maturation and the expression of Gzmb. On the contrary, the same study also found that NK cells with Prdm1 deficiency exhibit heightened proliferation, increased survival, enhanced migratory abilities towards tumors, and greater cytotoxicity against subcutaneously implanted RMAS tumors (31).”.

      Comment 3: What is the reason to refer to the enriched cluster in Blimp1-deficient NK cells as "Junbhi"? There is no follow-up for a function of Junb, and there are many other genes upregulated in these cells. Most critically, these cells seem to represent exactly the c-Kithi cells that Kallies et al. already showed and discussed in their paper. The authors should stain for Kit, and also refer to this. Also, MacKay et al. performed Blimp1-Chip-Seq (in T cells), maybe it would be interesting to check to which of the identified DEGs Blimp1 can bind.

      Response 3: We appreciate the suggestion from the reviewer. We think that a gene that supports the development of lymphocytes does not necessarily regulate their function positively. For example, JunB is essential for T cell development but can also induce T cell exhaustion (Lynn et al., Nature, 2019). Therefore, while Prdm1 has been shown to promote NK cell development, it cannot be assumed that it always positively regulates NK cell function, especially in anti-cancer immune surveillance. In this respect, we tried to find a driving factor of the impaired anti-tumor ability of Prdm1ΔNcr1 NK cells. Although many other genes are upregulated in this cluster (e.g. Kit), JunB attracted more of our interest because of its potential to regulate NK cell function in cancer, whereas c-Kit is more likely a marker of NK cell maturation, as has been well demonstrated by Kallies et al. and other studies. Our previous studies also showed that c-Kit expression was decreased in mature NK cells compared with immature NK cells (Wang et al., J Clin Invest, 2018).

      We did not perform follow-up experiments on Junb because we could not find suitable surface markers with which to investigate the function of the Junbhi cNK cluster. Using intracellular markers would more likely amount to an analysis of gene expression patterns, which is already well described by our RNA-seq data. As described above, our study did not aim to further investigate the role of Prdm1 in NK cell maturation, as c-Kit expression was upregulated in Prdm1-knockout NK cells and correlated with NK cell maturation, which has been validated by Kallies et al.

      We also have discussed the potential DEGs that could be bound and regulated by Prdm1 in our revised manuscript (page 27-28; line 561-571):

      “Prdm1 and Hobit directly bound and repressed Tcf7 (18), which encoded TCF-1, a TF binding and limiting the activity of Gzmb regulatory element (69). Gzmb has been demonstrated directly bound and activated by Junb in NK cells, which suggested Gzmb expression regulated by multiple Prdm1/Hobit downstream signals (26). In human T cells, binding motif of JUNB was enriched in the binding sites of PRDM1 (70), indicating the essential role of PRDM1-JUNB axis during NK cell and T cell development. In NK cells deficient in Prdm1 expression, we noted a decrease in Gzmb levels alongside with an elevation in Junb expression. This indicates that Prdm1 not only facilitates the expression of Gzmb in NK cells but also suppresses Junb expression. Given that Junb is recognized as a positive regulator of Gzmb (71), this presents a complex interplay that seems contradictory. Therefore, it is imperative to develop a theoretical framework to comprehensively understand and interpret this paradoxical relationship.”.

      Comment 4: cNK cells are considered circulating cells, that transiently pass through the liver. Previous studies have suggested almost identical gene expression patterns in hepatic and splenic NK cells. In functional tests, they often "perform" identically. I am therefore a bit surprised that the authors find a differential dependency of Blimp1 for the IFNg production of splenic (no role of Blimp1) versus hepatic (Blimp1 regulating IFNg production) NK cells (Fig S3). Do the authors have any suggestions on that? The analyses are performed by 12+4h stimulations with IL12/18, which could involve the effects of altered bystander cells (as suggested by Figure 6). Therefore, these analyses should be provided upon standard 4h stimulations with IL12/18 and also with PMA/I under BFA. Note: liver and splenic cNK cells look quite different in the chosen histograms in Figures 7 A, B, C, yet there is massive variability in these analyses - is there any systematic/technical problem?

      Response 4: We appreciate the valuable suggestion from the reviewer. Studies have suggested that, at the gene expression or transcriptomic level, liver NK cells are more similar to splenic NK cells and diverge more from liver ILC1s. However, we do not think that splenic NK cells or peripheral blood NK cells (which are more abundant in circulation) are entirely indistinguishable from liver NK cells. Notably, there are substantial differences in their maturity levels, with liver NK cells being more mature. Since we are examining protein levels, a 4-hour stimulation period may not fully capture these distinctions. Even when considering the potential impact of bystander cells, the experimental design specifically targets Prdm1 knockout within NK cells, ensuring that the study accurately elucidates the role of Prdm1 in NK cells. For each experiment, we have implemented control measures, and any variance observed in the figures may be attributed to individual variation among the animals. It is also possible that the MFI values measured by flow cytometry vary more than percentages do.

      Comment 5: Figure 4 H/I - In contrast to NK cells in Fig 4E, F, the KO and WT ILC1s seem to co-cluster largely. Authors should validate differentially expressed genes. How strong is the effect of Blimp1 in ILC1s? Or is Blimp1 a critical TF driving effector differentiation in NK cells, while it has only subtle effects in ILC1 (these may be regulated by Hobit?)? This seems an interesting finding that should at least be discussed. For these types of small differences in ILC1, FACS confirmation analyses should be performed and findings be reevaluated using Cre-expressing controls (see above).

      Response 5: We appreciate the suggestion from the reviewer. As requested, we analyzed the DEGs in liver cNK cells and ILC1s from our scRNA-seq data (revised Supplemental Figure 8, A and B). There were only a few notable DEGs in ILC1s compared with cNK cells. It is likely that Prdm1 has a more essential effect on the transcriptional program of cNK cells, while it plays a more important role in maintaining the homeostasis of the ILC1 population. We have discussed these points to better inform the readers (page 27; line 554-561):

      “Previous studies have identified Hobit and Prdm1 as central regulators instructing tissue-dependent programs and retention of diverse tissue-resident lymphocytes (18, 51, 53). Liver ILC1s required Hobit, but not necessary for cNK cells (6). Expression of Prdm1 was remarkably higher in cNK cells versus ILC1s (18). While in our study, cNK cells and liver ILC1s reduced simultaneously in Prdm1ΔNcr1 mice, and even more significant in ILC1s. This indicates that while Prdm1 is expressed at lower levels in ILC1s, its role in preserving the quantity of ILC1s may be more crucial. Thus, Prdm1 and Hobit may have parallel program in instructing ILC1s functional development and maturation.”. 

      We could not find a suitable surface marker with which to evaluate the changes in ILC1s, as most of the changes involve intracellular markers.

      Comment 6: The authors describe and discuss some of Figure 1 and 2 data as if Blimp1 would be involved in alternative NK versus ILC1 fates, but there is no evidence for this.

      Response 6: There is no evidence that Prdm1 could alter the fate decision of the progenitor towards liver cNK or ILC1s. Although some studies reported the conversion between cNK cells and ILC1s in special contexts, it was widely accepted that liver cNK cells and ILC1s originated from different progenitors. While we observed changes in the proportions of liver cNK cells and ILC1 in Prdm1 KO mice, we still lack sufficient evidence to support the relative independence of NK and ILC1 development, as well as evidence to indicate that Prdm1 is exclusively responsible for NK and ILC1.

      Regarding the changes in NK and ILC1 proportions after Prdm1 KO, we believe that both NK and ILC1 cells require Prdm1 to maintain their populations, with ILC1 possibly requiring it to a greater extent. This is the reason for the altered balance between NK and ILC1 cells following Prdm1 KO. We wish to clarify this point to prevent any misconceptions among readers. To address this, we have added the following content to the discussion section (page 25; line 509-516):

      “Furthermore, although both liver NK cells and liver ILC1s require Prdm1 to maintain their quantity, liver ILC1s demonstrate a more pronounced dependency on Prdm1. However, it is currently widely believed that liver NK cells and liver ILC1s originate from different progenitors. It is worth noting that while we observed changes in the NK and ILC1 proportions after Prdm1 knockout, our data does not support the hypothesis that Prdm1 affects progenitor differentiation decisions, thereby influencing the fate selection of NK and ILC1. Further research may be needed to elucidate how Prdm1 regulates the balance between NK cells and ILC1s.”.

      Comment 7: There are several recent studies suggesting a role for Hobit, homologue of Blimp1, in NK cells and in ILC1, and in the control of liver metastases. The authors should discuss similar and unique functions of Hobit and Blimp1, also in the regulation of gene expression patterns, and should refer to these studies.

      Response 7: We would like to express our gratitude to the reviewer for your insightful comments, which bring forth a critical perspective. In accordance with the reviewer's suggestion, we have updated our discussion to include the diverse functions guided by Hobit and Prdm1 in regulating the development and function of cNK cells and ILC1s (page 27; line 554-561):

      “Previous studies have identified Hobit and Prdm1 as central regulators instructing tissue-dependent programs and retention of diverse tissue-resident lymphocytes (18, 51, 53). Liver ILC1s required Hobit, but not necessary for cNK cells (6). Expression of Prdm1 was remarkably higher in cNK cells versus ILC1s (18). While in our study, cNK cells and liver ILC1s reduced simultaneously in Prdm1ΔNcr1 mice, and even more significant in ILC1s. This indicates that while Prdm1 is expressed at lower levels in ILC1s, its role in preserving the quantity of ILC1s may be more crucial. Thus, Prdm1 and Hobit may have parallel program in instructing ILC1s functional development and maturation.”.

      As shown in Supplemental Figure 8, we analyzed two published scRNA-seq datasets generated from HobitKO mice and integrated the DEGs in cNK cells and ILC1s with our data. We observed overlaps between the DEGs of Prdm1ΔNcr1 and HobitKO cNK cells and ILC1s, such as Junb, Tcf7, Gzmb, and Prf1 (Supplemental Figure 8), indicating similar regulatory networks of Prdm1 and Hobit. These data are now described on page 19; lines 386-395:

      “We also compared the gene expression patterns between Prdm1 and Hobit (homologue of Blimp1) with two published scRNA-seq data (51, 53). Following the knockout of Hobit, the DEGs were primarily identified within ILC1s. Conversely, after the knockout of Prdm1, a greater number of DEGs were observed in cNK cells. This indicates that Prdm1 likely possesses a broader range of target genes within cNK cells, whereas Hobit appears to have a more pronounced impact on gene expression within ILC1s (Supplemental Figure 8, C-F). There are some overlaps between the downstream transcriptional profile of Prdm1 and Hobit in liver cNK cells and ILC1s (Supplemental Figure 8, G and H), such as Junb, Fosb, Tcf7, Kit, Gzmb, Prf1, and Cxcr6 was simultaneously upregulated or downregulated in both Prdm1ΔNcr1 and HobitKO liver cNK cells or ILC1s, indicating the similar regulatory networks of Prdm1 and Hobit.”.
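      The overlap comparison described above can be sketched in a few lines. This is a minimal illustration only: the gene names are placeholders, the log2 fold changes are invented, and the helper names `concordant_overlap` and `discordant_overlap` are hypothetical rather than part of the authors' pipeline.

```python
# Hypothetical log2 fold changes (knockout vs. control). Placeholder genes and
# invented values; nothing here is taken from the study's scRNA-seq data.
prdm1_ko_degs = {"GeneA": 1.4, "GeneB": -2.1, "GeneC": 0.9, "GeneD": -1.2}
hobit_ko_degs = {"GeneA": 0.8, "GeneB": -1.5, "GeneD": 1.1, "GeneE": -0.7}


def concordant_overlap(degs_a, degs_b):
    """Genes found in both DEG tables that change in the same direction."""
    shared = degs_a.keys() & degs_b.keys()
    return sorted(g for g in shared if degs_a[g] * degs_b[g] > 0)


def discordant_overlap(degs_a, degs_b):
    """Genes found in both DEG tables that change in opposite directions."""
    shared = degs_a.keys() & degs_b.keys()
    return sorted(g for g in shared if degs_a[g] * degs_b[g] < 0)


print(concordant_overlap(prdm1_ko_degs, hobit_ko_degs))  # ['GeneA', 'GeneB']
print(discordant_overlap(prdm1_ko_degs, hobit_ko_degs))  # ['GeneD']
```

      Separating concordant from discordant genes matters here because a shared DEG only supports a similar regulatory network if it moves in the same direction in both knockouts.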

      Comment 8: Figure 4: The authors should discuss (and cross-validate) their liver gene expression analyses in the context of published datasets of NK and ILC1, such as the ones by Lopez et al, Friedrich et al, Ducimetiere et al and Yomogida et al.

      Response 8: We thank the reviewer for raising this important point. To address this question, we have now analyzed the gene expression of liver cNK cells and ILC1s in two of the published datasets mentioned above, also in the context of Hobit knockout. We compared gene expression across the different clusters and describe the results in our revised manuscript (page 19; lines 386-395):

      “We also compared the gene expression patterns between Prdm1 and Hobit (homologue of Blimp1) with two published scRNA-seq data (51, 53). Following the knockout of Hobit, the DEGs were primarily identified within ILC1s. Conversely, after the knockout of Prdm1, a greater number of DEGs were observed in cNK cells. This indicates that Prdm1 likely possesses a broader range of target genes within cNK cells, whereas Hobit appears to have a more pronounced impact on gene expression within ILC1s (Supplemental Figure 8, C-F). There are some overlaps between the downstream transcriptional profile of Prdm1 and Hobit in liver cNK cells and ILC1s (Supplemental Figure 8, G and H), such as Junb, Fosb, Tcf7, Kit, Gzmb, Prf1, and Cxcr6 was simultaneously upregulated or downregulated in both Prdm1ΔNcr1 and HobitKO liver cNK cells or ILC1s, indicating the similar regulatory networks of Prdm1 and Hobit.”.

      Recommendations For The Authors:

      Comment 9: The use of a paired t-test analysis when comparing cells/groups from different mice is not correct. Instead, the authors should consider using e.g. an unpaired t-test and re-test the indicated significance (e.g. Figure 1F, Figure 2H).

      Response 9: We thank the reviewer for the comment. Because we used littermates for the experiments and compared them side by side, we consider the paired t-test acceptable. We nevertheless reanalyzed the significance of the results in Figure 1F and Figure 2H using an unpaired t-test. For Figure 1F, the statistical significance obtained with the unpaired t-test was the same as with the paired t-test. However, in Figure 2H, the reduction in IFN-γ production did not reach statistical significance with the unpaired t-test (Supplemental Figure 12B). This may be attributable to variation between littermate pairs, but the trend remains consistent with our conclusion. We believe that a paired t-test between littermates is also meaningful. As such, we kept both statistical methodologies to ensure a thorough evaluation.
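      The difference between the two tests can be illustrated with a small sketch. The values below are invented for five hypothetical littermate pairs and are not data from the study; the unpaired statistic is Student's pooled-variance form, chosen here only for simplicity. The sketch shows how a consistent within-pair difference yields a large paired statistic even when variation between litters makes the unpaired statistic small.

```python
import math
import statistics

# Invented %IFN-γ+ values for five hypothetical littermate pairs
# (wt[i] and ko[i] come from the same litter). Illustrative only.
wt = [30.0, 42.0, 25.0, 38.0, 33.0]
ko = [26.0, 37.0, 21.0, 35.0, 28.0]


def paired_t(a, b):
    """t statistic of a paired t-test on within-pair differences (df = n - 1)."""
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def unpaired_t(a, b):
    """Student's t statistic with pooled variance (df = n_a + n_b - 2)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se


print(round(paired_t(wt, ko), 2))    # 11.22, with df = 4
print(round(unpaired_t(wt, ko), 2))  # 1.0, with df = 8
```

      With these invented numbers, the paired statistic exceeds the two-sided 5% critical value for 4 degrees of freedom (2.776), while the unpaired statistic falls below the critical value for 8 degrees of freedom (2.306). The same mean difference is therefore significant only when the pairing is taken into account, which is valid only if the samples are genuinely matched.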

      Comment 10: In several instances, it is unclear whether data are pooled or representative (and if so, of how many analyses). This information needs to be provided for all analyses. 

      Response 10: We apologize for the lack of details and have now provided sufficient information in our figure legends.

      For example, we deleted the numbers from the original histograms to avoid confusion about whether the data are pooled or representative (e.g. original Figure 7 A-C; revised Figure 7 A-C). Furthermore, we added “representative” to the figure legends of all flow cytometric plots to better inform readers (e.g. original Figure 2, D and F; revised Figure 2, B and D).

      Comment 11: In the title and abstract authors use "type 1 ILCs" for both NK cells and ILC1, and it is difficult to understand which phenotypes correspond to cNK cells versus ILC1. Most of the analyses clearly separate these two different cell types. I would appreciate a lot being more accurate in the abstract, and describing cNK and ILC1 phenotypes in a clear way.

      Response 11: We apologize for our inaccurate descriptions. Following Spits et al. (Nature Reviews Immunology, 2013) and other related studies, we have now adopted a more appropriate nomenclature: “cNK cells” for conventional NK cells, “ILC1s” for type 1 innate lymphoid cells, and “group 1 ILCs” as the collective name for cNK cells and ILC1s.

      The definition of these cells is given in the introduction (page 4, line 52-53; line 58-62):

      “Group 1 ILCs consist of cNK cells and ILC1s (1, 2), with distinct developmental trajectories and effect molecules (3).”, “In a state of homeostasis, liver group 1 ILCs (CD45+CD3-NK1.1+NKp46+) can be discriminated into cNK cells and ILC1s by the differential expression of CD49a and CD49b (2): cNK cells are marked by the expression of CD49b, while liver ILC1s exhibit a distinctive positivity for CD49a. Tumor Necrosis Factor Related Apoptosis Inducing Ligand (TRAIL) is also expressed on liver ILC1s, but not on cNK cells (10, 11).”. 

      We also describe cNK and ILC1 phenotypes in our scRNA-seq data, as shown in page 13; line 259-261: 

      “cNK cells expressed high levels of Itga2 (CD49b) and Eomes, while ILC1s had high levels expression of Itga1 (CD49a) and Tnfsf10 (Supplemental Figure 5, F and G).”.

      Comment 12: In the abstract authors state "The present study unveiled a novel regulatory mechanism of Prdm1 in liver Type 1 ILCs, showing promising potential for developing innovative immune therapy strategies against liver cancer." - maybe authors should discuss how their findings could be used for therapeutic approaches?

      Response 12: We appreciate comments from the reviewer. As there hasn't been a clear consensus on the role of Prdm1 in NK cells prior to this, some studies have suggested that Prdm1 can inhibit cytokine secretion by NK cells. Particularly, Kallies et al. in their 2011 article in Blood found that Prdm1 might suppress NK cell anti-tumor activity. Hence, there hasn't been any immunotherapy targeting Prdm1 in NK cells for cancer treatment. Our research demonstrates the enhancing role of Prdm1 in NK cell anti-tumor activity, providing theoretical support for NK cell therapy targeting Prdm1. 

      We added the following content to the discussion section (page 29; line 605-609): 

      “Further research may provide deeper insight into the role of PRDM1 in the anti-tumor function of human NK cells, enabling a more direct investigation of its application in cancer therapies. Given its important role in preserving liver cNK cells and ILC1s functional heterogeneity, enhancing Prdm1 function in human NK cells could potentially be a strategy to promote NK cell-based immunotherapy for cancer.”.

      Comment 13: The authors should explain or interpret their data a bit more (e.g. what is the consequence of GSEA enriched in negative regulation of Il6 production? (Fig. 3D)  do NK cells produce Il6 (Figure 3)? What's the impact of Il17 signaling in NK/ILC1 (Figure 5). Do the authors suggest JunB-driven metabolic reprogramming (Suppl. Fig 6D-F?).

      Response 13: We appreciate the comments from the reviewer. The question of IL-6 production in NK cells was also raised by another reviewer. We checked the GSEA results and found no notable genes related to IL-6 production in NK cells. Following the suggestion of the other reviewer (Response to Reviewer 2, Comment 14), it may be prudent to omit this figure.

      IL-17 signaling pointed to the plasticity of ILC1s, some of which may originate from the differentiation of ILC3s. We added more discussion of this point (page 17; line 341-344):

      “Several ILC3 signature genes, such as Rora, Tmem176a, and Tmem176b (45), highly expressed in this cluster (Supplemental Figure 7D). Considering the close relationship between IL-17 mediated immunity response and ILC3 (1, 46), it is plausible that Il7rhi ILC1 cluster may be attributed, at least in part, to potential plasticity between ILC1 and ILC3 subsets.”.

      The decreased mitochondrial function may have more relevance to NK cell exhaustion in tumors. Our data suggest that the elevated expression of JunB in NK cells may predispose them to exhaustion. Currently, our hypothesis regarding the promotion of NK cell exhaustion by high JunB expression is based on the observed correlation between JunB expression levels and exhaustion phenotypes (at the gene expression and IFN-γ secretion levels) and the findings in reference 67 (Lynn et al., Nature, 2019), where JunB was found to promote T cell exhaustion. However, we have not demonstrated causation between high JunB expression and exhaustion in NK cells. We propose that in NK cells, especially mature NK cells, excessive JunB expression may make them more sensitive to exhaustion inducers. Nevertheless, further research is needed to confirm this. To clarify this, we added the following content in the discussion section (page 26; line 537-543): 

      “While our current data is not sufficient to definitively classify these cells as exhausted NK cells, it supports that a subpopulation, referred to Junbhi cluster, demonstrates an exhaustion-like phenotype.

      The significant increase in this cell population following Prdm1 knockout in NK cells may potentially be one of the reasons why Prdm1ΔNcr1 mice lose their tumor-killing capacity. Whether the excessive expression of JunB in NK cells is also a contributing factor to their exhaustion, similar to T cells(65), requires further investigation.”.

      Comment 14: Ref 25 and Ref 57 are the same publication?

      Response 14: We apologize for this careless mistake. We have checked all the references and corrected the formatting errors.

      Comment 15: Figure 1, E - The method description of RT-PCR is missing. I apologize if I have overlooked this information.

      Response 15: We have now added the description of RT-PCR in our revised method section (page 31; line 638-644):

      “RNA was extracted from FACS-sorted NK cells or splenocytes using RNASimple Total RNA Kit (TIANGEN Biotech, 4992858) and subsequently reverse transcribed to cDNA with SuperScript VILO Master Mix (Thermo Fisher Scientific, 11755050) according to manufacturer’s instructions. qPCR was performed with SYBR Green Mix (Thermo Fisher Scientific, A25742) and CFX Opus 96 Real-Time PCR System (Bio-Rad). The relative mRNA expression level was calculated using 2^-ddCt method. Primer sequences: Prdm1: 5’-CAGAAACACTACTTGGTACA-3’; 5’-GATTGCTTGTGCTGCTAA-3’.”
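      For readers unfamiliar with the 2^-ddCt calculation, a minimal sketch follows. The Ct values are invented, and the use of a reference (housekeeping) gene is an assumption of the standard method; the quoted methods text does not name the reference gene, and `relative_expression` is a hypothetical helper.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Relative mRNA level of a target gene by the 2^-ddCt method.

    dCt  = Ct(target) - Ct(reference gene), computed per condition
    ddCt = dCt(sample) - dCt(control)
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control
    return 2 ** (-dd_ct)


# Invented Ct values: the target crosses threshold 3 cycles later in the
# knockout sample once normalized to the reference gene.
print(relative_expression(26.0, 18.0, 23.0, 18.0))  # 0.125
```

      A result of 0.125 corresponds to an eightfold lower expression in the sample than in the control, under the usual assumption of roughly 100% amplification efficiency for both primer pairs.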

      Comment 16: Figure 1, F - The NKp46+CD3- gate for the liver seems to cut the population, not all cells are included.

      Response 16: We appreciate the reviewer’s comment and apologize for our carelessness. We expanded our data with more samples and reanalyzed them with a more convincing gating strategy. We have now updated our figures (revised Figure 1G; revised Supplemental Figure 2A). Several changes have occurred in the data and conclusions, and we have revised the corresponding content in our manuscript accordingly.

      The original text is:

      “Proportion and absolute number of cNK cells in blood, bone marrow, lung, liver, spleen, and lymph nodes were analyzed by flow cytometry. Compared with Prdm1+/+ mice, the percentage of cNK cells (CD3-NK1.1+NKp46+) among lymphocytes was decreased in all of these tissues except bone marrow and lymph nodes (Figure 1F; Supplemental Figure 2A). However, no significant difference was observed in the percentage of cNK cells among bone marrow-derived lymphocytes between Prdm1ΔNcr1 and Prdm1+/+ mice. The absolute number of cNK cells in blood, lung, liver, and spleen also decreased in Prdm1ΔNcr1 mice (Figure 1F; Supplemental Figure 2A). Only a slight decrease in the number of cNK cells was observed in the lymph nodes of Prdm1ΔNcr1 mice, which did not reach statistical significance either (Supplemental Figure 2A). In contrast, the absolute number of cNK cells in Prdm1fl/fl mice bone marrow is moderately higher than Prdm1ΔNcr1 mice (Figure 1F).”

      The revised text is (page 8; line 142-146):

      “Proportion and absolute number of cNK cells in blood, bone marrow, lung, liver, spleen, and lymph nodes were analyzed by flow cytometry. Compared with Prdm1+/+ mice, the percentage and absolute number of NK cells (CD45+CD3-NK1.1+NKp46+) among lymphocytes was decreased in all of these tissues, whereas increased number of NK cells were observed in bone marrow (Figure 1G; Supplemental Figure 2A).”

      Comment 17: Figure 1, The y-axis labeling of lung CD3-NKp46+ cells (x10^3) is not correct.

      Response 17: We apologize for our carelessness. We have now checked the labels and made sure they are correct.

      Comment 18: Figure 1, The statistical significance of absolute numbers of NKp46+ cells in the bone marrow should be reviewed.

      Response 18: We expanded our data with more samples and reanalyzed them with a more convincing gating strategy. We observed a significant increase in the number of bone marrow NK cells in our updated data. These changes are now described in our revised manuscript.

      The original text is: 

      “However, no significant difference was observed in the percentage of cNK cells among bone marrow-derived lymphocytes between Prdm1ΔNcr1 and Prdm1+/+ mice”, “In contrast, the absolute number of cNK cells in Prdm1fl/fl mice bone marrow is moderately higher than Prdm1ΔNcr1 mice (Figure 1F).”

      The revised text is (page 8; line 142-146):

      “Proportion and absolute number of cNK cells in blood, bone marrow, lung, liver, spleen, and lymph nodes were analyzed by flow cytometry. Compared with Prdm1+/+ mice, the percentage and absolute number of NK cells (CD45+CD3-NK1.1+NKp46+) among lymphocytes was decreased in all of these tissues, whereas increased number of NK cells were observed in bone marrow (Figure 1G; Supplemental Figure 2A).”

      Comment 19: Figure 1, G - CD27 and CD11b are used to define maturation stages within NK cells. Here the authors are analyzing group 1 ILC instead (containing both NK cells and ILC1, especially in the liver). It would be better to pre-gate on Eomes+ or CD49b+ NK cells for this analysis.

      Response 19: We apologize for the lack of details in this analysis. We pre-gated on CD49b+ NK cells for the analysis of maturation stages. We have now added this statement to our revised manuscript and figure legend (page 8; line 149-151):

      “The maturation of cNK cells (gated by CD45+CD3-NK1.1+NKp46+CD49b+) from blood, bone marrow, lung, liver, spleen, and lymph nodes were assessed, based on the expression of CD11b and CD27.”.
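      The gating logic described above can be expressed as a short sketch. It assumes each cell has already been scored as positive or negative for every marker; the thresholds that produce those calls are not specified here, and the function names are hypothetical. The quadrant labels follow the commonly described CD27/CD11b maturation scheme for NK cells, from CD27-CD11b- (least mature) through CD27+CD11b- and CD27+CD11b+ to CD27-CD11b+ (most mature).

```python
def is_cnk(cell):
    """Apply the CD45+CD3-NK1.1+NKp46+CD49b+ gate.

    `cell` maps marker names to True (positive) or False (negative).
    """
    return bool(cell["CD45"] and not cell["CD3"] and cell["NK1.1"]
                and cell["NKp46"] and cell["CD49b"])


def maturation_stage(cell):
    """Return the CD27/CD11b quadrant of a cell that passed the cNK gate."""
    cd27 = "CD27+" if cell["CD27"] else "CD27-"
    cd11b = "CD11b+" if cell["CD11b"] else "CD11b-"
    return cd27 + cd11b


# Hypothetical cell: passes the cNK gate and falls in the most mature quadrant.
example = {"CD45": True, "CD3": False, "NK1.1": True, "NKp46": True,
           "CD49b": True, "CD27": False, "CD11b": True}
if is_cnk(example):
    print(maturation_stage(example))  # CD27-CD11b+
```

      Pre-gating on CD49b before assigning a quadrant is what keeps CD49a+ ILC1s out of the maturation analysis, which is the point raised in the reviewer's comment.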

      Comment 20: Supplementary Figure 1, A - The NKp46+CD3- gate seems to cut the population, not all cells are included. y-axis labeling of spleen CD3-NKp46+ cells (%) is not correct.

      Response 20: Thank you. We have corrected these errors, and the corrections are shown in our revised Supplemental Figure 2A.

      Comment 21: Figure 2, D-G - Did the authors analyse the ILC1/NK compartment of the tumor? What is the abundance and phenotype of these cells dependent on Prdm1 expression? Proper Cre-controls should be used (see above).

      Response 21: We appreciate the suggestions from the reviewer. As requested, we have now added an analysis of the cNK cell and ILC1 populations in the tumor-bearing setting. The changes in the proportions of cNK cells and ILC1s in Prdm1ΔNcr1 mice were similar to those in the tumor-free condition, while the numbers of both cNK cells and ILC1s decreased in tumor-bearing liver (revised Figure 7D). These contents have been updated in our revised manuscript (page 23; line 479-481):

      “The proportion changes of cNK cells and ILC1s in Prdm1ΔNcr1 mice was similar with the no tumor-burden condition, while the number of both cNK cells and ILC1s have significant decreased in tumor-bearing liver (Figure 7D).”.

      The reason why we did not use Cre-controls is described in Response 1.

      Comment 22: Figure 2, H - Prdm1-deficient NK and ILC1 produce less Ifng in response to in vitro stimulations with Il-12 and /or Il-18, and bulk Seq analysis (Fig 3F) shows reduced Il12rb2 expression. Does the expression of cytokine receptors correlate with the maturation of NK cells? This could be analyzed from the single-cell RNA-seq dataset. The statistical significance of %Ifng after Il12/Il18 stimulation should be revisited (see above).

      Response 22: We thank the reviewer for the suggestions. To address this question, we explored the expression of IL-12 and IL-18 receptors in the cNK and ILC1 clusters. Within the cNK clusters, Il12rb2, Il18r1 and Il18rap were highly expressed in the Prf1hi and Cxcr3hi cNK clusters (revised Supplemental Figure 6H), indicating that IL-18 receptor expression correlates with NK cell maturation. In ILC1s, these receptors were mostly expressed in the Il7rhi and Gzmbhi ILC1 clusters (revised Supplemental Figure 7C). The significant decrease in Il18r1 expression in Prdm1ΔNcr1 cNK cells and ILC1s may be associated with their impaired ability to produce IFN-γ. We have now added this analysis (page 18; lines 364-368):

      “Within cNK cells, Il12rb2, Il18r1 and Il18rap was highly expressed in Prf1hi and Cxcr3hi cNK clusters (Supplemental Figure 6I), indicating the IL-18 receptor expression correlated with the NK cell maturation. While in ILC1, these receptors mostly expressed on Il7rhi and Gzmbhi ILC1 clusters (Supplemental Figure 7D). Significant decreased of Il18r1 expression in Prdm1ΔNcr1 cNK cells and ILC1s may associated with the impaired ability to produce IFN-γ.”.

      The unpaired t-test of IFN-γ production is shown in revised Supplemental Figure 12B. The difference in IFN-γ production in the original Figure 2H was not significant when analyzed with an unpaired t-test. However, significance was observed in tumor-bearing liver cNK cells and ILC1s, specifically under IL-12/IL-18 stimulation, as shown in the original Figure 7E using an unpaired t-test. These variations may be attributed to differences between littermates. Despite these variations, the trend remains consistent with our overall conclusions. We believe that a paired t-test between littermates is also meaningful. We therefore kept both statistical approaches to ensure a thorough evaluation.
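The difference between the two tests discussed in this response can be illustrated with a minimal sketch. The values below are hypothetical and are not data from the study; the function names are illustrative, and the Welch form of the unpaired statistic is an assumption, since the response does not state which variant was used. The point is only that when littermate pairs differ widely in baseline but shift consistently within each pair, the paired statistic is much larger than the unpaired one.

```python
import math
from statistics import mean, stdev

def unpaired_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / math.sqrt(var_a / len(a) + var_b / len(b))

def paired_t(a, b):
    """Paired t statistic computed on within-pair differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical %IFN-γ+ values; index i corresponds to one littermate pair.
wt = [40.0, 55.0, 70.0, 85.0]
ko = [35.0, 49.0, 66.0, 79.0]

t_unpaired = unpaired_t(wt, ko)  # dominated by between-litter spread
t_paired = paired_t(wt, ko)      # uses only within-pair differences

print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

Because the paired statistic removes between-litter variation, it is only appropriate when the pairing (here, littermates processed together) is defined before the data are examined.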

      Comment 23: Figure 3, A-E - For bulk sequencing analysis, splenic CD3-NK1.1+NKp46+ were isolated. This population also contains ILC1 in the spleen (e.g. Flommersfeld et al.), although much less abundant compared to NK cells, and compared to the liver compartment. However, have the authors tested the abundance of splenic ILC1 in Prdm1-deficient mice, which may impact the gene expression data? In line with this the detection of altered Cxcr6 expression in Figure F, which is usually expressed by ILC1 rather than NK cells, may indicate an alteration in ILC1 numbers. The authors should validate the altered expression of CXCR6, Itga1, and Cx3cr1 on NK cells by flow cytometry.

      Response 23: We now cite the work of Flommersfeld et al. in our manuscript and have expanded our Results section to include the following information (page 19; lines 377-385):

      “Previous research found that spleen NK cells could be divided into three distinct groups based on their expression levels of CD27, CD62L, CD49a, and CD49b (52). CD27+CD62L- NK cells have remarkable high expression of Batf3, while it was only barely expressed in CD27+CD62L+ and CD27-CD62L+ NK cells (52). Based the sequencing data published by Flommersfeld et al., (GSE180978), a notable negative correlation was observed between the expression levels of Prdm1 and Batf3 (Supplemental Figure 8I). On top of that, our findings unveiled the negative regulatory influence of Prdm1 on Batf3 within both spleen and liver NK cells. This discovery highlights a potential upstream mechanism that may influence the hemostasis of the spleen NK cell subpopulations through Batf3.”.

      We validated the expression of CD49a (Itga1) and CX3CR1 in liver cNK cells and ILC1s, as described in our revised manuscript (page 9, lines 170-174; page 14, lines 231-233):

      “Increased CD49a expression was also observed in Prdm1ΔNcr1 liver ILC1s, while it showed decreased expression in NKp46+ cells in the liver, bone marrow, and lymph nodes (Supplemental Figure 2, F and G).”, “The percentage of CX3CR1+ cNK cells was significantly decreased in multiple tissues of Prdm1ΔNcr1 mice, while the proportion of CX3CR1+ ILC1 was increased in the liver (Figure 3F).”

      Comment 24: Figure 3, F - Tnfsf26: which gene is this? is this a typo? Is a function of this gene in NK cells reported? Altered Batf3 expression suggests an impact on ILC1-like NK cells (Flommersfeld et al).

      Response 24: We apologize for this mistake. We have removed Tnfrsf26 from the heatmap.

      Comment 25: Figure 3, G-J refer to Kallies data?! 

      Response 25: Kallies’s study reported reduced GzmB expression in Blimp1gfp/gfp mice. Compared with that study, however, we further analyzed GzmB and perforin expression at different maturation stages of NK cells. The reduced GzmB expression is not due solely to the less mature phenotype of Prdm1-deficient NK cells, which highlights the role of Prdm1 in regulating NK cell function. We have therefore added this content to the revised manuscript (page 12; lines 233-242):

      “Lower GZMB and PRF1 production was observed in Prdm1-deficient splenic cNK cells, liver cNK cells and ILC1s (Figure 3, H-K; Supplemental Figure 4, A-I). Notably, the proportion of GZMB+ and PRF1+ cNK cells was decreased among almost all of the maturation stages of cNK cells (Figure 3, J and K). The relative mean fluorescent intensities (MFIs) of GZMB and PRF1 consistently show a reduction across all developmental stages in PrdmΔNcr1 NK cells (Supplemental Figure 4, H and I). Yet, no statistical difference of PRF1 was found within the CD11b-CD27+ and CD11b+CD27+ subsets, likely due to the relatively lower perforin levels in these populations (Supplemental Figure 4I). These findings suggest that Prdm1 may directly influence cytotoxic molecule in NK cells, rather than impacting their anti-tumor abilities solely by affecting the maturation phenotype of Prdm1-deficient NK cells.”

      In the Discussion section of the revised manuscript, where Kallies’s work is now cited (page 24; lines 500-502):

      “Our results not only confirmed a decrease in cytotoxic molecules in Prdm1-deficient NK cells (31) but also showed that the reduction in Gzmb and perforin is not solely attributable to the diminished maturation of these cells.”

      Comment 26: Figure 3, G, I - How do the authors explain the high variability of GzmB and Prf1 in Prdm1+/+ cells? 2 samples have comparable values to Prdm1-deficient cells.

      Response 26: This may be due to inherent differences in MFI among samples. In the revised version, we have added data on percentages, which exhibit much less variability (Figure 3, H and I). The MFIs of GZMB and PRF1 have been moved to Supplemental Figure 4, E and F.

      Comment 27: Did the authors test the mice for potential germline recombination of the floxed allele, which has been suggested as a potential problem of Ncr1cre?

      Response 27: We appreciate the reviewer's insightful comment; this is a very good question. In Prdm1fl/fl mice, germline recombination typically results in a systemic knockout of Prdm1, which can lead to embryonic lethality. Given that mice were successfully born in the current study, it is highly unlikely that germline recombination of Prdm1 occurred due to leaky expression of Cre.

      To address this issue, we isolated splenocytes and assessed Prdm1 expression using qPCR. We observed no significant difference in Prdm1 expression between splenocytes from Prdm1+/+ and Prdm1ΔNcr1 mice (revised Figure 1F). This also indicates that germline recombination is unlikely to be present in the Prdm1ΔNcr1 mice.

      Comment 28: Histograms do not show MFI

      Response 28: We appreciate the reviewer's comment. The MFI values have been removed.

      Comment 29: Supplementary Figure 4, B - FACS plot labelling: Typo, Histograms do not show MFI.

      Response 29: We sincerely thank the reviewer for the careful reading. The typo in this figure has been corrected, and the MFI values have been removed.

      Comment 30: Figure 4, A - What are the cells in the red cluster in the middle of the UMAP, do they belong to B cells? Why do they cluster so separately? It is interesting, but also surprising that NK and ILC1 cluster map so far apart from each other (rather with CD8 or B cells? or NKT cells) - do the authors have any comments?

      Response 30: We sincerely apologize for the mistakes in labeling a group of cells in our previous analysis. Upon a thorough re-evaluation, we have corrected the labels of several cell clusters that were previously misidentified. The revised heatmap (revised Supplemental Figure 5C) represents the marker genes for each cluster. Additionally, in our updated analysis (revised Figure 4A), we have included clusters for Epithelial cells, CD4+ T cells, NKT cells, and Kupffer cells. Please note, the red cluster identified in the center of the original heatmap corresponds to the CD4+ T cells.

      We checked the markers of the cNK cell and ILC1 clusters and confirmed that they are labeled correctly, as Ncr1 and Klrb1c (NK1.1) were highly expressed in these clusters compared with the others (revised Supplemental Figure 5E).

      Comment 31: Does Junb expression correlate with the maturation stages of NK cells?

      Response 31: Our previous research indicated that during the maturation process of NK cells, there was a decrease in the expression levels of Junb (negative correlation), whereas there was an increase in the expression levels of Prdm1 (Wang et al., J Clin Invest, 2018; Supplemental Figure 5c and Supplemental Figure 11).

      Comment 32: The authors may consider validating their scRNA-seq data (e.g. by FACS analysis for highlighted markers, eg. cKit, Tcf7, Gzma, Cxcr3).

      Response 32: We appreciate the reviewer's suggestion. We validated several marker genes, including Gzmb, Prf1, and Cx3cr1, by FACS, as shown in revised Figure 3, F-K. Currently, FACS cannot resolve liver NK cells into as many distinct clusters as scRNA-seq analysis can. However, we expect that as technology progresses, we will be able to extend our validation of the scRNA-seq data.

      Comment 33: It is a bit unclear to me why authors refer to Cxcr3hi NK cells as tissue-resident. This is based on Cxcr3 and Ccr2 expression. To make this statement, a much more detailed analysis would be required. How are CD69, CD49a, or CXCR6 expression of these cells?

      Response 34: We appreciate the reviewer's suggestion. The primary reason for classifying this specific cluster of NK cells as tissue-resident comes from the differentially expressed genes (DEGs) and Gene Ontology (GO) analysis, which show significant chemokine receptor activity within this cluster.

      To support this statement, we checked the expression of the above markers, but only Cd69 was expressed in the cNK clusters, where it was highly expressed in Junbhi and Cxcr3hi cNK cells (revised Supplemental Figure 6D). We also used the top 30 DEGs in ILC1s versus cNK cells to calculate a module score across all cNK clusters; Cxcr3hi cNK cells had the highest score among these clusters (revised Supplemental Figure 6D). This part has been updated in our manuscript (page 15; lines 298-308):

      “Expression of tissue-resident markers Cd69 was also highly expressed in this clusters (Supplemental Figure 6D). The enrichment of chemokine receptors in the genes upregulated in the Cxcr3hi cluster implying a greater likelihood of this cluster being tissue-resident compared with other cNK cell clusters (Figure 4H). To further confirmed tissue-resident properties of this clusters, we calculated the module score based on top30 DEGs in ILC1 versus cNK clusters, including Cxcr6, Itga1, Cd160, Cd226, etc. Cxcr3hi cNK clusters have the highest score among all cNK clusters (Supplemental Figure 6H), indicating the similarity with liver ILC1s. In the tumor microenvironment, reports indicated that NK cells could transform into ILC1s (25). If this conversion of cNK cells into ILC1s also occurred under normal physiological conditions, then Cxcr3hi cNK cell cluster might be the most susceptible to such transformation.”

      Comment 35: The authors suggest that Prdm1 regulates chemokine receptor expression. An alternative explanation could be that this is an indirect effect of altering the abundance of NK cell subsets.

      Response 35: We apologize for the lack of detail in these figures. The input cell number for each genotype has now been added to the following figure legends.

      Figure 4F, “Proportions of cNK cells among total cNK cells (left; 211 cells in Prdm1+/+, and 141 cells in Prdm1ΔNcr1) and within clusters (right).”; Figure 5C, “Proportions of ILC1s among total ILC1s in different genotypes (left; 114 cells in Prdm1+/+, and 63 cells in Prdm1ΔNcr1) and within each cluster (right).”; Figure 6C, “Proportions of MDMs and KCs among total macrophages in different genotypes (510 cells in Prdm1+/+, and 624 cells in Prdm1ΔNcr1).”

      To minimize the effects of discrepancies in input numbers between samples with different genotypes, we represented the relative proportions of each cluster within its specific genotype (e.g. Supplemental Figure 6B; Supplemental Figure 7B; Supplemental Figure 9B).
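The normalization described here, expressing each cluster as a fraction of its own genotype's cells, can be sketched as below. Only the genotype totals (211 and 141 cNK cells) come from the legend quoted above; the cluster names and the per-cluster counts are hypothetical placeholders chosen to sum to those totals, and the function name is illustrative.

```python
def cluster_proportions(counts):
    """Return each cluster's share of the cells from one genotype."""
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in counts.items()}

# Hypothetical per-cluster counts; only the totals (211, 141) are from the legend.
wt_counts = {"cluster_A": 100, "cluster_B": 61, "cluster_C": 50}  # sums to 211
ko_counts = {"cluster_A": 40, "cluster_B": 60, "cluster_C": 41}   # sums to 141

wt_props = cluster_proportions(wt_counts)
ko_props = cluster_proportions(ko_counts)

for cluster in wt_counts:
    print(f"{cluster}: WT {wt_props[cluster]:.1%}, KO {ko_props[cluster]:.1%}")
```

Comparing proportions rather than raw counts keeps a difference in input cell number between genotypes from being read as a difference in cluster abundance.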

      Comment 36: Supplementary Figures 6 and 7, A - The formatting of gene annotations does not fit the heat maps (the gene names on the last rows are missing).

      Response 36: We apologize for these mistakes, which have now been corrected.

      Comment 37: Supplementary Figures 6 and 7, What is the consequence of compromised mitochondrial function? Increase apoptosis?

      Response 37: In our experiments, we did not find that Prdm1 affects the apoptosis of NK cells. Previous studies, by contrast, found that Prdm1 might inhibit the proliferation of NK cells (C. Kucuk et al., PNAS, 2011). We acknowledge that there is ongoing debate regarding the precise definition of NK cell exhaustion. In our experiments, no changes were detected in the expression of exhaustion-associated surface markers (TIGIT) on NK cells following Prdm1 knockout. However, we did note a significant reduction in the cytokine secretion capacity and tumor control efficacy of NK cells after Prdm1 knockout. We therefore suggest that the consequence of compromised mitochondrial function might be increased exhaustion. As mentioned in the Discussion (lines 482-483), mitochondrial fragmentation has been shown to be closely associated with NK cell exhaustion in tumors (Zheng et al., Nature Immunology, 2019). Although the evidence is not sufficient to define Prdm1ΔNcr1 NK cells as exhausted, our data support the idea that compromised mitochondrial function is, at least in part, associated with the exhausted phenotype of Prdm1ΔNcr1 NK cells in cancer.

      We have discussed these points in our revised manuscript (page 26; lines 529-543):

      “Mitochondria are pivotal organelles crucial for cellular metabolism. Disruptions in mitochondrial function have been linked to T Cell exhaustion, attributed to glycolytic reprogramming (66). Similarly, mitochondrial fragmentation has been closely associated with NK cell exhaustion (67).

      However, the concept of NK cell exhaustion isn't as firmly established as it is for T cells. Exhausted NK cells should primarily exhibit diminished functions. This is characterized by a diminished ability to destroy tumor cells, a reduced capability to activate other components of the immune system, and compromised proliferation and survival rates. Additionally, this reduced functionality is associated with a decline in the expression of molecules responsible for cytotoxic activity, lower production of IFN-γ, and metabolic disturbances that may arise from mitochondrial dysfunction. While our current data is not sufficient to definitively classify these cells as exhausted NK cells, it supports that a subpopulation, referred to Junbhi cluster, demonstrates an exhaustion-like phenotype. The significant increase in this cell population following Prdm1 knockout in NK cells may potentially be one of the reasons why Prdm1ΔNcr1 mice lose their tumor-killing capacity. Whether the excessive expression of JunB in NK cells is also a contributing factor to their exhaustion, similar to T cells(65), requires further investigation.”.

      Comment 38: Figure 5, Describing the scRNA Seq data, the authors are switching a lot between Figure 4 and Figure 5. Maybe a reorganization of the Figures (Figure 4: NK cell; Figure 5: ILC1) could help.

      Response 38: We appreciate the reviewer’s suggestion. We have now reorganized Figures 4 and 5.

      Comment 39: Figure 5, We suggest naming one of the ILC1 clusters "Gzmbhi" to keep it consistent with the FACS data.

      Response 39: We agree with this excellent suggestion and have now renamed the “Gzmahi” ILC1 cluster as the “Gzmbhi” ILC1 cluster.

      Comment 40: Figure 5, C - How was the JunB score derived (which genes were used)?

      Response 40: The JunB score was calculated from the expression of marker genes of the Junbhi cNK cluster (DEGs in the Junbhi cNK cluster compared with the other clusters, as shown in revised Supplemental Figure 6A). The score was calculated using the “AddModuleScore” function in R.
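The general idea of a per-cell module score can be sketched in a few lines. This is a simplified illustration, not the AddModuleScore implementation named in the response: it subtracts the mean expression of a fixed control gene set from the mean expression of the module genes, whereas the real function selects expression-matched control genes. The gene names, expression values, and function name below are all hypothetical.

```python
from statistics import mean

def module_score(cell_expr, module_genes, control_genes):
    """Mean expression of module genes minus mean expression of control genes."""
    module = mean(cell_expr[g] for g in module_genes)
    control = mean(cell_expr[g] for g in control_genes)
    return module - control

# Hypothetical normalized expression values for two cells.
cells = {
    "cell_1": {"geneA": 3.0, "geneB": 2.0, "ctrl1": 1.0, "ctrl2": 1.0},
    "cell_2": {"geneA": 0.5, "geneB": 0.5, "ctrl1": 1.0, "ctrl2": 2.0},
}
module_genes = ["geneA", "geneB"]    # stand-ins for cluster marker DEGs
control_genes = ["ctrl1", "ctrl2"]   # stand-ins for background genes

scores = {
    name: module_score(expr, module_genes, control_genes)
    for name, expr in cells.items()
}
print(scores)
```

A positive score means the module genes are expressed above the background level in that cell, which is why such scores are useful for ranking clusters by similarity to a reference signature.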

      Comment 41: Figure 5, G, I - The authors highlight Il17 signaling pathway, what is the impact of Il17 on NK/ILC1? Did the authors check for ILC3 (Rorc expression) within the ILC1 cluster?

      Response 41: The enrichment of the IL-17 signaling pathway in Il7rhi ILC1s suggests that this cluster may include ILC1s that originate from the conversion of Rorγt+ ILC3s. Although Rorc expression was undetectable in all ILC1 clusters, we found several ILC3 marker genes highly expressed in this cluster (e.g., Rora, Tmem176a, Tmem176b), according to published ILC3 transcriptomes (Robinette et al., Nature Immunology, 2015).

      We have added this content to our revised manuscript (page 17; lines 341-344):

      “Several ILC3 signature genes, such as Rora, Tmem176a, and Tmem176b (45), highly expressed in this cluster (Supplemental Figure 7D). Considering the close relationship between IL-17 mediated immunity response and ILC3 (1, 46), it is plausible that Il7rhi ILC1 cluster may be attributed, at least in part, to potential plasticity between ILC1 and ILC3 subsets.”.

      Comment 42: Figure 5, The authors detect more Ly49E+ cytotoxic ILC1 in Prdm1fl Ncr1cre mice.

      How does this observation fit to the reduced cytotoxicity of NK cells?

      Response 42: The proportion of Klrahi ILC1s was increased, while that of Gzmbhi ILC1s was decreased in Prdm1ΔNcr1 mice. Moreover, the total number of cells across the three ILC1 clusters was reduced in Prdm1ΔNcr1 mice.

      Comment 43: Line 350/351: Citation required.

      Response 43: We have added the respective references (references 55 and 56).

      Comment 44: Figure 6, The Cell-chat analysis provides interesting suggestions, but none are experimentally addressed. It is also difficult to evaluate these analyses: are any of the Mac subsets altered in frequency or phenotype in either genotype? This could be analyzed from the single-cell data in Fig 4. At the very least, flow cytometric validation of predicted shifts in the Mac compartment should be confirmed.

      Response 44: We are grateful for these valuable suggestions. As requested, we analyzed macrophages and validated some of the scRNA-seq data by flow cytometry. We have rewritten this part to include an analysis of the altered proportions of the two macrophage populations (Kupffer cells and monocyte-derived macrophages) (page 20-21; lines 399-436):

      “The scRNA sequencing analysis identified two well-established subpopulations of liver macrophages: the resident Kupffer Cells (KCs) and the Monocyte-Derived Macrophages (MDMs) (Figure 6, A-C; Supplemental Figure 9A). When comparing the total proportion of macrophages within the immune cell population of the liver between WT and Prdm1ΔNcr1 mice, there is an increase in Prdm1ΔNcr1 mice (Figure 6C). To confirm these findings, we utilized flow cytometry to define macrophages, including both KCs and MDMs, gating by CD45+Ly6G-F4/80+CD11b+ (Figure 6D).

      Our analysis showed that, following the deletion of Prdm1 in Group 1 ILCs, there is a significant increase in both the proportion and number of macrophages in the liver (Figure 6D).

      According to the transcriptional profile, liver macrophages further clustered and were labeled as “Ly6c2hi”; “Cxcl2hi”; “Ear2hi” MDMs, and “Mrc1hi”; “C1qhi” KCs (Figure 6A, Supplemental Figure 9, A-E). Increased proportion of MDMs and KCs was observed in Prdm1ΔNcr1 cells (Supplemental Figure 9B). Within MDMs clusters, Ly6c2hi MDMs mainly compose of Prdm1+/+ cells, while Prdm1ΔNcr1 cells concentrated in Cxcl2hi cluster (Figure 6C). The scRNA-seq data reveal that following Prdm1 knockout in NKp46+ cells, there is a decrease in the proportion of KCs within the macrophage population, while the proportion of MDMs increases (Figure 6D). CX3CR1, a chemokine receptor, is extensively utilized to distinguish KCs and MDMs within macrophages. Cells expressing CX3CR1 are identified as MDMs, whereas those without CX3CR1 expression are categorized as KCs (56). Employing flow cytometry and leveraging CX3CR1 expression, we assessed the ratios of KCs and MDMs. However, diverging from the scRNA-seq findings, flow cytometry indicates that post-Prdm1 knockout in group 1 ILCs, there is a minor increase in the proportion of KCs within the total liver macrophages, and a decrease in the proportion of MDMs (Figure 6D; Supplemental Figure 9B). This discrepancy could stem from the different bases of classification: scRNA-seq defines KCs based on gene expression profiles, whereas flow cytometry differentiates between KCs and MDMs using the single surface marker, CX3CR1. Analysis of the macrophage subsets identified by scRNA-seq reveals that, while MDM clusters generally show high CX3CR1 expression, there exists a subset within MDMs, labeled Mrc1hi, that also exhibits high levels of CX3CR1 (Supplemental Figure 9C). Consequently, if flow cytometry solely employs CX3CR1 for differentiating between KCs and MDMs, it could result in disparities when compared to scRNA-seq outcomes.
Both KCs and MDMs has significantly increased in Prdm1ΔNcr1 mice, which was consist with the scRNA-seq data (Supplemental Figure 9, B and F). Despite the decrease in the proportion of Ly6c2hi MDMs in Prdm1ΔNcr1 mice, the expression levels of Ly6c2 exhibited minimal variation between WT and Prdm1ΔNcr1 mice (Supplemental Figure 9D). Intriguingly, within certain cellular subsets, notably the Ear2hi cluster, the Ly6c2 expression levels in KO mice were found to be higher than those in WT mice. Additionally, we employed flow cytometry to examine Ly6C expression within the macrophages. Similar with the scRNA-seq findings, there were no notable differences in Ly6C expression levels between WT and KO mice (Figure 6E; Supplemental Figure 9G).”.

      The changes in the macrophage compartment indicate a potential influence of functional NK cells on macrophages. We have revised these parts of our Results and Discussion (lines 590-601). Further analysis of macrophages would be worthwhile but is beyond the scope of this manuscript; it will be a direction of our future work.

      Comment 45: Figure 6, C1qhi Mac only are few cells/events, and interactions (or cells?) seem to be gone in the Prdm1-floxed mice. Is that true? Does it make sense to perform cell-chat analysis on so few cells?

      Response 45: We have now added KCs to the cell-chat analysis; this cluster corresponds to the C1qhi KCs. We have revised the corresponding parts of the analysis in our manuscript (page 20-21; lines 408-428):

      “According to the transcriptional profile, liver macrophages further clustered and were labeled as “Ly6c2hi”; “Cxcl2hi”; “Ear2hi” MDMs, and “Mrc1hi”; “C1qhi” KCs (Figure 6A, Supplemental Figure 9, A-E). Increased proportion of MDMs and KCs was observed in Prdm1ΔNcr1 cells (Supplemental Figure 9B). Within MDMs clusters, Ly6c2hi MDMs mainly compose of Prdm1+/+ cells, while Prdm1ΔNcr1 cells concentrated in Cxcl2hi cluster (Figure 6C). The scRNA-seq data reveal that following Prdm1 knockout in NKp46+ cells, there is a decrease in the proportion of KCs within the macrophage population, while the proportion of MDMs increases (Figure 6D). CX3CR1, a chemokine receptor, is extensively utilized to distinguish KCs and MDMs within macrophages. Cells expressing CX3CR1 are identified as MDMs, whereas those without CX3CR1 expression are categorized as KCs (56). Employing flow cytometry and leveraging CX3CR1 expression, we assessed the ratios of KCs and MDMs. However, diverging from the scRNA-seq findings, flow cytometry indicates that post-Prdm1 knockout in group 1 ILCs, there is a minor increase in the proportion of KCs within the total liver macrophages, and a decrease in the proportion of MDMs (Figure 6D; Supplemental Figure 9B). This discrepancy could stem from the different bases of classification: scRNA-seq defines KCs based on gene expression profiles, whereas flow cytometry differentiates between KCs and MDMs using the single surface marker, CX3CR1. Analysis of the macrophage subsets identified by scRNA-seq reveals that, while MDM clusters generally show high CX3CR1 expression, there exists a subset within MDMs, labeled Mrc1hi, that also exhibits high levels of CX3CR1 (Supplemental Figure 9C). Consequently, if flow cytometry solely employs CX3CR1 for differentiating between KCs and MDMs, it could result in disparities when compared to scRNA-seq outcomes.”.

      Comment 46: Figure 6, C - Here the interactions of both Mac+ILC1 and Mac+NK are shown together. It would be interesting to separate this analysis (also Suppl. Fig 9A-B) into comparisons of Mac+ILC1 vs Mac1+NK from WT or Prdm1fl Ncr1 mice.

      Response 46: As requested, we re-analyzed this part for each genotype, as shown in Supplemental Figure 10. These data are now described on page 22 (lines 445-447):

      “The reduction of interaction mostly occurred in the cross-talk of ILC1-MDM and ILC1-KC, whereas no difference was observed in cNK-MDM and cNK-KC interaction (Supplemental Figure 10, A-H)”

      Comment 47: Supplementary Figure 9, A, B - Is this analysis using WT and Prdm1fl Ncr1cre dataset together? 

      Response 47: Yes, we used the WT and Prdm1ΔNcr1 data together. As requested above, we have now performed this analysis separately for WT and Prdm1ΔNcr1 mice. These data are now described on page 22 (lines 445-460):

      “The reduction of interaction mostly occurred in the cross-talk of ILC1-MDM and ILC1-KC, whereas no difference was observed in cNK-MDM and cNK-KC interaction (Supplemental Figure 10, A-H). A reduction in the interaction of ligand-receptor, such as Mif-CD74, Cxcl16-Cxcr6, and Cxcl10-Cxcr3 was observed in Prdm1ΔNcr1 mice compared to Prdm1+/+ mice (Supplemental Figure 11). Compared to Prdm1+/+ mice, the information flow of CXCL and MIF pathways significantly decreased in Prdm1ΔNcr1 mice (Figure 6, H and I; Supplemental Figure 10, B, D, F, and H). These pathways play a crucial role in facilitating macrophage migration. The CXCL signaling was sent from Ly6c2hi Cxcl2hi MDMs and C1qhi KC, targeting all ILC1 clusters and Cxcr3hi cNK cell clusters (Figure 6J). Of note, although the population of Cxcl2hi macrophage primarily comprised cells from Prdm1ΔNcr1 mice, the interaction within the CXCL pathway between macrophages and group 1 ILCs was obviously less than Prdm1+/+ sample (Figure 6J). These changes could be linked to a decreased population of ILC1s and Cxcr3hi cNK cell cluster in Prdm1ΔNcr1 mice, implying that the homeostasis of Cxcl2hi macrophages required sufficient signals from cNK cells and ILC1s. The impaired CXCL-CXCR interactions might subsequently lead to reduced recruitment and activation of group 1 ILCs and macrophages within the tumor microenvironment.”.

      Comment 48: Figure 7, A-C -What is the consequence/interpretation of reduced Mitotracker staining? Any metabolic assays performed? The definition of NK cell "exhaustion" is unclear, is reduced IFNg enough for that? Is the concept of NK cell exhaustion clearly established? Only shortly touched upon in the discussion, the rationale for suggesting an exhausted phenotype, should be explained.

      Response 48: MitoTracker was used to assess mitochondrial mass. Reduced staining indicates compromised mitochondrial function, which is associated with mitochondrial fragmentation.

      We believe that the exhaustion of NK cells is not as well-established a concept as it is for T cells. The purpose of detecting mitochondria in this study is to provide evidence for the relationship between Prdm1 and the exhaustion of NK cells. In the discussion section, we have added the following content (page 26; line 529-543):

      “Mitochondria are pivotal organelles crucial for cellular metabolism. Disruptions in mitochondrial function have been linked to T Cell exhaustion, attributed to glycolytic reprogramming (66). Similarly, mitochondrial fragmentation has been closely associated with NK cell exhaustion (67).

      However, the concept of NK cell exhaustion isn't as firmly established as it is for T cells. Exhausted NK cells should primarily exhibit diminished functions. This is characterized by a diminished ability to destroy tumor cells, a reduced capability to activate other components of the immune system, and compromised proliferation and survival rates. Additionally, this reduced functionality is associated with a decline in the expression of molecules responsible for cytotoxic activity, lower production of IFN-γ, and metabolic disturbances that may arise from mitochondrial dysfunction. While our current data is not sufficient to definitively classify these cells as exhausted NK cells, it supports that a subpopulation, referred to Junbhi cluster, demonstrates an exhaustion-like phenotype. The significant increase in this cell population following Prdm1 knockout in NK cells may potentially be one of the reasons why Prdm1ΔNcr1 mice lose their tumor-killing capacity. Whether the excessive expression of JunB in NK cells is also a contributing factor to their exhaustion, similar to T cells(65), requires further investigation.”.

      Comment 49: Figure 7, x-axis labelling (MFI) of histograms is not correct. Do bar graphs and FACS plots show the same data? Does the number in the FACS plots indicate the MFI? If so, the FACS plots do not show representative samples?

      Response 49: We appreciate the reviewer's valuable comments. In the revised Figure 7, the MFI values have been removed. Bar graphs now display summary data from the FACS histograms.

      A representative sample close to the group's mean value was chosen for display in the histograms.

      Comment 50: Figure 7, D - How are these data different from Figure 2H? Why is it now called "exhaustion", but not in 2H? Is the detected IFNg only driven by ex vivo stimulation with Il12/Il18? As above, a "standard" 4h assay should also be provided to allow better interpretation of potential differences. In the introduction, the authors cite the Ducimetiere study (Ref 5) highlighting "the primary function of ILC1 in suppressing the seeding of metastatic tumor cells in liver tissue". Thus, it would be interesting to test Ifng production by liver ILC1 and NK cells ex vivo at early time points of tumor inoculation.

      Response 50: Tumors grow and proliferate within tissues and are one of the major causes of lymphocyte exhaustion. This part of the study investigates whether Prdm1 helps NK cells or ILC1s resist the exhaustion induced by malignant tumors. Specifically, we ask whether the absence of Prdm1 renders NK cells or ILC1s more susceptible to exhaustion within the tumor microenvironment. We therefore consider the capacity to secrete IFN-γ upon IL-12/IL-18 stimulation to be one indicator of exhaustion. We emphasize that this assessment is only one piece of evidence, not the sole determinant. Overnight stimulation is a conventional method for studying NK cells and has been widely used across laboratories, including ours (e.g., Bream et al., Blood, 2003; Yu et al., Immunity, 2006; Wang et al., J Clin Invest, 2018). To clarify, our approach does not involve stimulation with tumor cells to evaluate IFN-γ secretion by NK cells or ILC1s.

      Reviewer 2 (Public Review):

      Summary:

      This study offers a significant advancement in understanding liver innate lymphoid cell (ILC) biology by elucidating the role of the transcription factor Prdm1. It shows that Prdm1 is crucial in maintaining the balance between conventional natural killer (cNK) cells and ILC1s in the liver, with knockout models revealing a vital role in cancer defense mechanisms. Despite not affecting direct cytotoxicity, Prdm1 deficiency leads to increased cancer metastasis and reduced secretion of key molecules like IFN-γ, pointing to its importance in immune regulation. The use of single-cell RNA sequencing further underscores Prdm1's role in cellular communication within the liver's immune milieu. This study is a robust contribution to the field, providing insights that could inform new immunotherapy approaches for liver cancer.

      Strengths:

      The study's strength lies in its comprehensive approach, combining the specificity of Prdm1 conditional deletion in Ncr1-cre mice with integrative omics analyses and cutting-edge cytometry to delineate Prdm1's role in liver Type 1 ILC biology and its functional implications in tumor immunity. This multifaceted strategy not only clarifies Prdm1's influence on ILC composition and maturation but also conveys potential therapeutic insights for liver cancer immunotherapy.

      We sincerely appreciate your interest and critical assessment of our manuscript. We have carefully read your comments and suggestions, and we are truly grateful for your expert guidance. We have worked to address each of your concerns and comments, and we provide a point-by-point response below.

      Weakness

      Comment 1: A notable weakness of the study is the limited scope of in vivo disease models, primarily relying on the B16F10 melanoma model, which may not fully capture the complex behavior of Type 1 ILCs across diverse cancer types. Furthermore, the absence of direct human data, such as the effects of PRDM1 deletion in human NK cells or stem cells during their differentiation into NK and ILC1, leaves a gap in translating these findings to clinical settings.

      Response 1: We appreciate the reviewer for raising these important points, which we see as a unique opportunity for future work to transform our understanding of Prdm1 and its targets as opposed to a weakness of the present study. 

      In our revised manuscript, we have discussed these limitations of our study (page 29; line 602-609):

      “While our findings underscore the importance of Prdm1 in tumor immune surveillance by liver cNK cells and ILC1s, they have not been validated in human NK cells, whereas previous studies have found that PRDM1 might inhibit the proliferation and function of human NK cells (33, 73). Furthermore, we did not provide an in-depth evaluation in multiple tumor models. Further research may provide deeper insight into the role of PRDM1 in the anti-tumor function of human NK cells, enabling a more direct investigation of its application in cancer therapies. Given its important role in preserving liver cNK cells and ILC1s functional heterogeneity, enhancing Prdm1 function in human NK cells could potentially be a strategy to promote NK cell-based immunotherapy for cancer.”.

      Recommendations For The Authors:

      (Introduction) 

      Comment 2: Reference 1 appears slightly misplaced. You might find the nomenclature discussion in Spits et al., Nature Reviews Immunology, 2013, more appropriate.

      Response 2: We apologize for our inaccurate descriptions. Following Spits et al. (Nature Reviews Immunology, 2013) and other related studies, we have now adopted a more appropriate nomenclature: “conventional NK cells” are referred to as “cNK cells”, “type 1 innate lymphoid cells” as “ILC1s”, and “group 1 ILCs” is the collective name for cNK cells and ILC1s.

      The definition of these cells was described in the introduction (page 4, line 52-53; line 58-62):

      “Group 1 ILCs consist of cNK cells and ILC1s (1, 2), with distinct developmental trajectories and effector molecules (3).”, “In a state of homeostasis, liver group 1 ILCs (CD45+CD3-NK1.1+NKp46+) can be discriminated into cNK cells and ILC1s by the differential expression of CD49a and CD49b (2): cNK cells are marked by the expression of CD49b, while liver ILC1s exhibit a distinctive positivity for CD49a. Tumor Necrosis Factor Related Apoptosis Inducing Ligand (TRAIL) is also expressed on liver ILC1s, but not on cNK cells (10, 11).”.

      We also describe cNK and ILC1 phenotypes in our scRNA-seq data, as shown on page 13; line 259-261:

      “cNK cells expressed high levels of Itga2 (CD49b) and Eomes, while ILC1s expressed high levels of Itga1 (CD49a) and Tnfsf10 (Supplemental Figure 5, F and G).”.

      Comment 3: It has come to my attention that Reference 9 has been retracted. I recommend removing this citation to maintain the integrity of your references (https://doi.org/10.1182/blood.2023022801).

      Response 3: We thank the reviewer for the comment and have now removed this citation.

      Comment 4: For a more comprehensive context around reference 15, consider citing Thierry Walzer's work (https://rupress.org/jem/article/211/3/563/41636/T-bet-and-Eomes-instruct-thedevelopment-of-two), which aligns closely with your discussion.

      Response 4: We agree with the reviewer’s suggestion and have added this citation in our introduction (page 4; line 64-66):

      “Liver environment facilitated T-bet expression in the early stage of NK cells development, which results in Eomes repression. The repression of T-bet is required for Eomes+ NK cells (17).”.

      (Results) 

      Comment 5: The NK cell signature referenced in 32 has been questioned for its reliability as discussed by Cursons et al., CRI 2019 (https://pubmed.ncbi.nlm.nih.gov/31088844/). Reanalysis of data in Figure 1 B/C and Supplementary Figure 1 with the refined NK cell signature from Cursons et al. would be advantageous.

      Response 5: We thank the reviewer for the comment. As requested, we reanalyzed our data using the refined NK cell signature from Cursons et al. (revised Figure 1 A-C; revised Supplemental Figure 1). Of note, the overall survival of liver cancer (LIHC) patients reached statistical significance only when high and low expression of the refined PRDM1-NK signature were compared using a median cutoff (Figure 1, A-C). The overall survival analysis based on the top and bottom quartiles of refined PRDM1-NK signature expression was moved to Supplemental Figure 1, G-I.

      The original text is: “Examination of 363 liver hepatocellular carcinoma (LIHC) patient samples from The Cancer Genome Atlas (TCGA) revealed a positive correlation between the expression of NK cell-associated genes (NCR1, NCR3, KLRB1, CD160, and PRF1) (32) and PRDM1 expression (Figure 1A). Patients with top and bottom quartiles of NK-PRDM1 signature expression were chosen for survival analysis (Figure 1B). Notably, patients with the NK-PRDM1_hi signature had better overall survival compared to those with the NK-PRDM1_lo signature (Figure 1C). Similar results were also found in skin cutaneous melanoma (SKCM, n=454) and lung adenocarcinoma (LUAD, n=497) patients (Supplemental Figure 1, A-F). These data suggested that PRDM1 in NK cells might be essential for immune surveillance in some solid tumors, including liver cancer. These findings prompted us to investigate the impact and mechanism of PRDM1 in NK cells and ILC1 within the context of liver cancer.”

      We have rewritten this part in our revised manuscript (page 7; line 119-132): 

      “Examination of 363 liver hepatocellular carcinoma (LIHC) patient samples from The Cancer Genome Atlas (TCGA) revealed a positive correlation between the expression of NK cell-associated genes (34) (NCR1, KLRB1, CD160, PRF1, etc.) and PRDM1 expression (Figure 1A). The patients are ordered from highest to lowest based on the expression of NK-PRDM1 for survival analysis (Figure 1B). Notably, patients exhibiting higher levels of NK-PRDM1 expression (above the median) experienced better survival outcomes compared to those with lower levels of NK-PRDM1 expression (below the median) (Figure 1C). Similar results were also found in skin cutaneous melanoma (SKCM, n=454) and lung adenocarcinoma (LUAD, n=497) patients (Supplemental Figure 1, A-F). Patients within the highest quartile of NK-PRDM1 signature expression demonstrated enhanced overall survival, a result that achieved statistical significance in LUAD and SKCM patients (Supplemental Figure 1, G-I). These data suggested that PRDM1 in NK cells might be essential for immune surveillance in solid tumors, including liver cancer, and prompted us to investigate the function and mechanism of PRDM1 in NK cells and ILC1 within the context of liver cancer.”.
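
      The difference between the two stratification schemes discussed in this response (a median cutoff versus top and bottom quartiles) can be illustrated with a short sketch. It is only an illustration under stated assumptions: the scores are synthetic, the function names are hypothetical, and the choice of the "inclusive" quantile method is ours rather than anything described in the manuscript. The survival comparison itself (for example a log-rank test) is not shown.

```python
from statistics import median, quantiles

def median_split(scores):
    """Label each patient 'high' if the score is above the cohort median, else 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]

def quartile_split(scores):
    """Label top-quartile patients 'high' and bottom-quartile patients 'low'.

    Patients in the middle two quartiles get None and would be left out
    of the survival comparison.
    """
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return ["high" if s >= q3 else "low" if s <= q1 else None for s in scores]

# Synthetic signature scores for eight hypothetical patients.
scores = [0.2, 1.5, 0.9, 2.4, 0.4, 1.1, 3.0, 0.7]

by_median = median_split(scores)
by_quartile = quartile_split(scores)
print("median split:  ", by_median)
print("quartile split:", by_quartile)
```

      A median cutoff keeps every patient in the comparison, whereas a quartile split contrasts only the extremes and halves the number of patients analyzed, which is one reason the two schemes can differ in statistical significance.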

      Comment 6: The origin of the Ncr1-cre mice utilised should be clarified; is this the line developed by Eric Vivier? (https://www.pnas.org/doi/10.1073/pnas.1112064108).

      Response 6: We did not use the line developed by Eric Vivier; our Ncr1-cre mice were purchased from Shanghai Model Organisms Center, Inc. We describe this in our Methods section (page 29-30; line 612-614):

      “Prdm1fl/fl mice were purchased from The Jackson Laboratory. Ncr1-iCre and B2m-/- mice were purchased from Shanghai Model Organisms Center, Inc. Six- to twelve-week-old littermates were used for the experiment.”

      Comment 7: Considering the known reduction of Ncr1 expression in Ncr1-cre mice and its implications, it is recommended to repeat the B16F10 experiments with the correct control, Ncr1cre/+ Prdm1+/+.

      Response 7: This is an excellent question, and it has been raised by another reviewer and comprehensively answered (Reviewer 1, Comment 1). The answer is below: 

      The expression of Cre and the insertion of loxP sequences both have the potential to influence gene expression. This is because the region where loxP is inserted may contain regulatory sequences for the gene of interest. Ncr1-Cre is a frequently used transgenic mouse model in our laboratory. In our prior research, we also had concerns about the possible impact of Cre on NKp46 expression, which could lead to a decline in NK cell function. Therefore, in our previous studies focused on Smad4 expression in NK cells, we conducted similar experiments. In Figure 6 of our published paper in the Journal of Clinical Investigation (Wang et al., J Clin Invest, 2018), we compared NKp46-iCreTgfbr2fl/flSmad4fl/WT with NKp46-iCreTgfbr2fl/flSmad4fl/fl. Although the primary purpose was to establish Smad4's independence from TGF-β, it also allows for a comparison between Smad4fl/fl and Smad4fl/WT in the presence of Cre. In the critical phenotype we assessed, NKp46-iCreTgfbr2fl/flSmad4fl/fl (compared with NKp46-iCreTgfbr2fl/flSmad4fl/WT) exhibited the same phenotype as NKp46-iCreSmad4fl/fl (compared with NKp46WTSmad4fl/fl). This suggests that Cre's influence on NK cells may be within a reasonable and controllable range. Furthermore, in contrast to the decrease in Ncr1 expression caused by Cre, the reduction in the expression levels of genes targeted by loxP-mediated knockout, such as Prdm1 in this study (Figure 1 E), is more significant. Therefore, with the current techniques and research methods, we believe that the data provided in this study can support the role of Prdm1 in NK cells.

      Comment 8: The proportion of ILC1 in wild-type mouse livers is notably higher than standard references. Could you confirm whether liver perfusion was performed before analysis? This procedure was not clearly detailed in the methods section.

      Response 8: We apologize that we did not provide enough detail on this point in our original Methods. We performed liver perfusion before analysis. This has now been clarified in the Methods section of the revised text (page 30-31; line 630-636):

      “Mice were perfused with 1× PBS by portal vein puncture before harvesting tissues. Liver and lung were digested with 0.05% collagenase II for 30 minutes and filtered through 70 µm cell strainers, and mononuclear cells were isolated after being subjected to a density gradient using 30% and 70% Percoll. Spleens were also removed and pressed through 70 µm filters to obtain splenocytes. Peripheral blood mononuclear cells were obtained from peripheral blood after lysis of red blood cells (Biolegend, 420301). Flushing femurs and mechanical disruption of inguinal lymph nodes were performed to obtain cells from bone marrow and lymph nodes.”.

      The lymphocyte proportions in mice from different laboratories may exhibit slight variations, possibly due to genetic background disparities. To minimize the influence of genetic backgrounds, paired littermates were used in the current study, wherein one is Prdm1 WT and the other has the Prdm1 gene knocked out in NK cells.

      Comment 9: There appears to be inconsistency in reference formatting; for instance, Ref 39 does not match the formatting of other references. A thorough review of your citation format is suggested.

      Response 9: We apologize for the inadvertent errors and have reviewed the citation format.

      Comment 10: The information in Figures 2B and C may be better suited to the supplementary section as it does not significantly contribute to the main text.

      Response 10: We agree with the reviewer’s suggestion and these are now moved to supplementary figures (Supplemental Figure 2).

      Comment 11: The citation of reference 40 could be strengthened by including Sathe et al., 2014, which directly pertains to your findings (https://www.nature.com/articles/ncomms5539).

      Response 11: We added the suggested reference.

      Comment 12: Can the findings presented in Figure 2D/F be replicated using alternative models? This would substantiate the versatility of your results.

      Response 12: The current predominant in vivo tumor model for NK cells is primarily based on the use of B16F10 melanoma cells. These melanoma cells, with their low expression of MHC-I molecules, evade T cell-mediated immune surveillance, rendering them ideal targets for NK cells. Typically, this experimental melanoma metastasis assay involves tail vein injection, followed by detection of nodules in the lungs. To align with our investigation of liver-resident cNK and ILC1, we've introduced splenic injection (via the portal vein) and evaluated melanoma metastasis in the liver to reflect the anti-tumor capabilities of liver group 1 ILCs. We also explored subcutaneous tumor models, but we believe they may not effectively reveal Prdm1's role in cNK cells, particularly liver-resident NK cells and ILC1. While we've experimented with models using mouse liver tumor cells like Hepa 1-6, we found them less stable than B16F10 and less conducive to quantification. Should more suitable models or cell lines emerge, we remain open to exploring them in future research.

      Comment 13: The absence of in vitro killing assessments against B16F10 and YAC-1 leaves a gap in the NK cell characterisation which would be valuable to address.

      Response 13: Isolating NK cells for ex vivo cytotoxicity assays typically requires stimulation with high concentrations of IL-2. Under such high IL-2 stimulation, many intracellular differences that contribute to differences in cytotoxicity, such as changes in transcription factors, are often masked. Another issue is that current ex vivo NK cell cytotoxicity assays often only isolate NK cells from the spleen. Liver-resident NK cells, on the other hand, are often limited in quantity and by available isolation methods, making it challenging to conduct ex vivo cytotoxicity assays effectively. If more sensitive detection methods become available, we will also incorporate ex vivo data into our future research endeavors.

      Comment 14: The suggestion that NK cells produce IL-6 is indeed a bold one, and without additional validation through intracellular cytokine detection or ELISA, it may be prudent to omit these claims.

      Response 14: We have checked the GSEA results and found no meaningful genes related to IL-6 production. Therefore, we have removed this figure.

      Comment 15: The lack of fluorescence minus one (FMO) controls in Figure 3 and Supplementary Figure 4 is noted; including these would enhance the validity of your gating strategies.

      Response 15: As requested, we have added FMO controls to the aforementioned figures.

      Comment 16: There seems to be a minor mix-up in referring to Figure 4A in the scRNAseq results section, perhaps it was intended to refer to Figure 3A?

      Response 16: We have corrected this part (line 247). We also double-checked the figure references and corrected the inaccuracies. We apologize for the inadvertent errors.

      Comment 17: The rich datasets generated from bulk and scRNAseq are commendable. However, I urge you to make these datasets publicly accessible with a GEO accession number.

      Response 17: We appreciate the suggestion from the reviewer. We plan to upload our datasets with the final version of our manuscript, as also required by eLife policy.

      Comment 18: Figure 4K is insightful, yet a similar analysis of the ILC1 cluster could provide a more rounded understanding.

      Response 18: We thank the reviewer for the comments. We now provide a similar analysis of ILC1s, as shown in revised Figure 5H.

      Comment 19: The metabolic RNA signatures featured in Supplementary Figure 6 are intriguing and warrant further validation, perhaps through Seahorse analysis. Such validation could merit their inclusion in the main figures.

      Response 19: This is a very good suggestion. Currently, our data offer only limited indications in this context. We have chosen to validate some aspects of Prdm1's influence on cytotoxicity molecules. As for Prdm1's impact on other aspects of NK cells, such as metabolic functions, we may explore further in future research. Additionally, we hope that by publishing our research findings, laboratories worldwide can draw insights for their own studies and conduct relevant research based on these data.

      Comment 20: It is difficult to discern whether the cells depicted in Figure 7D are truly tumor-infiltrating ILC1 or NK cells that have adopted ILC1-like characteristics. Intravenous injection of CD45-PE could clarify this distinction, and if they are the latter, it may be more appropriate to refer to them as ILC1-like cells.

      Response 20: We completely agree with the reviewer's suggestion that "tumor-infiltrating lymphocytes" may not be accurate for the current experiment. Therefore, in the revised manuscript, we have changed it to "liver cNK or ILC1 from tumor-bearing livers".

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Drougard et al. examined the consequences of an acute high fat diet (HFD) on microglia in mice. 3-day HFD influenced the regulation of systemic glucose homeostasis in a microglia-dependent and independent manner, as determined using microglial depletion with PLX5622. 3-day HFD increased microglial membrane potential and the levels of palmitate and stearate in cerebrospinal fluid in vivo. Using confocal imaging, respirometry and stable isotope-assisted tracing in primary microglial cultures, the authors suggest an increase in mitochondrial fission and metabolic remodeling occurs when exposed to palmitate, which increases the release of glutamate, succinate and itaconate that may alter neuronal metabolism. This acute microglial metabolic response following acute HFD is subsequently linked to improved higher cognitive function (learning and memory) in a microglia and DRP1-dependent manner.

      Strengths:

      Overall, this study is interesting and novel in linking acute high fat diet to changes in microglia and improved learning and memory in mice. The role for microglia and DRP1 in regulating glucose homeostasis and memory in vivo appears to be supported by the data.

      Weaknesses:

      The authors suggest that utilization of palmitate by microglia following HFD is the driver of the acute metabolic changes and that the release of microglial-derived lactate, succinate, glutamate and itaconate are causally linked to improvements in learning and memory. A major weakness is that the authors provide no mechanistic link between beta-oxidation of palmitate (or other fatty acids) in microglia and the observed systemic metabolic and memory phenotypes in vivo. Pharmacological inhibition of CPT1a could be considered or CPT1a-deficient microglia.

      We thank Reviewer #1 for their time, effort and the critique. Indeed, we suggest that palmitate drives the aMMR response and associated improvements in learning and memory. In response to acute HFD we observe 1) increased palmitate in CSF; 2) impaired mitochondrial ETC activity in primary microglia (within 12 hours of HFD); and 3) improved learning and memory. The greatest barrier to proving how acute palmitate uptake in microglia improves learning and memory in vivo is the protracted methodology required for microglial isolation and purification. The timeframes and relatively harsh digestion protocols required are currently incompatible with metabolomic tracing and well beyond those required for most cell-types used for metabolomic investigation. We have tested and failed to obtain reproducible data across numerous in vivo protocols and finally settled on in vitro 13C-palmitate-treated neonatal microglia as the best current option. Primary neonatal microglia are accepted as one of the current best culture models by the microglial community (Valdearcos, Cell Reports 2014; Kim, Cell Metab 2019). Using neonatal microglia we demonstrate that the 13C-palmitate label is processed to palmitoylcarnitine (Fig 4C) and acetylcarnitine (Fig 4D), indicating that microglial fatty acid metabolism acts via the canonical CPT1/CPT2 pathway. These experiments highlight that microglia process palmitate via beta oxidation, generating acetyl-CoA and engaging the TCA cycle (Fig 4G-I).

      We now acknowledge these technical limitations more clearly and highlight their impact on any conclusions regarding adult microglia in vivo:

      Results “Microglia take up and metabolize free fatty acids”; 

      “Due in part to the long isolation times required to generate pure primary adult microglia, metabolite tracing experiments on primary adult microglia are not currently feasible. We therefore chose primary murine neonatal microglia as our model of choice for more mechanistic experiments (Valdearcos, Cell Reports 2014)”

      And,

      Discussion:

      “We propose that aMMR could result from direct uptake, processing, and release of fatty acid derived carbons, and demonstrate that microglia are capable of metabolizing fatty acids towards diverse intracellular and extracellular pools.”

      While acute ICV injection of a CPT1a blocker would be of potential interest, the caveats associated with CPT1a inhibition in other cell types (neurons, astrocytes, etc.) and with targeting the appropriate brain region (currently unknown) preclude the effective use of this approach to generate clear additional mechanistic insights. Similarly, given the time and resources required to generate, validate, optimize and experiment on a clean model of in vivo adult microglia-specific CPT1a knockout, this approach was deemed beyond the scope of this study. That said, the critique is important, and it should form the basis of a follow-up project.

      Comment: Another major weakness is that the authors also suggest that 3-day HFD microglial response (increase membrane potential) is likely driven by palmitate-induced increases in itaconate feedforward inhibition of complex II/SDH. Whilst this is an interesting hypothesis, the in vitro metabolic characterization is not entirely convincing.

      The reviewer is correct: we suggest that our data are consistent with a model where a palmitate-induced increase in itaconate inhibits complex II/SDH. While our findings do not comprise mechanistic proof, the hypothesis is supported by our Seahorse studies (Fig 2E) highlighting that a combined Palmitate + Succinate stimulation does not increase OCR beyond that of Palmitate alone; by primary microglial cell experiments highlighting that 3d-HFD treated adult primary microglia are refractory to succinate-induced mitochondrial membrane depolarization (Fig 2F); and by the identification of increased palmitate-induced itaconate production/release in cultured primary neonatal microglia (Fig 4H). The data are consistent with an inhibition of complex II/SDH and with increased itaconate secretion. They are also consistent with literature on more easily accessible myeloid lineages (Lampropoulou V, Cell Metab 2016).

      Comment: The authors suggest that acute palmitate appears to rapidly compromise or saturate complex II activity. Succinate is a membrane impermeable dicarboxylate. It can enter cells via MCT transporters at acidic pH. It is not clear that I) Succinate is taken up into microglia, II) If the succinate used was pH neutral sodium succinate or succinic acid, and III) If the observed changes are due to succinate oxidation, changes in pH or activation of the succinate receptor SUCNR1 on microglia. In the absence of these succinate treatments, there are no alterations in mitochondrial respiration or membrane potential following palmitate treatment, which does not support this hypothesis.

      We thank Reviewer #1 for highlighting a lack of information in the material and methods. We have updated them accordingly as follows:

      “For the electron transport chain (ETC) experiments, the protocol was based on Salabei et al. The cell suspension was incubated with the mitochondrial probe TMRM (tetramethylrhodamine methyl ester; 10 mM; Abcam, Cat# ab228569) and the fluorescent glucose analog 2-NBDG (Abcam, Cat# 235976) for 30 min at 37 °C before FACS acquisition. To challenge the ETC, the cell pellet was resuspended in 500 µl of warm MAS buffer solution + 1 nM Plasma Membrane Permeabilizer (Agilent Seahorse XF PMP) in order to permeabilize the cells. Microglial cells were gated as CD45low-CD11b+ cells followed by singlet selection, after forward and side scatter gating. They were treated every 90 seconds with the following drugs: 0.5 µl of 100 µM Rotenone (Sigma), 2 µl of 2.5 M Succinate adjusted to pH 7.4 with NaOH (succinic acid, Sigma) and 0.5 µl of 1 mM Antimycin (Sigma). Cytometry was performed on a Fortessa (BD Bioscience) and analyzed with FlowJo v10 (Treestar).”

      We hope the updated protocol makes clear that the succinate (succinic acid solution, pH 7.4) reaches the ETC directly, since the microglial cells were permeabilized with the Plasma Membrane Permeabilizer (Agilent Seahorse XF PMP).
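
      For readers who want the approximate working concentrations implied by the volumes in the protocol above, the dilution arithmetic (C1·V1 = C2·V2) can be sketched as follows. This is our own back-of-the-envelope calculation, not a figure reported by the authors. It assumes each drug is added to the stated 500 µl suspension and ignores the small volume contributed by earlier additions; the function and variable names are illustrative.

```python
def final_concentration(stock_conc, added_volume_ul, base_volume_ul):
    """Concentration after dilution (C1*V1 = C2*V2), in the same unit as stock_conc."""
    return stock_conc * added_volume_ul / (base_volume_ul + added_volume_ul)

BASE_UL = 500.0  # volume of the permeabilized cell suspension stated in the protocol

# (name, stock concentration, unit, added volume in µl), as listed in the protocol
additions = [
    ("rotenone", 100.0, "µM", 0.5),
    ("succinate", 2500.0, "mM", 2.0),
    ("antimycin", 1000.0, "µM", 0.5),
]

for name, stock, unit, vol in additions:
    # Assumption: each addition is treated independently of the others.
    conc = final_concentration(stock, vol, BASE_UL)
    print(f"{name}: ~{conc:.3g} {unit}")
```

      Under these assumptions the working concentrations come out at roughly 0.1 µM rotenone, 10 mM succinate, and 1 µM antimycin.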

      Comment: Intracellular itaconate measurements and quantification are lacking and IRG1 expression is not assessed. There also appears to be more labelled itaconate in neuronal cultures from control (BSA) microglia conditioned media, which is not discussed. What is the total level of itaconate in neurons from these conditioned media experiments? No evidence is provided that the in vivo response is dependent on IRG1, the mitochondrial enzyme responsible for itaconate synthesis, or itaconate. To causally link IRG1/itaconate, IRG1-deficient mice could be used in future work. 

      We appreciate the interest, the exciting question, and the suggested future experiment. Indeed, our results suggest a difference in metabolite release between BSA-treated and palmitate-treated microglia, and their impact on neurons is a prime question for future work. We have highlighted this in the discussion and added a comment regarding relative levels of labelled itaconate as follows:

      Results; Acute HFD induces widespread MMR and rapid modulation (…) memory  

      “As a control for the direct uptake of 13C-glucose, we treated parallel neuronal cultures with the same fresh 13C-glucose tracing media originally added to the microglia. Intriguingly, and consistent with literature documenting poor direct glucose utilization by neurons [29], we found substantial m+3 lactate (as well as other metabolites) in neurons treated with microglial conditioned media, and at levels that far exceeded labelling triggered by glucose tracer alone (Fig 5A, middle column vs left column)(Suppl Fig S5B). The data indicate higher uptake of citrate and itaconate from the control microglia-conditioned media, further supporting the hypothesis that neuronal metabolism is reproducibly impacted by palmitate-triggered changes in microglial products. These data demonstrate that palmitate metabolism by microglia modulates neuronal carbon substrate use in vitro, and, they highlight the relative importance of this process compared to uptake of pure glucose. The data identify a candidate mechanism by which aMMR may alter neuronal function in vivo.”

      Comment: While microglial DRP1 is causally implicated the role of palmitate is not convincing. Mitochondrial morphology changes are subtle including TOMM20 and DRP1 staining and co-localization - additional supporting data should be provided. Electron microscopy of mitochondrial structure would provide more detailed insight to morphology changes. Western blot of fission-associated proteins Drp1, phospho-Drp1 (S616), MFF and MiD49/51. Higher magnification and quality confocal imaging of DRP1/TOMM20. Drp1 recruitment to mitochondrial membranes can be assessed using subcellular fractionation.

      We appreciate the reviewer’s comment. Previous work by others, already cited elsewhere in our manuscript (PMCID: PMC7251564), has clearly demonstrated increased mitochondrial fragmentation and phosphorylated DRP1 in 3d HFD animals. This very specific result can therefore be considered confirmatory / validating of existing literature, and important for inclusion of DRP1 in our overall model. We have made sure to better highlight this important literature accordingly:

      Results; A rapid Microglial Mitochondria response to high fat diet

      “Consistent with the in vivo observations above, in vitro palmitate exposure decreased microglial mitochondrial length within 24 hours, indicating that fatty acid exposure itself is sufficient to trigger mitochondrial fission in a cell autonomous manner (Fig 2G upper panels). This result also confirms observations by Kim et al., who observed mitochondrial fission and DRP1 phosphorylation in 3d-HFD-treated mice [Kim JD et al, Microglial UCP2 mediates Inflammation and Obesity induced by High Fat feeding, Cell Metab 2019].”

      Comment: No characterization of primary microglia from DRP1-knockout mice is performed with palmitate treatment. Authors demonstrate an increase in both stearate and palmitate in CSF following 3day HFD. Only palmitate was tested in the regulation of microglial responses, but it may be more informative to test stearate and palmitate combined.

      Testing stearate and palmitate combined is an interesting experiment that would better mimic the global effect of HFD, which is highly enriched in these two saturated fatty acids, and would therefore be more informative. In vitro stimulation of microglia model cells has been previously published by Valdearcos et al. (Cell Reports 2014), who studied the effect of a mix of stearate and palmitate on mediobasal hypothalamus inflammation. Here, we build on their important findings by demonstrating that these 2 compounds are actually found in the CSF of 3d-HFD mice. Studies from other labs have also shown the presence of stearate and palmitate in the CSF of chronically obese and diabetic patients, which highlights the importance of these findings (Melo HM et al. Cell Reports 2020). While a systematic dissection of the roles of HFD-regulated CSF metabolites (including direct (diet containing) and indirect (secondary)) is beyond the scope of this study, this point is important, not least because it highlights less well-studied metabolites and the potential for combinatorial interactions. We have highlighted this idea in the results as follows:

      Results; A rapid Microglial Mitochondria response to high fat diet

      “To test whether these observed fatty acid changes in the CSF might directly trigger aMMR, we switched to an in vitro primary neonatal microglia model and examined the effects of the more abundant of these, palmitate (Fig S2A-B).”

      and, in the discussion as follows:

      “Studies have identified stearate and palmitate in the CSF of patients with chronic obesity and with diabetes, reports that highlight the importance of these findings (Melo HM et al. Cell Reports 2020). While a systematic dissection of the roles of HFD-regulated CSF metabolites (including direct (diet containing) and indirect (secondary)) is beyond the scope of this study, they represent priority areas for future investigation, particularly given the wide range of fatty-acid specific biological effects in the literature, and the potential for combinatorial interactions.”

      Reviewer #1 (Recommendations For The Authors):

      Congratulations on this interesting and novel work. Please see public review for details on potential experiments. While I would not expect all the experiments to be performed for this current study, it’s important to not overstate what the data is showing. For example, there is no causal link between palmitate oxidation in microglia or released metabolites (itaconate etc) from microglia in the effect on systemic glucose metabolism or memory. To make such claims more supporting data would be required.

      We thank Reviewer #1 for their highly constructive critique.

      Reviewer #2 (Public Review):

      The study "A rapid microglial metabolic response controls metabolism and improves memory" by Drougard et al. provides evidence that short-term HFD has a beneficial effect on spatial and learning memory through microglial metabolic reprogramming. The manuscript is well-written and the statistics were properly performed with all the data. However, there are concerns regarding the interpretation of the data, particularly the gap between the in vivo observations and the in vitro mechanistic studies.

      In the PLX-5622 microglial depletion study, it is unclear what happened to the body weight, food intake, and day-night behavior of these mice compared to the vehicle control mice. It is important to address the innate immunity-dependent physiology affected by a long period of microglial depletion in the brain (also macrophages in the periphery). Furthermore, it would be beneficial to validate the images presented in Fig.1F by providing iba1 staining in chow diet-fed mice with or without PLX-5622 for 7-10 days. Additionally, high-quality images, with equal DAPI staining and comparable anatomical level, should be provided in both chow diet-fed mice and HFD-fed mice with or without PLX-5622 in the same region of hypothalamus or hippocampus. These are critical evidences for this project, and it is suggested that the authors provide more data on the general physiology of these mice, at least regarding body weight and food intake.

      We are grateful to Reviewer #2 for their constructive comments, for their time and effort, and for highlighting the lack of experimental details regarding the PLX-5622 microglial depletion study. We followed the protocol established in Feng et al JCI 2017. No adverse effects on body weight, food intake, or day-night behavior were described in that study or in other studies using longer treatment (Sonia George et al Molecular Neurodegeneration 2019). We did not observe any differences in body weight or food intake within or between groups upon PLX administration. These data have been included as new Supplementary Fig 6 A-B.

      The Materials and Methods section was updated as follows:

      “Animals were administered PLX5622-containing diet for 7-9 days without observable impact on the body weight or food intake (Fig S6A-B), using protocols adopted from [Feng et al JCI 2017, Sonia George et al Molecular Neurodegeneration 2019].”

      Comment: It is also unclear whether the microglia shown in Fig.3A were isolated from mice 4 weeks after Tamoxifen injection. It is suggested that the authors provide more evidence, such as additional images or primary microglia culture, to demonstrate that the mitochondria had more fusion upon drp1 KO. It is recommended to use mito-tracker green/red to stain live microglia and provide good resolution images.

      We thank Reviewer #2 for pointing out the lack of detailed information about Fig 3A. Microglial cells were indeed isolated from mice after the tamoxifen injection, to highlight the deletion. We updated the Materials and Methods with the text below:

      “For the colocalization experiment, microglia were isolated from 10 to 12-week old drp1ko mice and their littermate controls, immediately fixed in PFA and stained with DRP1 (diluted 1:50 Cell signaling; Cat#8570) and tomm20 antibodies (diluted 1:1000, SantaCruz; Cat#sc177615).”

      This experiment was performed as an additional control for the drp1 deletion in our knockout mice. For this experiment we used Tomm20, since the microglial cells were no longer live after the addition of PFA.

      Comment: Regarding the data presented in Fig.5A, it is suggested that the authors profile the metabolomics of the microglial conditioned media (and provide the methods on how this conditioned media was collected) to determine whether there was already abundant lactate in the media. Any glucose-derived metabolites, e.g. lactate, are probably more preferred by neurons as energy substrates than glucose, especially in embryonic neurons (which are ready to use lactate in newborn brain).

      With regard to Fig 5A, metabolomics of microglia-conditioned media are provided as Fig 5A and Supp Figure 5B, and we have provided Supplementary Table 2.

      We thank Reviewer #2 for noting the lapse of technical detail. We updated the Materials and Methods with the following:

      “For conditioned media experiments, microglial cells were incubated with DMEM (Gibco) without lactate completed with BSA-conjugated palmitate or Control BSA. Conditioned media was collected after the incubation, centrifuged 15 min at 300g (4°C) and the supernatant transferred to a fresh tube, avoiding the cells and debris pellet. Samples were immediately snap frozen or used for the neuron incubation.”

      Glucose-derived metabolites, e.g. lactate, are preferred over glucose by neurons as energy substrates, as first described in the literature by Prof. Pellerin and Prof. Magistretti in the context of astrocyte-neuron cooperation (PNAS 1994). Since their discovery, lactate has been explored and is well known as a key signaling molecule (Magistretti PJ Nat Rev Neurosciences 2018). We explored the role of lactate released from microglia, and we demonstrated that it is taken up by neurons independently of any microglial pretreatment. This experiment highlights microglia as another lactate provider for neurons (Fig 4N and Fig 5A).

      Comment: Finally, it is important to address whether PLX-5622 affects learning and spatial memory in chow diet-fed animals. Following the findings shown in Fig 5J and 5K, the authors should confirm these by any morphological studies on synapse, e.g. by synaptophysin staining or ultrastructure EM study in the area shown in Fig 5I.

      We appreciate the comment and question. We performed the controls and have now included them as Fig 5J and Fig S5 E-F-G. We do not observe any adverse effects of PLX5622 on learning and spatial memory in normal chow-fed animals.

      While we were unable to study the synapses as requested, it is important to note that no changes are expected given publications from other labs using the same protocol (Feng X JCI 2017, Spangenberg E Nat Com 2019) or longer PLX5622 treatment (Niiyama T eNeuro 2023, Witcher KG J neurosciences 2021), none of which found morphological differences at synapses.

      Reviewer #2 (Recommendations For The Authors):

      The authors should provide more evidence that palmitate is derived from HFD to prove that it mediates the HFD effects on the microglial mitochondria response. This could be done by adding 13C-palmitate into the HFD and performing metabolomics in isolated microglia from control mice (and Drp1-MG-KO mice, if possible).

      We thank Reviewer #2 for the enthusiastic review. Unfortunately, we were unable to attempt this final suggested experiment. We have adjusted our wording accordingly and appreciate the reviewer’s understanding.

      Reviewer #3 (Public Review):

      Drougard et al. explore microglial detection of a switch to high-fat diet and a subsequent metabolic response that benefits memory. The findings are both surprising and novel in the context of acute high-fat intake, with convincing evidence of increased CSF palmitate after 3 days of HFD. While the authors demonstrate compelling signs of microglial activation in multiple brain regions and unique metabolite release in tracing studies, they should address the following areas prior to acceptance of this manuscript.

      Major Points:

      (1) It appears that the authors perform key metabolic assays in vitro/ex vivo using primary microglia from either neonatal or adult mice, which should be more clearly delineated especially for the 13C-palmitate tracing. In the case of experiments using primary microglia derived from mixed glial cultures stimulated with M-CSF, this system relies on neonatal mice. This is understandable given the greater potential yield from neonatal mice, but the metabolic state and energetic demands of neonatal and adult microglia differ as their functional roles change across the lifespan. The authors should either show that the metabolic pathways they implicate in neonatal microglia are also representative of adult microglia or perform additional experiments using microglia pooled from adult mice, especially because they link metabolites derived from neonatal microglia (presumably not under the effects of acute HFD) to improved performance in behavioral assays that utilize adult mice.

      We thank Reviewer #3 for their constructive critique and encouraging words. As indicated, the 13C-palmitate experiments were performed with primary microglia derived from mixed glial cultures stimulated with M-CSF and we demonstrated our primary cultures were almost pure by the supplementary experiments (supp Fig2A and B). Additional minor details in these contexts have been added to the Material and Methods.

      The experiments focusing on the mitochondrial ETC were performed on sorted microglia from adult mice, and parallels were demonstrated with the neonatal cultures (the primary model for metabolic tracing). Compromised complex II activity under conditions of acute HFD/palmitate stimulation, for instance, was shown in both systems. Unfortunately, despite best efforts, attempts to run 13C-palmitate tracing experiments on primary adult microglia failed, attributable in large part to the long (~4 hour) and harsh microglial extraction and sorting process. These experiments will require substantial follow-up efforts, ideally including the establishment and validation of an adult microglia-neuron co-culture model that faithfully recapitulates most aspects of in vivo metabolic cross-talk. This noble aim is beyond the scope of this study. We have made sure to temper the conclusions made in the manuscript and to not overstate the impact and interpretation of the in vitro work, including updating the following sentences.

      Results “Microglia take up and metabolize free fatty acids”; 

      “Due in part to the long isolation times required to generate pure primary adult microglia, metabolite tracing experiments on primary adult microglia are not currently feasible. We therefore chose primary murine neonatal microglia as our model of choice for more mechanistic experiments (Valdearcos Cell Reports 2014).”

      and Discussion:

      “We propose that aMMR could result from direct uptake, processing, and release of fatty acid derived carbons, and demonstrate that microglia are capable of metabolizing fatty acids towards diverse intracellular and extracellular pools.”

      Comment: The authors demonstrate that 3 days of HFD increases circulating palmitate by CSF metabolomics and that microglia can readily metabolize palmitate, but the causal link between palmitate metabolism specifically by microglia and improved performance in behavioral paradigms remains unclear. A previous body of research, alluded to by the authors, suggests that astrocyte shuttling of lactate to neurons improves long-term and spatial memory. The authors should account for palmitate that also could be derived from astrocyte secretion into CSF, and the relative contribution compared to microglia-derived palmitate. Specifically, although microglia can metabolize the palmitate in circulation, there is no direct evidence that the palmitate from the HFD is directly shuttled to microglia and not, for example, to astrocytes (which also express CX3CR1). 

      We appreciate the comment. Indeed, this issue highlights one of the greatest challenges for efforts aimed at tracing (beyond doubt) that a single minor cell population contributes towards metabolic cross-talk in vivo. Our experiments show: increased CSF palmitate levels within one feeding cycle of HFD; rapidly induced microglial metabolic activation (characterized by increased mitochondrial membrane potential and impaired complex II activity); and that microglia mount a comparable mitochondrial activation profile in vitro when exposed to palmitate. They show in vitro, using neonatal microglia, that microglia take up and metabolize palmitate; that they release metabolites with neuro-modulatory potential; and that neurons take these metabolites up and modulate their function differentially when exposed to control vs palmitate-treated microglia-conditioned media (in the absence of astrocytes). The experiments show through acute PLX-induced elimination of microglia, however crude, that this compartment impacts the acute HFD response, and, using conditional deletion, that full DRP1 expression is required in CX3CR1-CreERT2 targeted cells (primarily microglia deleting; Zhao et al 2019). While these experiments cannot rule out a contribution of astrocytes to the observations in vivo, comparable experiments rarely can, and we cannot rationalize why microglia should not have equal access to CSF palmitate for uptake or to neurons for substrate provisioning. We now better highlight this important issue, and temper our conclusions accordingly:

      “Tanycytes and astrocytes have both been documented to release select metabolites into the extracellular environment [33, 34]. While suggestive, the experiments highlighted here do not rule out a contribution of these or other cell types in coupling acute HFD intake to memory and learning.”

      Comment: Thus, the Barnes Maze results could be attributed to multiple cell types. Furthermore, the evidence provided in Figure 5J is insufficient to claim a microglia-dependent mechanism without showing data from mice on HFD with and without microglia depletion (analogous to the third and fourth bars in panel K).

      Agreed. We appreciate the comment. We have now added the requested HFD condition to Figure 5J. The new data support our previous interpretation.

      Comment: Given the emphasis on improved cognitive function, there is minimal discussion of the actual behavioral outcomes in both the results and discussion sections. The data that HFD-treated animals outperform controls should be presented in more detail both in the figure and in the text. For example, data from all days/trials of the Barnes Maze should be shown, including the day(s) HFD mice outperform controls. Furthermore, the authors should either cite additional literature or provide experimental evidence supporting the notion that microglia release of TCA-associated substrates into the extracellular milieu after HFD specifically benefits neuronal function cellularly or regionally in the brain, which could translate to improved performance in classical behavioral paradigms. The single reference included is a bit obscure, given the study found that increased lactate enhances fear memory which is a neural circuit not studied in the current manuscript. Are there no additional studies on more relevant metabolites (e.g., itaconate, succinate)?

      We agree. We have now re-plotted the behavioral test to better highlight that the HFD-treated animals outperform controls, as requested (Fig S7 and S8). We also added the requested literature. While we cannot be sure our search captured all relevant studies, we find a relative paucity of studies that characterize CSF metabolite changes in the context of acute high fat feeding or that demonstrate the ability of CSF substrates to convincingly improve memory and learning in vivo at physiological levels. Indeed, while simple, we feel the findings are of substantial novelty and highlight an area for significant future research. We have tempered our conclusions throughout and added to the discussion as follows:

      “Such substrate release could mediate the learning and memory effects that accompany aMMR; they are consistent with the data of other studies that have examined metabolite associations with learning and memory (itaconate [Morgunov IG, microorganisms 2020; Xiong J, Neuromolecular med 2023], succinate [Serra FT neurosciences letter 2022; Cline BH, BMC neurosciences 2012]).”

      Minor Points:

      (1) In Figure 5J the latency to find the hole was noticeably higher (mean around 150s) than the latency in panel K (mean around 100s for controls, and 60s for Drp1MGWT on HFD). This suggests high variability between experiments using this modified version of the Barnes Maze, despite the authors’ assertion that a standard Barnes Maze was employed and the results were reproducible at multiple institutions. Why do Drp1MGWT mice on control diet find the escape hole significantly faster than WT mice on control diet in panel J? Given the emphasis on cognitive improvement following acute HFD as a novel finding, the authors should explain this discrepancy.

      We appreciate this question and comment. Indeed, as the reviewer knows, behavioral tests including the Barnes test show variation with genetic background, and with environment and context (e.g. age, caging density, litter size, behavioral state and more) (Inglis A, Physiol Behavior 2019; Loos M Mamm Genome 2015; and unpublished observations). We do not know the exact origin of the difference mentioned above, but our best guess is that it stems from environmental differences that are ever present in vivaria (seasonal, mouse house room, cage-changing cycles, etc.), differences in background genetics (e.g. presence of Cre transgene and linked genome, genetic drift), and/or precise experimental differences between the cohorts (e.g. repeated tamoxifen-injection paradigm for the deletion group). All of our experiments were performed in parallel, with all relevant animal groups equally represented in every run, and used age- and sex-matched individuals from congenic strains. Wherever possible, controls and test animals were littermates to minimize within strain variance attributable to litter effects (litter size, maternal and paternal effects). Given our lab’s interest and focus on the mechanistic and developmental origins of variance heterogeneity, these differences are of high interest for future study.

      Comment: The authors highlight in the graphical abstract and again in Figure 4A the formation of lipid droplets following palmitate exposure as evidence that microglia can process fatty acids. They later suggest that a lack of substantial induction of lipid droplet accumulation suggests that microglia are metabolically wired to release carbon substrates to neighboring cells. Clarification as to the role of lipid droplet formation/accumulation in explaining the results would eliminate any possible confusion.

      We modified the wording in the manuscript accordingly:

      Results “Microglia take up and metabolize free fatty acids”;

      “Based on BODIPY fluorescence, we found that primary microglia increase lipid droplet numbers within 24h of in vitro exposure to palmitate (200uM; Fig 4A), demonstrating a capacity to take up fatty acids.”

      Comment: In many bar graphs showing relatively modest effects, it would be helpful to use symbols to also show the distribution of sample and animal replicates (especially behavioral paradigms).

      Agreed. Indeed, the results are both modest and impressive given the nature of the intervention (simple change in dietary macronutrient composition). We have now re-plotted the results from the behavioral experiments, accordingly (Fig S7 and Fig S8).

      Reviewer #3 (Recommendations For The Authors):

      This is a good manuscript deserving of publication assuming some of the concerns posed above are addressed.

      We thank Reviewer #3 again for their time, effort, and dedication, and for their objective review of the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      All of the reviewers indicate that their major concerns have been adequately addressed, but they each have a few comments that the authors should consider before submitting a final version (without further review) for publication. For example, a statement about the sex of the mice used in the studies and whether any differences were noted if both sexes were used. The idea that the loss of glutamate transport might affect NA loading into vesicles is also worth considering. Finally, the authors might want to mention that the role of neuropeptide release from NA neurons needs further examination. 

      As noted in the previously submitted revision, all experiments included both males and females, and this was addressed in our re-submission. In our analysis of breathing and metabolism, sex was included in the analysis and no significant phenotypic difference was observed (the statement of no sex difference is in lines 451-456). For the fate map and in situ experiments, although the group size is small, we did not see obvious differences in the expression patterns of the three glutamate transporters between females and males (lines 347-350). All the anatomical and phenotypic data in this manuscript are presented as combined graphs (figure 1, figure 1 supplement 1, figure 2, figure 2 supplement 2, figure 4,5,6,7), and we have differentially labeled our data points by sex (female data is pink and male data is blue).

      The possibility that loss of Vglut2 might affect NA release has been added in the discussion (line 485-491) of the current revision. Dopamine Beta Hydroxylase (DBH) converts dopamine to noradrenaline in the vesicles, thus, glutamate may not directly affect noradrenaline loading into vesicles. However, since loss of Vglut2 reduced dopamine release in subsets of dopaminergic neurons, it remains possible that glutamate affects dopamine loading in NA neurons and in turn perturbs DA to NA conversion in the vesicle by DBH and subsequent noradrenaline release. Future work could examine this hypothesis using fast-scan cyclic voltammetry (FSCV) or microdialysis.

      The further examination of the role of neuropeptide release from NA neurons is mentioned in the discussion (line 491-494 and line 497-499 of the pre).

      eLife assessment

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments provide compelling evidence that conditional deletion of vesicular glutamate transporters from noradrenergic neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. This study provides an important contribution to our understanding of how noradrenergic neurons regulate respiratory homeostasis in conscious adult mice. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments show that conditional deletion of Vglut2 in NA neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. Their observations challenge the importance of glutamatergic signaling from Vglut2 expressing NA neurons in normal respiratory homeostasis in conscious adult mice. 

      Strengths:

      The comprehensive Vglut1, Vglut2, and Vglut3 co-expression profiles in the central noradrenergic system and the combined measurements of breathing and oxygen consumption are two major strengths of this study. Observations from these experiments provide previously undescribed insights into (1) expression patterns for subtypes of the vesicular glutamate transporter protein in the noradrenergic system and (2) the dispensable nature of Vglut2-dependent glutamate signaling from noradrenergic neurons to breathing responses to physiologically relevant gas challenges in adult conscious mice.

      Weaknesses:

      Although the cellular expression profiles for the vesicular glutamate transporters are provided, the study does not document that glutamatergic-based signaling originating from noradrenergic neurons is evident at the cellular level under normal, hypoxic, and/or hypercapnic conditions. The authors effectively recognize this issue and appropriately discuss their findings in this context. 

      We thank the reviewer for the positive evaluation of our work.

      Reviewer #2 (Public Review):

      The authors characterized the recombinase-based cumulative fate maps for vesicular glutamate transporter (Vglut1, Vglut2 and Vglut3) expression and compared those maps to their real-time expression profiles in central NA neurons by RNA in situ hybridization in adult mice. The authors have revealed a new and intriguing expression pattern for Vglut2, along with an entirely uncharted co-expression domain for Vglut3 within central noradrenergic neurons. Interestingly, and in contrast to previous studies, the authors demonstrated that glutamatergic signaling in central noradrenergic neurons does not exert any influence on breathing and metabolic control either under normoxic/normocapnic conditions or after chemoreflex stimulation. Also, they showed for the first time the Vglut3-expressing NA population in C2/A2 nuclei. In addition, they were also able to demonstrate Vglut2 expression in anterior NA populations, such as LC neurons, by using more refined techniques, unlike previous studies.

      A major strength of the study is the use of a set of techniques to investigate the participation of NA-based glutamatergic signaling in breathing and metabolic control. The authors provided a full characterization of the recombinase-based cumulative fate maps for Vglut transporters. They performed real-time mRNA expression of Vglut transporters in central NA neurons of adult mice. Further, they evaluated the effect of knocking down Vglut2 expression in NA neurons using DBH-Cre; Vglut2 cKO mice on breathing and control in unanesthetized mice. Finally, they injected the AAV virus containing Cre-dependent Td tomato into LC of v-Glut2 Cre mice to verify the VGlut2 expression in LC-NA neurons. A very positive aspect of the article is that the authors combined ventilation with metabolic measurements. This integration holds particular significance, especially when delving into the exploration of respiratory chemosensitivity. Furthermore, the sample size of the experiments is excellent.

      Despite the clear strengths of the paper, some weaknesses exist. It is not clear in the manuscript if the experiments were performed in males and females and if the data were combined. I believe that the study would have benefited from a more comprehensive analysis exploring the sex-specific differences. The reason I think this is particularly relevant is that the developmental disorders mentioned by the authors, such as SIDS and Rett syndrome, which could potentially arise from disruptions in central noradrenergic (NA) function, exhibit varying degrees of sex predominance. Moreover, some of the noradrenergic cell groups are sexually dimorphic. For instance, female Wistar rats exhibit a larger LC size and more LC-NA neurons than male subjects (Pinos et al., 2001; Garcia-Falgueras et al., 2005). More recently, a detailed transcriptional profiling investigation has unveiled the identities of over 3,000 genes in the LC. This revelation has highlighted significant sexual dimorphisms, with more than 100 genes exhibiting differential expression within LC-NA neurons at the transcript level. Furthermore, this investigation has convincingly showcased that these distinct gene expression patterns have the capacity to elicit disparate behavioral responses between sexes (Mulvey et al., 2018).

      Therefore, the authors should compare the fate maps, Vglut transporters in males and females, at least considering LC-NA neurons. Even in the absence of identified sex differences, this information retains significant importance. 

      An important point well raised by the authors is that although suggestive, these experiments do not definitively rule out that NA-Vglut2 based glutamatergic signaling has a role in breathing control. Subsequent experiments will be necessary to validate this hypothesis. 

      An improvement could be made in terms of measuring body temperature. Opting for implanted sensors over rectal probes would circumvent the need to open the chamber, thereby preventing alterations in gas composition during respiratory measurements. Further, what happens to body temperature phenotype in these animals under different gas exposures? These data should be included in the Tables. 

      Is it plausible that another neurotransmitter within NA neurons might be released in higher amounts in DBH-Cre; Vglut2 cKO mice to compensate for the deficiency in glutamate and prevent changes in ventilation? 

      Continuing along the same line of inquiry is there a possibility that Vglut2 cKO from NA neurons not only eliminates glutamate release but also reduces NA release? A similar mechanism was previously found in VGLUT2 cKO from DA neurons in previous studies (Alsio et al., 2011; Fortin et al., 2012; Hnasko et al., 2010). Additionally, does glutamate play a role in the vesicular loading of NA? Therefore, could the lack of effect on breathing be explained by the lack of noradrenaline and not glutamate? 

      We thank the reviewer for the positive evaluation and further suggestions. Please see our response in “Author Response” to the previous version of Reviewer #2 (Public review).

      Reviewer #4 (Public Review): 

      Summary:

      Although previous research suggested that noradrenergic glutamatergic signaling could influence respiratory control, the work performed by Chang and colleagues reveals that expression of excitatory markers (specifically Vglut2) is dynamic and widespread throughout the central noradrenergic system, but is not crucial for baseline breathing or for the hypercapnic and hypoxic ventilatory responses. The central point that will make a significant change in the field is how NA-glutamate transmission may influence breathing control and the dysfunction of NA neurons in respiratory disorders.

      Strengths:

      There are several strengths such as the comprehensive analysis of Vglut1, Vglut2, and Vglut3 expression in the central noradrenergic system and the combined measurements of breathing parameters in conscious unrestrained mice. 

      Other considerations:

      These results strongly suggest that glutamate may not be necessary for modulating breathing under normal conditions or even when faced with high levels of carbon dioxide (hypercapnia) or low oxygen levels (hypoxia). This finding is unexpected, considering many studies have underscored glutamate's vital role in respiratory regulation, more so than catecholamines. This leads us to question the significance of catecholamines in controlling respiration. Moreover, if glutamate is not essential for this function, we need to explore its role in other physiological processes such as sympathetic nerve activity (SNA), thermoregulation, and sensory physiology. 

      We thank the reviewer for the positive evaluation and further suggestions. The potential role of noradrenergic-derived glutamate in other processes, which is beyond the scope of this study, should be addressed in the future.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      All of my concerns were effectively resolved, leading me to accept the paper. However, I suggest that the authors consider investing in a more reliable system for measuring body temperature, as accurate measurements of this parameter are crucial for whole body plethysmography. 

      Thank you for the suggestion. The real-time measurement of body temperature is a goal in future studies.

      Reviewer #4 (Recommendations For The Authors):

      Because I am reviewing a revised version, I believe the authors have addressed most, if not all, of the concerns already raised by three reviewers. In my understanding, the authors achieved their aims and the conclusions are fully supported by the results. The impact of this work on the respiratory field is significant and is likely to advance the field. The methods and data utilized, which combine standard techniques with genetic tools, will be highly beneficial to the research community. 

      I still have one concern: if glutamate is not critical, then what is? Could we potentially disable the noradrenergic (NA) system while preserving glutamate functionality to determine if the NA system is indeed crucial for respiratory physiology? This approach might provide clearer insights into the mechanisms underlying respiratory control. 

      We agree that there remain several exciting questions about the respective roles of noradrenaline, glutamate, and neuropeptides such as Neuropeptide Y (NPY) and galanin. We are currently devising strategies to address the respective and combinatorial roles of all these candidates in breathing control. Most simply, we can conditionally mutagenize each of them in the central noradrenergic system in an acute manner using DBH-CreER mice to determine whether any of them is critical to respiratory control, with the advantage of minimizing developmental compensatory events.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors evaluated a novel eIF2B activator, DNL343, in two mouse models representing different forms of the integrated stress response (ISR). They first assessed the pharmacokinetics of DNL343, demonstrating its ability to cross the blood-brain barrier and exhibit good bioavailability. In an acute ISR model induced by optic nerve crush (ONC) injury, DNL343 treatment reduced ISR-induced transcriptional changes and neuronal loss, demonstrating neuroprotective effects. Next, the authors generated an eIF2B loss-of-function mouse model by knocking in disease-causing Eif2b5 variants. The model presents a chronic ISR and mimics vanishing white matter disease (VWMD). DNL343 treatment from the pre-symptomatic stage improved body weight and motor functions, corrected transcriptional changes, and reversed proteomic and metabolomic alterations in the brain and cerebrospinal fluid. DNL343 treatment initiated at an advanced disease stage also showed positive effects, restoring body weight gain, suppressing ISR, reducing neurodegeneration biomarkers, and extending lifespan. These findings highlight DNL343 as an effective ISR inhibitor with potential applications in treating VWMD and other neurodegenerative disorders involving ISR.

      Strengths:

      The study's findings regarding the novel compound DNL343 offer significant promise in addressing VWMD, a condition currently lacking disease-modifying treatment. DNL343 directly targets eIF2B, the disease-causing complex in VWMD, and demonstrates notable efficacy in reversing the integrated stress response (ISR) and mitigating neurodegeneration in a VWMD mouse model. These results raise hope for the potential application of DNL343 in VWMD treatment, a development eagerly anticipated by patients and the VWMD research community. Moreover, the study hints at the broader potential of DNL343 in treating other ISR-related neurodegenerative disorders, such as amyotrophic lateral sclerosis, a prospect that holds broader interest. Additionally, the study's identification of potential biomarkers for VWMD represents a notable strength, potentially leading to improved disease progression assessment pending further confirmation in future research.

      Weaknesses:

      There are a couple of notable concerns in this study. Firstly, while the in vivo evidence strongly supports the efficacy of DNL343 in mitigating ISR and neurodegeneration, there is a lack of direct biochemical evidence to confirm its activity in eIF2B activation. Secondly, the potential for cardiovascular toxicity, which has been reported for a related eIF2B activator in a canine model (as mentioned in the manuscript), has not been evaluated for DNL343 in this study. This data gap regarding toxicity could be crucial for informing the future development of DNL343 for potential human use. Further investigation into these areas would be valuable for a comprehensive understanding of the compound's mechanisms and safety profile.

      We thank the reviewer for the thoughtful feedback and the opportunity to provide further clarification. To address the first question regarding biochemical evidence of the mechanism of action of DNL343, we agree that additional data are helpful for interpreting the results presented in this manuscript. We now include a citation to Craig et al. (Craig, R.A., 2nd, J. De Vicente, A.A. Estrada, J.A. Feng, K.W. Lexa, M.J. Canet, W.E. Dowdle, R.I. Erickson, B.N. Flores, P.C.G. Haddick, L.A. Kane, J.W. Lewcock, N.J. Moerke, S.B. Poda, Z. Sweeney, R.H. Takahashi, V. Tong, J. Wang, E. Yulyaningsih, H. Solanoy, K. Scearce-Levie, P.E. Sanchez, L. Tang, M. Xu, R. Zhang and M. Osipov (2024). "Discovery of DNL343: A Potent, Selective, and Brain-Penetrant eIF2B Activator Designed for the Treatment of Neurodegenerative Diseases." J Med Chem.), which includes the full details on the discovery and characterization of DNL343.

      On the question of cardiovascular toxicity observed with previous eIF2B-activating compounds, Craig et al. also provide evidence in a non-human primate (cynomolgus monkey) model that DNL343 dosing did not result in QT prolongation or any functional cardiac changes. We have also completed Phase 1 (NCT04268784) and Phase 1B double-blind (NCT05006352) trials in healthy participants and participants with ALS, respectively, and these trials are referenced on page 4, lines 102-103. The safety profile observed in these clinical studies supported further development of DNL343 for ALS in the Healey Platform trial (NCT04297683, Regimen G).

      Reviewer #2 (Public Review):

      Summary:

      The authors developed DNL343, a CNS-penetrant small molecule integrated stress response (ISR) inhibitor, to treat neurodegenerative diseases caused by ISR.

      Strengths:

      DNL343 is an investigational CNS-penetrant small molecule integrated stress response (ISR) inhibitor designed to activate the eukaryotic initiation factor 2B (eIF2B) and suppress aberrant ISR activation. The therapeutic efficacy of DNL343 has been extensively characterized in two animal models. Importantly, plasma biomarkers of neuroinflammation and neurodegeneration can be reversed with DNL343 treatment. Remarkably, several of these biomarkers show differential levels in CSF and plasma from patients with vanishing white matter disease (VWMD) upon DNL343 treatment. Overall, this is a very exciting study to target ISR for therapeutic interventions.

      Weaknesses:

      My main questions center around the characterization of DNL343.

      (1) Is there any biochemical evidence showing DNL343 activates eIF2B, such as binding assays or in vitro biochemical activity assays? A conference presentation was cited - "Osipov, M. (2022). Discovery of DNL343: a Potent Selective and Brain-penetrant eIF2B Activator Designed for the Treatment of Neurodegenerative Diseases. Medicinal Chemistry Gordon Research Conference. New London, NH." However, there needs to be public information about this presentation.

      Information from this presentation and more details on the discovery and characterization of DNL343 can be found in Craig et al J Med Chem (2024) and this citation has been replaced.

      (2) How was the selectivity of DNL343 demonstrated? What are the off-targets of DNL343, in particular when DNL343 is administered at a high dose? Thermal proteome profiling or photoaffinity labeling experiments could be considered.

      Please see Craig et al J Med Chem (2024) for full details. In brief, no significant off-target effects were observed for DNL343 in a Cerep panel.

      (3) What are the total drug concentrations in the brain and plasma? What are the unbound ratios?

      Following a single oral dose of DNL343 in mice, unbound brain-to-unbound plasma exposure ratios (Kp,uu) of 0.8 to 1.1 were observed, indicating high CNS penetrance. This was further supported by CSF-to-unbound plasma exposure ratios of 0.9 in the same mouse study. The CNS penetrance was also confirmed in rats and NHP by CSF-to-unbound plasma ratios near unity, as reported in Craig et al J Med Chem (2024).

      (4) If DNL343 is given intravenously, what are the concentrations in the brain and plasma after 5 minutes and 1 hour or longer time points? In other words, does DNL343 cross BBB through passive diffusion or an active process?

      Unbound brain-to-unbound plasma exposure ratios following a single oral dose in the mouse were 0.8 to 1.1 and showed no time dependence. These measurements were made prior to, near, and following the plasma tmax of DNL343, indicating that unbound DNL343 crosses the BBB through passive diffusion and rapidly reaches equilibrium between the brain and systemic circulation. Details can be found in Craig et al J Med Chem (2024).

      (5) What is the complete PK profile of DNL343 for intravenous and oral dosing?

      DNL343 administered orally to mice as a suspension formulation showed plasma PK consistent with prolonged absorption with tmax ranging from 3 to 4 h, and a terminal elimination half-life (t1/2) of ~10 h. Details can be found in Craig et al J Med Chem (2024).

      (6) Are there any major drug metabolites that could be of concern?

      DNL343 metabolism is through Phase 1 biotransformation pathways. None of the in vivo circulating metabolites show potency towards eIF2B activation. Given that none of these metabolites are of concern, we believe this information is beyond the scope of the current manuscript.
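
      The exposure ratios and half-life quoted in the responses to points (3) to (5) follow from standard pharmacokinetic definitions, and a minimal sketch may help readers less familiar with them. The function names and all input numbers below are hypothetical, the free fractions are assumed to come from separate binding measurements, and the half-life calculation assumes simple first-order terminal elimination. None of the values are data from the study.

```python
import math

def kp_uu(total_brain, fu_brain, total_plasma, fu_plasma):
    """Unbound brain-to-unbound plasma ratio from total concentrations and free fractions."""
    unbound_brain = total_brain * fu_brain
    unbound_plasma = total_plasma * fu_plasma
    return unbound_brain / unbound_plasma

def fraction_remaining(hours, half_life_hours):
    """Fraction of drug remaining after `hours`, assuming first-order elimination."""
    k = math.log(2) / half_life_hours  # elimination rate constant (1/h)
    return math.exp(-k * hours)

# Hypothetical inputs, chosen only to illustrate the arithmetic.
ratio = kp_uu(total_brain=500.0, fu_brain=0.02, total_plasma=1000.0, fu_plasma=0.01)
print(f"Kp,uu = {ratio:.2f}")
print(f"fraction left after 24 h at t1/2 = 10 h: {fraction_remaining(24, 10):.3f}")
```

      A Kp,uu close to 1 means that unbound concentrations are similar on both sides of the blood-brain barrier, which is the pattern described in the responses above.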

      Reviewer #3 (Public Review):

      Summary:

      ISR contributes to the pathogenesis of multiple neurodegenerative diseases, such as ALS, FTD, VWMD, etc. Targeting ISR is a promising avenue for potential therapeutics. However, previously identified ways to target ISR present some challenges. PERK inhibitors suppress ISR by inhibiting eIF2alpha phosphorylation and cause pancreatic toxicity in mice. In order to bypass eIF2alpha, previous studies have identified ISR suppressors that target eIF2B, such as ISRIB and 2BAct. These molecules suppress neurodegeneration but do not cause detrimental effects in mouse models. However, ISRIB is water-insoluble, and 2BAct causes cardiovascular complications in dogs, preventing their use in clinics. Here, the authors showed that DNL343, a new ISR inhibitor targeting eIF2B, suppresses neurodegeneration in mouse models. Combined with their previous results of a clinical phase I trial showing the safety of DNL343, these findings suggest the promise of DNL343 as a potential drug for neurodegenerative diseases in which ISR contributes to pathogenesis.

      Strengths:

      The finding is important and has disease implications, and the conclusion is not surprising.

      Weaknesses:

      The experimental design and data are hard to comprehend for an audience with a basic research background. This reviewer suggests that the authors use the same way that previous studies on ISRIB and 2BAct (e.g., Wong et al; eLife, 2019) designed experiments and interpret data.

      We thank this reviewer for their feedback and recognition that DNL343 has promising potential as a treatment for neurodegenerative diseases. While our studies share some similarities with Wong et al., eLife (2019) and Abbink et al., ACTN (2019), our study design is intentionally distinct (e.g. inclusion of both prevention and treatment dosing paradigms, and determination of the dose-response impact of drug treatment across biomarkers), which necessitates tailored data visualization to communicate our findings effectively. However, we understand the importance of clarity for a broader audience and, to this end, we have made a number of changes to the data figures, in particular to the data from omics experiments in Figures 3 and 5. We have also provided additional supplemental tables to aid data interpretation. We hope these changes cater both to audiences familiar with previous work and to those with a less specialized background.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Demyelination is a significant pathological feature in the VWMD mouse model. The authors should clarify whether they observed similar demyelination in their study and if DNL343 had any impact on reversing this demyelination. These findings are crucial for assessing the compound's effectiveness in mitigating neurodegeneration.

      Demyelination is indeed an important feature in the eIF2B LOF (VWMD) mouse model. Given that this phenotype and the ability to rescue the histological phenotype with this MOA (Wong et al; eLife, 2019, cited in introduction) are very well characterized, and given the limited size and number of mouse tissues available to us, we prioritized non-histological targeted and unbiased analyses aimed at identifying translatable biomarkers. Nonetheless, the totality of our data, in different mouse models and cell types, strongly supports DNL343 as a potent ISR inhibitor that is effective in attenuating neurodegeneration:

      · In the optic nerve crush model, DNL343 dose-dependently reduced retinal cell degeneration

      · In the VWMD mouse model, DNL343 attenuated the increase in a plasma biomarker of neurodegeneration, neurofilament-light, which corresponded to normalization in motor function.

      · Metabolomic and lipidomic analyses in the VWMD mouse model brain showed increases in oxysterols, such as 7-ketocholesterol, and cholesterol esters and these lipids are associated with demyelination (Nugent et al, 2020). DNL343 treatment attenuated the levels of these oxysterols, indicating decreased demyelination.

      · When initiated at an advanced disease stage, reversal of plasma biomarkers of neurodegeneration (Nf-L) and neuroinflammation (GFAP) by DNL343 in this model was accompanied by an extension of the lifespan, which is otherwise shortened as the mutant animals succumb to disease.

      These data highlight the potential therapeutic benefits of DNL343 in the broader context of ISR-mediated neurodegeneration which can include but may not be limited to VWMD.

      (2) Figure 6 presents several biomarkers with significantly increased levels in VWMD mice and patient biofluids. However, these biomarkers are not reflected in the brain proteomics data presented in Figure 3. The discrepancy between these findings should be addressed and discussed in the manuscript to provide a more comprehensive understanding.

      The proteins shown in Figure 6 were not detected by TMT proteomics in the CSF. In the brain, only GFAP was detected, and its overall abundance in tissue was similar in both genetic groups. Cytokines such as TIMP1 and MCP1 are usually present at low abundance and are therefore challenging to detect with the broad discovery proteomics method applied in this study. Antibody-based immunoassays are better suited than mass spectrometry-based proteomics to measuring low-abundance proteins specifically, whereas mass spectrometry-based methods offer a wider dynamic range for detecting more highly abundant proteins. Differences in detection sensitivity between immunoassays and mass spectrometry assays have been noted previously (Petrera et al., J Proteome Res, 2021). We have added new text to address this point in the revised manuscript (page 7, lines 274-277).

      (3) Figure 7 discusses the effects of DNL343 treatment initiated at an advanced disease stage. Since the 4-week treatment did not rescue performance in the balance beam test (as shown in Figure 6A), it is important to clarify if a 20-week treatment had any impact on this parameter.

      This reviewer raised an important question that we were unfortunately unable to test. When the balance beam training was administered after 8 (out of 20) weeks of dosing, most animals of both wild-type and mutant genotypes struggled to remain on or maintain balance on the beam and could not progress to traversing it, making the assay unsuccessful in this cohort. This impairment appeared to be driven by distinct factors in the two genotypes: age-associated obesity in wild-type animals and severe motor impairment in the eIF2B HOM mice, irrespective of treatment. While it is possible that other less demanding and more sensitive assays could reveal more nuanced differences, this observation and our earlier data (Figure 4G-I) suggest that DNL343 could prevent but not reverse functional deterioration. This is in line with our understanding of the DNL343 mechanism of action, which does not include neuronal regeneration, a therapeutic effect that is likely required for functional recuperation. We have added this point to the manuscript (page 8, lines 319-326).

      Additionally, considering the significant increase in Gdf15 levels in the disease model, it would be valuable to know if DNL343 treatment affected Gdf15 levels. If these assays were conducted, reporting the data would greatly assist in evaluating the compound's efficacy when administered at an advanced disease stage.

      We were not able to measure GDF15 levels in the 20-week study because the limited in-life plasma samples were dedicated to assessing biomarkers of neurodegeneration (Figure 7E-F). However, data from our 4-week treatment study, which was initiated at a similar age range to the 20-week treatment study (19-26 and 24-33 weeks of age, respectively), showed that DNL343 was able to reduce GDF15 levels in the brain (mRNA and protein) and CSF (protein) (Supplemental Figure 5A-C), suggesting that DNL343 reduces ISR activation at an advanced disease stage in the model. We expect that the reduction observed at 4 weeks of treatment would persist for the duration of the extended treatment in the 20-week cohort.

      (4) A minor point. In Figures 5A, 5C, and 5E, it appears that the red-colored group should likely be labeled as "HOM 0 mg/kg" instead of "HOM 3 mg/kg".

      This has been amended, thank you.

      Reviewer #3 (Recommendations For The Authors):

      Major concerns:

      (1) The cellular function of DNL343 needs to be clarified. The authors claim that it activates eIF2B, but no cellular or molecular evidence is provided. Does it bind to eIF2B? Does it not affect eIF2alpha phosphorylation? Does it restore translation upon stress that causes eIF2alpha phosphorylation? Does it suppress stress granule assembly? The authors cited Sun, Tsai et al. 2023 and Osipov et al., 2022. However, these citations are conference abstracts with no published figures available for review.

      We agree that additional data outlining the biochemical evidence of the mechanism of action of DNL343 was needed. We now include a citation to Craig et al J Med Chem (2024) that includes the full details on the discovery and molecular characterization of DNL343.

      (2) It needs to be clarified how the authors selected the ISR marker genes. ISR genes are more than those selected. How about others? How did the authors measure the mRNA levels, bulk RNA-seq or RT-PCR? If the former, have the authors verified their results using RT-PCR? Have the authors measured the protein levels for nerve crush experiments (by both proteomic and individual protein analyses)? Also, no statistical analyses were found for the heat maps.

      The ISR marker genes were selected by a combination of experimental and literature data. Transcriptomics analysis of the eIF2B HOM brains was conducted using untargeted RNAseq (Supplemental Figure 1B). Here, we found an enrichment of transcripts previously reported to be ISR dependent, namely Atf4, Chac1, Ddit3, Eif4ebp1, Ppp1r15a (Larhammar et al., 2017), Atf3, Asns, Mthfd2, Psat1, Sesn2, Slc1a5, Slc7a5, Slc7a11, Trib3 (Wong et al., 2019, Abbink et al., 2019).  These transcripts were assayed using targeted qPCR in the eIF2B HOM brains, spleen and PBMC (Supplemental Figure 1A, C, D) and in the retinas from the ONC experiments (Figure 2C). We have further clarified the analysis method for the gene expression data in the figure legends.

      We did not interrogate the proteome of the retina in the ONC model. Our study in this model was intended as a proof-of-concept evaluation of DNL343 effects in this acute ISR-dependent model of neurodegeneration. To this end, we performed gene expression (Figure 2C) and immunofluorescence analyses (Figure 2D-F). Each of these analyses was conducted using dedicated whole retinas; conducting additional protein analyses would necessitate a separate cohort of animals.

      We believe that heatmaps provide the best visualization of the data, particularly the dose dependent effects of DNL343 on multiple genes, but we understand the value for also providing statistical analyses. To address this, we provide additional Supplemental tables to show the outcome of statistical analyses undertaken. Statistical data relating to Figure 2C can be found on new Supplemental Tables 1 & 2; those relating to Supplemental Figures 1A, C, and D on new Supplemental Tables 3, 5, 6, respectively; that from Figure 4D on new Supplemental Table 8, and that from Figure 7D on new Supplemental Table 11.

      (3) Both the authors and Wong et al. (eLife, 2019) performed transcriptomic analyses on HOM mice. How do the authors compare the two data sets? Are they the same?

      In this work, a transcriptomic approach was applied to confirm induction of the ISR in our in vivo model. While the data are not identical, all of the top annotated genes shown in Supplemental Figure 1B were also deemed significant by Wong and coworkers (Bayes factor > 10). More importantly, as explained in our response to question #2 from Reviewer #3, the ISR genes highlighted in Supplemental Figure 1B were also confirmed in two other studies (Larhammar et al., 2017, Abbink et al., 2019). These data support our interpretation that eIF2B HOM mice have an elevated ISR relative to WT mice. We have added new text to line 164 on page 5 to clarify this point.

      (4) Can the authors interpret their omic data using volcano plots for HOM rescue experiments, as Wong et al. did in eLife 2019? Heat maps with statistical analyses are more straightforward to comprehend. Can the authors verify some of these data using RT-PCR, Western blot, etc.?

      We added additional pathway interpretation to Figures 3 and 5 to highlight the key biological processes altered in the brain, and the cellular compartment of origin of the CSF proteins changed in eIF2B HOM mice at baseline and following treatment with DNL343. Our treatment design employed multiple dosing levels and, as such, summarization by volcano plot would have required many figures, whereas the same information can be captured more easily in a single heat map. However, to provide additional quantitative information, we have now added supplementary tables showing the full statistical analysis for all heat maps for added clarity and transparency.

      We demonstrated 100% correlation between the select genes we examined by qPCR in Supplemental Figure 1A and those identified from brain by RNA-seq. In addition, the reliability of RNA-seq data has previously been examined in great detail (Everaert et al., Sci Rep 2017); that study found ~85% concordance between RNA-seq and qPCR data, and the discordant genes tended to have < 2 log2FC and to be present in low abundance. Given that the top core ISR genes identified in our study have >2 log2FC and have been verified by other independent labs (Larhammar et al., 2017, Abbink et al., 2019, Wong et al., 2019), we do not think that there is a rational need for technical confirmation of the RNA-seq data.

      Risks of mis-annotation of proteins in the TMT data were further mitigated by removing proteins with coverage < 20% and fewer than 8 unique peptides detected, and by setting the protein annotation FDR to <1%.

      Additionally, TMT-labelling based proteomics offers wider dynamic range and sensitivity than western blotting. Validation of TMT logFC data with the western blot technique, which is less quantitative and has a lower dynamic range of detection, may not be very informative. Furthermore, similar trends of changes in key ISR genes and proteins shown in Figures 4D and 5A (e.g., PSAT, SLC7A11, SLC7A5) provide additional support for the authenticity of the proteins identified in this work.
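
      For readers unfamiliar with this kind of quality filtering, the thresholds above can be sketched as a simple predicate. This is an illustration under stated assumptions, not the pipeline used in the study: the `ProteinHit` class, its field names, and the example hits are hypothetical, and the sketch assumes that a protein is kept only when it clears all three thresholds and that a per-protein FDR estimate (`q_value`) is available.

```python
from dataclasses import dataclass

@dataclass
class ProteinHit:
    name: str
    coverage_pct: float   # sequence coverage, in percent
    unique_peptides: int  # number of unique peptides detected
    q_value: float        # hypothetical per-protein FDR estimate

def passes_filters(hit, min_coverage=20.0, min_peptides=8, max_fdr=0.01):
    """Keep a hit only if it clears all three thresholds (one reading of the text)."""
    return (hit.coverage_pct >= min_coverage
            and hit.unique_peptides >= min_peptides
            and hit.q_value < max_fdr)

# Hypothetical hits, not data from the study.
hits = [
    ProteinHit("protein_A", 45.0, 12, 0.001),
    ProteinHit("protein_B", 15.0, 12, 0.001),  # fails the coverage threshold
    ProteinHit("protein_C", 45.0, 3, 0.001),   # fails the unique-peptide threshold
]
kept = [h.name for h in hits if passes_filters(h)]
print(kept)
```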

      Also, for Figures 4E and F, it is assumed that each line represents an individual animal, but why their body weight gains are so different for the wild type? Can the authors plot the mean and s.e.m.? Also, there are no data about neurodegeneration. The authors need to show microscopy images, count the numbers, and assess the morphology of nerve cells.

      The large data spread in the body weight gain in our wild-type mice reflects the normal variability of this endpoint, which can be influenced by sex and age. Indeed, both factors are present in our cohorts, as animals of both sexes were included and there was a 7-week age range (10-17 weeks of age at dosing start). Each line in Figures 4E-F indeed represents data sampled from an individual animal over time. We chose to represent the data this way for transparency and have provided additional visualization (new Supplemental Figure 3) showing both body weight gain and plasma Nf-L levels as mean ± SEM, as requested by this reviewer.

      In this study we chose to use a clinically relevant biomarker of neurodegeneration, plasma neurofilament light chain (NfL) (Figure 4F). This allowed us to prioritize the tissue samples from these studies for comprehensive unbiased analyses that characterize the phenotype of these eIF2B LoF mice more completely. NfL is a biomarker that has been recognized as a sensitive measurement of neuronal/axonal damage regardless of cause (Gaetani et al., 2018, Khalil et al., 2018). Elevated plasma (and CSF) NfL levels have been demonstrated across neurodegenerative conditions such as Alzheimer’s disease (Giacomucci et al., 2022), multiple sclerosis (Ferreira-Atuesta et al., 2021), and ALS (Huang et al., 2018).
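
      As a reminder of what the mean ± SEM summary involves, a minimal sketch follows. The function name and the example values are hypothetical and are not data from the study; the SEM is taken as the sample standard deviation divided by the square root of the group size.

```python
from math import sqrt
from statistics import mean, stdev

def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of at least two values."""
    return mean(values), stdev(values) / sqrt(len(values))

# Hypothetical body-weight gains (g) for one group at one time point.
gains = [1.0, 2.0, 3.0, 4.0, 5.0]
m, sem = mean_sem(gains)
print(f"{m:.2f} ± {sem:.2f}")
```

      A group summary of this kind complements the per-animal traces: the mean shows the group trend, while the individual lines keep the between-animal spread visible.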

      (5) How ISR is connected to metabolomic changes? Can the authors explain it?

      The ISR caused significant increases in the transcript and protein abundances of amino acid transporters and serine/glycine/one-carbon metabolism enzymes, as highlighted in Figure 3A and C and in lines 237-255 of the main text. Similar patterns were also observed in previously published studies (Larhammar et al., 2017, Abbink et al., 2019, Wong et al., 2019). Consistent with these changes, we observed increased levels of alanine (transported by SLC3A2, SLC7A11, SLC7A3) and decreased cystathionine levels (associated with increased expression of CTH). ATF4 is one of the main orchestrators of the ISR response to stress (e.g., amino acid deprivation) and is required for expression of amino acid transporters and of the enzymes required for synthesis of non-essential amino acids (PMID: 28494858). ATF4 increases cellular amino acid uptake and delivers the amino acids needed for synthesis of the proteins and glutathione required for survival.

      We also observed prominent changes in cholesterol esters (CE) in eIF2B HOM mice, and their normalization with DNL343 treatment, as shown in Figure 5C. We checked for changes in expression levels of the CEL, CES1, LCAT, LIPA, SOAT1, and NCEH1 proteins involved in CE metabolism and did not detect any changes in protein or RNA abundance. This suggests that rapid demyelination is a more likely trigger for CE accumulation, as reported in FTD-GRN (Marian OC et al., 2023 Acta Neuropathol Commun 11, 52) and in experimental demyelination models (Nugent AA et al., 2020 Neuron). We have added new text to the discussion section of the manuscript (page 9, lines 408-411) to discuss how these results relate to each other.

      (6) It is hard to understand the biomarker part. The authors said "potential translational biomarkers are elevated..." Do the authors mean they are elevated so they can be potential biomarkers? If their levels are unchanged (e.g., TIMP-1), how can they be biomarkers? Also, this part needs a conclusion/summary. Also, what does "reversed biomarkers..." mean?

      We have modified the text to clarify this and have included a concluding sentence for this section of the results (page 7, lines 297-299). In assessing whether a given protein could be a potential translational biomarker for human disease, we evaluated whether the following two conditions were met: (1) gene expression or protein levels of the biomarker are increased or decreased in the brain or biofluids (CSF or plasma) of Eif2b5 R191H homozygote mice relative to wild-type controls, and this change is modulated or normalized by administration of DNL343; and (2) protein levels in biofluids from VWMD patients differ from those of healthy controls in the same direction as seen in the mouse model. GDF-15, GFAP, and NfL meet these criteria, but TIMP-1 and MCP-1 do not.
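
      The two conditions above amount to a simple decision rule, sketched below for illustration. Everything in the sketch is hypothetical: the function names, the use of single summary levels per group, and the example numbers. A real assessment would rest on statistical testing of replicate measurements rather than on a direct comparison of summary values.

```python
def direction(case_level, control_level, tolerance=0.0):
    """Return +1 if the case level is higher, -1 if lower, 0 if unchanged."""
    diff = case_level - control_level
    if abs(diff) <= tolerance:
        return 0
    return 1 if diff > 0 else -1

def is_candidate(mouse_mutant, mouse_wt, mouse_treated, patient, healthy):
    """Apply the two criteria from the response to made-up summary levels."""
    mouse_dir = direction(mouse_mutant, mouse_wt)
    # Criterion 1: changed in mutant mice and moved back toward wild type by treatment.
    changed = mouse_dir != 0
    normalized = abs(mouse_treated - mouse_wt) < abs(mouse_mutant - mouse_wt)
    # Criterion 2: patients differ from healthy controls in the same direction.
    same_direction = direction(patient, healthy) == mouse_dir
    return changed and normalized and same_direction

# Hypothetical marker elevated in mutant mice, lowered by treatment, elevated in patients.
print(is_candidate(mouse_mutant=10, mouse_wt=2, mouse_treated=4, patient=8, healthy=3))
# Hypothetical marker that is unchanged in patient biofluids.
print(is_candidate(mouse_mutant=10, mouse_wt=2, mouse_treated=4, patient=3, healthy=3))
```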

      Minor concerns:

      (1) Please explain which multiple comparison tests the authors used.

      This information has been further clarified in the figure legends.

      (2) Administrating the drug at an advanced stage led to a trend of NfL reduction but did not rescue function. Can the authors discuss what this means?

      Further elaboration and discussion about this finding have been added to the results section on page 8, line 319-325.

      (3) For statistical analyses on the bar graphs, it would be better if the authors labeled the comparison pairs on the graphs.

      We agree that labelling comparisons in bar graphs could aid the readership and have added this modification. Additionally, comparisons are indicated in the figure legend.

      (4) The authors need to state clearly that 2BAct's cardiovascular toxicity was observed in dogs, not mice. The current study does not exclude similar DNL343 toxicity. However, previous clinical trials suggest that DNL343 may be safe for humans.

      The suggestion to specify cardiovascular toxicity in dogs has been added (page 3, line 101), thank you. We now include a citation to Craig et al J Med Chem (2024) that provides evidence in a non-human primate (cynomolgus monkey) model that DNL343 dosing did not result in QT prolongation or any functional cardiac changes. We have also completed Phase 1 (NCT04268784) and Phase 1B double-blind (NCT05006352) trials in healthy participants and participants with ALS, respectively, and now include reference to these trials on page 4, lines 102-104. The safety profile observed in these clinical studies supported further development of DNL343 for ALS in the Healey Platform trial (NCT04297683, Regimen G).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      We thank Reviewer #1 for the assessment of our study.

      Reviewer #2:

      The authors should use DF/F to quantify over time the calcium response in photoreceptors. Furthermore, they should show that there is no concern of motion artifact when the pressure changes - as it could be a concern.

      We used the ΔR/R measure (as defined in Böhm et al. 2016) to correct for motion artifacts due to the larvae moving out of the focal plane at the onset of pressure stimulation. This measure calculates the ratio of the GCaMP signal and a reference fluorescent signal (tdTomato in our case). This ratiometric quantification can better correct for changes in fluorescence that are not related to changes in calcium concentration than the ΔF/F metric, which does not use an independent reference channel.
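
      The ratiometric measure described above can be illustrated with a minimal sketch. It assumes that R is the frame-by-frame GCaMP/tdTomato ratio and that the baseline R0 is the mean ratio over a pre-stimulus window; the function name, the baseline window and the example values are hypothetical and are not taken from the manuscript.

```python
def delta_r_over_r(gcamp, tdtomato, baseline_frames):
    """Return the dR/R trace for paired GCaMP and reference-channel traces.

    Assumption: R = GCaMP / tdTomato per frame, and R0 is the mean of R over
    the first `baseline_frames` frames (a pre-stimulus baseline).
    """
    if len(gcamp) != len(tdtomato):
        raise ValueError("traces must have the same length")
    if not 0 < baseline_frames <= len(gcamp):
        raise ValueError("baseline_frames must lie within the trace")
    ratio = [g / r for g, r in zip(gcamp, tdtomato)]
    r0 = sum(ratio[:baseline_frames]) / baseline_frames
    return [(r - r0) / r0 for r in ratio]


# Hypothetical traces: a focal-plane shift halves both channels at frame 2,
# and a calcium rise raises only the GCaMP channel at frame 3.
gcamp = [100.0, 100.0, 50.0, 75.0]
tdtomato = [200.0, 200.0, 100.0, 100.0]
trace = delta_r_over_r(gcamp, tdtomato, baseline_frames=2)
```

      In this toy example the drop shared by both channels at frame 2 leaves ΔR/R unchanged, whereas a ΔF/F computed on the GCaMP channel alone would report a 50% decrease at that frame.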

      The authors have not shown

      (1) how the off response to decrease of pressure is mediated

      (2) which receptor/channel mediates in photoreceptors the response to increased pressure,

      (3) nor how the integration of light and pressure information is integrated by photoreceptors in order to guide the behavior of the larvae.

      These points are beyond the scope of the study. However, if possible within a short time frame, it would be really interesting to find out whether conflicting stimuli or converging stimuli (light & pressure) can cancel each other out or synergize. In particular since the authors cite unpublished results in the discussion: "Our unpublished results indeed suggest that green light determines the direction of swimming and can override upward swimming induced by pressure, which only influences the speed of swimming (LABC and GJ, unpublished)." Showing in one panel this very cool phenomenon would be exciting & open tons of questions for the field.

      We agree that investigating the interaction of light and pressure is a very exciting direction. However, doing it properly with the rigour we characterised pressure sensation here (across stages, pressure levels and genotypes) and phototaxis and UV avoidance in previous work (across stages, wavelengths, genotypes and stimulus direction; see Randel et al. 2014, Gühmann et al. 2015, Verasztó et al. 2018, Jokura et al. 2023) would require a separate in-depth study.

      We agree with points 1-3 regarding the limitations and mentioned these in the discussion.

      (1) Although we carried out pressure-release experiments to characterise in more detail the response to pressure OFF, our setup did not allow us to control pressure release as accurately as we could for pressure increase. Therefore, we decided not to address this aspect of the response in more detail in this study.

      “Upon a decrease in pressure, three-day-old (but not two-day-old) larvae also show an off-response characterised by downward swimming. We have not analysed in detail the neuronal mechanisms of this response but it may depend on an inverted activation of the cPRC circuit, as happens during UV avoidance (Jokura et al., 2023)”

      (2) We decided not to explore this important question in this study, due to the significant effort it would take to test the expression and function of potential candidate channels in the pressure transduction mechanism. “The cellular and molecular mechanisms by which cPRCs sense and transduce changes in hydrostatic pressure deserve further enquiry.” and “The molecular mechanisms of pressure detection remain unclear. Components of the phototransduction cascade may be involved in pressure sensation. Our results indicate that the ciliary opsin required for detecting UV light is not essential for pressure sensation.” We hypothesise in the discussion that TRP channels may play a role in pressure transduction, due to their diversity, multiple modalities and participation in phototransduction cascades.

      (3) We considered that the complexity of this question merits a separate study, where both cues can be accurately titrated and temporally combined to dissect the mechanisms of sensory integration. We have therefore removed the sentence referring to the interaction of phototaxis and the pressure response from the discussion.

      “How UV and pressure signals are integrated by the cPRC and how other light responses such as phototaxis interact with pressure responses remain exciting avenues for future research.”

    1. Author response:

      We thank the reviewers for their positive evaluation and constructive comments.  In our revision, we will aim to improve the analysis of our existing data and perform new experiments to address questions raised by the reviewers. 

      Reviewer 1 found it interesting that Kdm6b-deletion in hippocampal dentate gyrus (DG) neural stem cells causes precocious neuronal differentiation, whereas in contrast, Kdm6b is required for the maturation of neural progenitors in the ventricular-subventricular zone (V-SVZ). In the submitted manuscript, we did not provide much insight into the differences in Kdm6b function in these two neural stem cell populations. We plan on performing new experiments and expanding on our prior V-SVZ studies in a way that allows a direct comparison to the analyses of the DG. We hope that the addition of this data will shed light on why Kdm6b-deletion produces such different phenotypes in postnatal neural stem cells of the mouse brain. 

      Reviewer 2 noted that our submitted manuscript lacked insight into how KDM6B regulates gene expression. In particular, this reviewer asked whether the function of KDM6B is mediated by its enzymatic activity. The CUT&RUN experiment in our manuscript revealed an increase in H3K27me3 levels at select neural maintenance genes in the DG of Kdm6b-deleted mice. However, we agree that this data is insufficient to assess the significance of KDM6B-mediated H3K27me3 demethylation in regulating the NSC transcriptome. To address this point, we are performing experiments that can directly test this mechanistic model of KDM6B function and answer the question of whether the H3K27me3 demethylase activity of KDM6B is required for its ability to activate transcription.  Reviewer 2 also had a specific question about the cell types observed in the developing hippocampus after Kdm6b-deletion, and we believe that additional analyses will provide clarity to the overall phenotype.  More generally, we will aim to improve data quality and visualization. 

      Reviewer 3 raised the concern that because Kdm6b is not exclusively expressed in neural stem cells, the phenotype of precocious neuronal differentiation in mice with Kdm6b-deletion driven by the hGFAP-Cre transgene may be indirect, such as through changes in mature glial populations.  We will study the mature glia, as suggested by the reviewer.  We will also more thoroughly describe how our experiments targeting Kdm6b-deletion to adult neural stem cells with the tamoxifen-inducible Nestin-CreER provide evidence for the precocious neuronal differentiation phenotype being cell autonomous, at least in adult mice.  Reviewer 3 also had helpful suggestions for analyzing our single-cell RNA-seq data and behavioral studies, and we will address these comments in the revision. 

      Again, we thank the editors and reviewers for their time and consideration.  We believe that our manuscript will be greatly improved through this review process and hope to construct a stronger understanding of the role of KDM6B in DG neurogenesis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In the revised manuscript we have included an additional study that significantly contributes to the conclusions and models of the original version. Briefly, Figure 3 now describes our characterization of the diaphragm and laryngeal muscle activities (electromyography, EMG) during endogenous vocalizations. These EMGs also serve as representations of the brainstem breathing central pattern generator (CPG) inspiratory and post-inspiratory generating neurons, respectively. In our original submission, we found that many of the vocalizations had changes in pitch that mirrored the change in expiratory airflow (we termed positive intonation), and we proposed that the coordination of breathing muscles (like the inspiratory muscles) and larynx patterned this. This mechanism is akin to our findings for how neonatal cries are rhythmically timed and produced (Wei et al. 2022). The newly presented EMG data reinforces this idea. We found that for vocalizations with positive intonation, the inspiratory diaphragm muscle has an ectopic burst(s) of activity during the expiration phase which corresponds to a decrease in airflow and pitch, and this is followed by laryngeal muscle activity and increased pitch. This can be cycled throughout the expiration to produce complex vocalizations with oscillations in pitch. A basal breath is hardwired for the laryngeal muscle activity to follow the diaphragm, so the re-cycling of this pattern nested within an expiration (a ‘mini-breath’ in a ‘breath’) demonstrates that the vocalization patterning system engages the entire breathing CPG. This contrasts with the canonical model that activity of the laryngeal premotor neurons controls all aspects of producing / patterning vocalizations. Furthermore, this mechanism is exactly how the iRO produces and patterns neonatal vocalizations (Wei et al. 2022) and motivates the likely use of the iRO in adult vocalizations.

      Response to recommendations for the authors:

      Reviewer #1:

      (1) The authors should note in the Discussion that the cellular and circuit mechanisms by which the vocalization pattern generator integrates with the respiratory pattern generator to control expiratory airflow have not been fully worked out, requiring future studies.

      This was noted in the discussion section “The iRO likely patterns intonation for endogenous phonation”.

      (2) Please change the labeling of the last supplemental figure to Figure Supplemental 5.

      Thank you for identifying this.

      Reviewer #2:

      Major concerns

      (1) While it is true that modulation of activity in RAm modulates the laryngeal opening, this statement is an incomplete summary of prior work. Previous studies (Hartmann et al., 2020; Zhang et al., 1992, 1995) found that activation of RAm elicits not just laryngeal adduction but also the production of vocal sounds, albeit vocal sounds that were spectrally dissimilar from species-typical vocalizations. Moreover, a recent study/preprint that used an activity-dependent labeling approach in mice to optogenetically activate RAm neurons that were active during USV production found that re-activation of these neurons elicits USVs that are acoustically similar to natural USVs (Park et al., 2023). While the authors might not be required to cite that recent preprint (as it is not yet peer-reviewed), the fact that activation of RAm elicits vocal sounds is clear evidence that its effects go beyond modulating the size of the laryngeal opening, as this alone would not result in sound production (i.e., RAm activation must also recruit expiratory airflow). The authors should include these relevant studies in their Introduction. Moreover, the rationale for the model proposed by the authors (that RAm controls laryngeal opening whereas iRO controls expiratory airflow) is unclear with regard to these prior studies. The authors should include a discussion of how these prior findings are consistent with their model (as presented in the Introduction, as well as in Figure 4 and relevant Discussion) that RAm modulates the size of laryngeal opening but not expiratory airflow.

      An introduction and discussion of the Veerakumar et al. 2023 and Park et al. 2024 manuscripts describing RAm in mice has now been included.

      The iRO serves to coordinate the breath airflow and laryngeal adduction to produce sound and the intonation within it that mirrors the breath airflow. This occurs because the iRO can control the breathing CPG (synaptic input to the preBötC inspiratory pacemaker) and is premotor to multiple laryngeal muscles (Wei et al. 2022). The expiratory airflow is modulated by inducing a momentary contraction of the diaphragm (via excitation of the preBötC), which opposes (i.e., slows) expiration. This change in flow results in a decrease in pitch (Fig. 3 in the revised manuscript, Wei et al. 2022).

      It is our understanding that the basic model for RAm evoked USVs is that RAm evokes laryngeal adduction (and presumed abdominal expiratory muscle activation) and this activity is momentarily stopped during the breath inspiration by inhibition from the preBötC (Park et al. 2024). So, in this basic model, any change in pitch and expiratory airflow would be controlled by tuning RAm activity (i.e., extent of laryngeal adduction). In this case, the iRO induced inspiratory muscle activity should not occur during expiration, which is not so (Fig. 3). Note, the activity of abdominal expiratory muscles during endogenous and RAm evoked USVs has not been characterized, so the contribution of active expiration remains uncertain. This is an important next step.

      We have now included a discussion of this topic which emphasizes that iRO and RAm likely have reciprocal interactions (supported by the evidence of this anatomical structure). These interactions would explain why excitation of either group can evoke USVs and, perhaps, the extent that either group contributes to a USV explains how the pitch / airflow changes. An important future experiment will be to determine the sufficiency of each site in the absence of the other.

      (2) The authors provide evidence that the relationship between expiratory airflow and USV pitch is variable (sometimes positive, sometimes negative, and sometimes not related). While the representative spectrograms clearly show examples of all three relationship types, no statistical analyses are included to evaluate whether the relationship between expiratory airflow and USV pitch is different than what one would expect by chance. For example, if USV pitch were actually unrelated to expiratory airflow, one might nonetheless expect spurious periods of positive and negative relationships. The lack of statistical analyses to explicitly compare the observed data to a null model makes it difficult to fully evaluate to what extent the evidence provided by the authors supports their claims.

      We have now included two null distributions and compared our observed correlation values to these. The two distributions were created by taking each USV / airflow pair and randomly shuffling either the normalized USV pitch values (pitch shuffled) or the normalized airflow values (airflow shuffled) to simulate the distribution of data should no relationship exist between the USV pitch and airflow.
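
      The shuffling procedure described above can be sketched as follows. This is a minimal illustration, assuming that the correlation measure is Pearson's r and that each USV / airflow pair is stored as two equal-length lists of normalized values; the function names, the number of shuffles and the example traces are hypothetical and do not come from the manuscript.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def shuffled_null(pitch, airflow, shuffle="pitch", n_shuffles=1000, seed=0):
    """Correlations obtained after shuffling one trace of a USV/airflow pair.

    shuffle="pitch" permutes the normalized pitch values; shuffle="airflow"
    permutes the normalized airflow values. The other trace is left intact.
    """
    if shuffle not in ("pitch", "airflow"):
        raise ValueError("shuffle must be 'pitch' or 'airflow'")
    rng = random.Random(seed)
    null = []
    for _ in range(n_shuffles):
        p, a = list(pitch), list(airflow)
        rng.shuffle(p if shuffle == "pitch" else a)
        null.append(pearson_r(p, a))
    return null


# Hypothetical normalized traces for one USV with a positive relationship.
pitch = [0.0, 0.1, 0.3, 0.6, 0.8, 1.0, 0.7, 0.4]
airflow = [0.1, 0.2, 0.3, 0.5, 0.9, 1.0, 0.6, 0.3]
observed_r = pearson_r(pitch, airflow)
null_pitch = shuffled_null(pitch, airflow, shuffle="pitch")
null_airflow = shuffled_null(pitch, airflow, shuffle="airflow")
# Fraction of pitch-shuffled correlations at least as large as the observed one.
frac_ge = sum(r >= observed_r for r in null_pitch) / len(null_pitch)
```

      Pooling such shuffled correlations across all USV / airflow pairs would give the two null distributions against which the observed correlation values can be compared.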

      (3) The relationship between expiratory airflow and USV pitch comes with two important caveats that should be described in the manuscript. First, even in USV types with an overall positive relationship between expiratory airflow and pitch contour, the relationship appears to be relative rather than absolute. For example, in Fig. 2E, both the second and third portions of the illustrated two-step USV have a positive relationship (pitch goes down as expiratory airflow goes down). Nonetheless, the absolute pitch of the third portion of that USV is higher than the second portion, and yet the absolute expiratory airflow is lower. The authors should include an analysis or description of whether the relationship between expiratory airflow and USV pitch is relative vs. absolute during periods of 'positive intonation'.

      The relationship between pitch and airflow is relative and this is now clarified in the text. To determine this, we visualized the relationship between the two variables by scatterplot for each of the USV syllables and, as the reviewer notes, a given airflow cannot predict the resulting frequency and vice versa.

      (4) A second important caveat of the relationship between expiratory airflow and USV pitch is that changes in expiratory airflow do not appear to account for the pitch jumps that characterize mouse USVs (this lack of relationship also seems clear from the example shown in Fig. 2E). This caveat should also be stated explicitly.

      The pitch jumps do not have a corresponding fluctuation in airflow, and this is now stated in the results and discussion.

      (5) The authors report that the mode of relationship between expiratory airflow and USV pitch (positive intonation, negative intonation, or no relationship) can change within a single USV. Have the authors considered/analyzed whether the timing of such changes in the mode of relationship coincides with pitch jumps? Perhaps this isn’t the case, but consideration of the question would be a valuable addition to the manuscript.

      We analyzed a subset of USVs with pitch jumps that were defined by a change >10 kHz, at least 5ms long, and had one or two jumps. The intonation relationships between the sub-syllables within a USV type were not stereotyped as evidenced by the same syllable being composed of combinations of both modes.

      (6) The authors incorrectly state that PAG neurons important for USV production have been localized to the ventrolateral PAG. Tschida et al., 2019 report that PAG-USV neurons are located predominantly in the lateral PAG and to a lesser extent in the ventrolateral PAG (see Fig. 5A from that paper). The finding that iRO neurons receive input from VGlut2+ ventrolateral PAG neurons represents somewhat weak evidence that these neurons reside downstream of PAG-USV neurons. This claim would be strengthened by the inclusion of FOS staining (following USV production), to assess whether the Vglut+ ventrolateral PAG neurons that provide input to iRO are active in association with USV production.

      This comment correctly critiques that our PAG → iRO tracing does not demonstrate that the labeled PAG neurons are sufficient or necessary for vocalization. Directly demonstrating that activation and inhibition of the PAG-iRO labeled neurons ectopically drives or prevents endogenous USVs is an important next step. While FOS implies this connectivity, it does not definitively establish it and so this experiment is impacted by some of the caveats of our tracing (e.g. PAG neurons that drive sniffing might be erroneously attributed to vocalization).

      Our reading of the literature could not identify an exact anatomical location within the mouse PAG and this site appears to vary within a study and between independent studies (like within and between Tschida et al. 2019 and Chen et al. 2021). The labeling we observed aligns with some examples provided in these manuscripts and with the data reported for the retrograde tracing from RAm (Tschida et al. 2019).

      (7) In Figure S5A, the authors show that USVs are elicited by optogenetic activation of iRO neurons during periods of expiration. In that spectrogram, it also appears that vocalizations were elicited during inspiration. Are these the broadband vocalizations that the authors refer to in the Results? Regardless, if optogenetic activation of iRO neurons in some cases elicits vocalization both during inspiration and during expiration, this should be described and analyzed in the manuscript.

      The sound observed on the spectrogram during inspiration is an artefact of laser evoked head movements that resulted in the fiber cable colliding with the plethysmography chamber. In fact, tapping an empty chamber yields the same broad band spectrogram signal. The evoked USV or harmonic band vocalization is distinct from this artefact and highlighted in pink.

      (8) Related to the comment above, the authors mention briefly that iRO activation can elicit broadband vocalizations, but no details are provided. The authors should provide a more detailed account of this finding.

      The broadband harmonic vocalizations we sometimes observe upon optogenetic stimulation of AAV-ChR2 expressing iRO neurons are akin to those previously described within the mouse vocal repertoire (see Grimsley et al. 2011). We have added this citation and mentioned this within the text.

      (9) The effects of iRO stimulation differ in a couple of interesting ways from the effects of PAG-USV activation. Optogenetic activation of PAG-USV neurons was not found to entrain respiration or to alter the ongoing respiratory rate and instead resulted in the elicitation of USVs at times when laser stimulation overlapped with expiration. In contrast, iRO stimulation increases and entrains respiratory rate, increases expiratory and inspiratory airflow, and elicits USV production (and also potentially vocalization during inspiration, as queried in the comment above). It would be informative for the authors to add some discussion/interpretation of these differences.

      We have added a section of discussion to describe how these different results may be explained by the iRO being a vocal pattern generator versus the PAG as a ‘gating’ signal to turn on the medullary vocalization patterning system (iRO and RAm). See discussion section ‘The iRO likely patterns intonation for endogenous phonation’.

      (10) The analysis shown in Fig. 4D is not sufficient to support the author’s conclusion that all USV types elicited by iRO activation are biased to have more positive relationships between pitch and expiratory airflow. The increase in the relative abundance of down fm USVs in the opto condition could account for the average increase in positive relationship when this relationship is considered across all USV types in a pooled fashion. The authors should consider whether each USV type exhibits a positive bias. Although such a comparison is shown visually in Fig. 4G, no statistics are provided. All 7 USV types elicited by optogenetic activation of iRO should be considered collectively in this analysis (rather than only the 5 types currently plotted in Fig. 4G).

      In the original submission the statistical analysis of r values between opto and endogenous conditions was included in the figure legend (‘panels E-G, two-way ANOVA with Sidak’s post-hoc test for two-way comparisons was used; all p-values > 0.05’), and this has not changed in the revised manuscript. We have now provided the suggested comparison of opto vs endogenous USVs without down fm (Fig. 5D). This positive shift in r is statistically significant (…).

      (11) The evidence that supports the author’s model that iRO preferentially regulates airflow and that RAm preferentially regulates laryngeal adduction is unclear. The current study finds that activation of iRO increases expiratory (and inspiratory) airflow and also elicits USVs, which means that iRO activation must also recruit laryngeal adduction to some extent. As the authors hypothesize, this could be achieved by recruitment of RAm through iRO’s axonal projections to that region.

      Note, it is more likely that the iRO directly recruits laryngeal adduction, as its neurons are premotor to multiple laryngeal muscles such as the thyroarytenoid and cricothyroid (Wei et al. 2022). The ‘Discussion’ now includes our ideas for how the iRO and RAm likely interact to produce vocalizations.

      In the recent preprint from Fan Wang’s group (Park et al., 2023), those authors report that RAm is required for USV production in adults, and that activation of RAm elicits USVs that appear species-typical in their acoustic features and elicits laryngeal adduction (assessed directly via camera). Because RAm activation elicits USVs, though, it must by definition also recruit expiratory airflow. Can the authors add additional clarification of how the evidence at hand supports this distinction in function for iRO vs RAm?

      See response to ‘Major Concern #1’.

      Minor concerns 

      (1) The authors might consider modifying the manuscript title. At present, it primarily reflects the experiments in Figure 2.

      We have provided a title that we feel best reflects the major point of the manuscript. We hope that this simplicity enables it to be recognized by a broad audience of neuroscientists as well as specialists in vocalization and language.

      (2) The statement in the abstract that "patterns of pitch are used to create distinct 'words'" is somewhat unclear. Distinct words are by and large defined by combinations of distinct phonemes. Are the authors referring to the use of "tonemes" in tonal languages? If so, a bit more explanation could be added to clarify this idea. This minor concern includes both the Abstract, as well as the first paragraph of the Introduction.

      We have clarified this line in the abstract to avoid the confusing comparison between mouse vocalizations and human speech. In the introduction we have expanded our explanation to clarify that variations in pitch are a component of spoken language that add additional meaning and depth to the underlying, phonemic structure. 

      (3) Multiple terms are used throughout the manuscript to refer to expiratory airflow: breath shape (in the title), breath pattern, deviations in exhalation, power of exhalation, exhalation strength, etc. Some of these terms are vague in meaning, and a consolidation of the language would improve the readability of the abstract and introduction.

      We have chosen a smaller selection of descriptive words to use when describing these breath features.

      (4) Similarly, "exhalation" and "expiration" are both used, and a consistent use of one term would help readability.

      See point 3.

      (5) In a couple of places in the manuscript, the authors seem to state that RAm contains both laryngeal premotor neurons as well as laryngeal motor neurons. This is not correct to our knowledge, but if we are mistaken, we would ask that the authors add the relevant references that report this finding.

      It is our understanding that the RAm is defined as the anatomical region consistent with the murine rostral and caudal ventral respiratory groups composed of multiple premotor neuron pools to inspiratory, expiratory, laryngeal, and other orofacial muscles. This is supported by neurons within RAm that reflect multiple phases of the inspiratory and expiratory cycle (Subramanian et al. 2018) and excitation of sub-regions within RAm modulating multiple parts of the breathing control system (Subramanian et al. 2018 and Subramanian 2009). Rabies tracing of the various premotor neurons which define the anatomical region of RAm in the mouse shows that they surround the motor neurons in the loose region of the nucleus ambiguus (the anatomical location of RAm) for multiple muscles of the upper airway system, such as the thyroarytenoid (Wu et al. 2017, Dempsey et al. 2021 and Wei et al. 2022). Given that the name RAm reflects a broad anatomical location, we have used it to describe both the premotor and motor neurons embedded within it. We have now clarified this in the text.

      (6) The statistical analysis applied in Figure 1C is somewhat confusing. The authors show two distributions that appear different but report a p-value of 0.98. Was the analysis performed on the mean value of the distributions for each animal, the median, etc.? If each animal has two values (one for USV+ breaths and one for USV- breaths), why not instead compare those with a paired t-test (or Wilcoxon rank sign)? Additional information is needed to understand how this analysis was performed.

      The original manuscript version used a two-way ANOVA to compare the normalized histogram of instantaneous frequency for breaths with (USV+) or without (USV-) USVs for each animal (first factor: USV+/-, second factor: Frequency). The p-value for the first factor (USV) was 0.98, showing no statistically significant effect of USV on the distribution of the histogram.

      For simplicity, we have instead performed the analysis as suggested and include a bar graph. This analysis shows that the instantaneous frequency of USV breaths is, in fact, statistically significantly lower than those without USVs. We have updated the figure legend and text to reflect this.

      (7) The use of the word "syllable" to describe parts of a USV that are produced on a single breath may be confusing to some scientists working on rodent USVs. The term 'syllable' is typically used to describe the entirety of a USV, and the authors appear to use the term to describe parts of a USV that are separated by pitch jumps. The authors might consider calling these parts of USVs "sub-syllables".

      We have clarified these descriptions throughout the text. We now refer to the categories as ‘syllable types’, define ‘syllables’ as ‘a continuous USV event’ with no more than 20ms of silence within and finally ‘sub-syllables’ to refer to components of the syllable separated by jumps in frequency (but not gaps in time).

      (8) In Figure S3, final row, the authors show a USV produced on a single breath that contains two components separated by a silent period. This type of bi-syllabic USV may be rare in adults and is similar to what the authors showed in their previous work in pups (multiple USVs produced on a single expiration, separated by mini-inspirations). One might assume that the appearance of such USVs in pups and their later reduction in frequency represents a maturation of vocal-respiratory coordination. Nonetheless, the appearance of bi-syllabic USVs has not been reported in adult mice to our knowledge, and the authors might consider further highlighting this finding.

      We were also struck by the similarity of these USVs to our study in neonates and such types of similarities sparked an interest in the role of the iRO in patterning adult USVs. We now include a description of the presence and abundance of bi- and tri-syllabic calls observed in our recordings to highlight this finding.

      (9) Figure 4 is referenced at the end of the second Results section, but it would seem that the authors intended to reference Figure 2. 

      For simplicity we included some of the referenced data within Fig. S5. We appreciate the recommendation.

      (10) In the optogenetic stimulation experiments, the authors should clarify why bilateral stimulation was applied. Was unilateral stimulation ineffective or less effective? The rationale provided for the use of bilateral stimulation (to further localize neural activation) is unclear.

      The iRO is bilateral and, we presume, functions similarly. So, we attempted to maximally stimulate the system. We have clarified this in the methods.

      (11) Figure Supplemental '6' should be '5'.

      Thanks!

      (12) Last sentence of the Introduction: "Lasty" should be "lastly".

      Thanks!

      (13) There are two references for Hage et al., 2009. These should be distinguished as 2009a and 2009b for clarity.

      Thanks!

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the reviewers and editor for their careful review of our work. We believe the resulting manuscript is much stronger. We agree with the comments made by Reviewer #2 regarding additional histology and neuronal data analysis, which will be presented in subsequent work.


      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Weaknesses):

      It was not always clear what the lesion size was. This information is important for future applications, for example, in the visual cortex, where neurons are organized in retinotopy patterns.

      We thank the reviewer for this feedback. While there is some variation in lesion volume for a given parameter set, we have added more details of the volumes of lesions created in our testing (Fig. 4 and Fig. 5).

      It would be helpful if the author could add some discussion about whether and how this method could be used in other types of array/multi-contact electrodes, such as passive neuropixels, S-probes, and so on. In addition, though an op-amp was used in the design, it would still be helpful if the author could provide a recommended range for the impedance of the electrodes.

      We thank the reviewer for this suggestion. We have both added a demonstration of use in a different multielectrode probe type (with a U-probe) in Fig. 8, and we have added a discussion about which types of multielectrode probes would be suitable on Page 15, Line 420.

      “We demonstrated that our electrolytic lesioning technique works with a linear multicontact probe by testing with a U-Probe in ex vivo rabbit cortex. There are no particular limitations that would prevent our specific electrolytic lesioning technique and device from working with any passive multielectrode probe. The main requirements for use are that the probe has two electrodes that can directly (via whatever necessary adapters) connect to the lesioning device, such that arbitrary current can be passed into them as the anode and cathode. This would limit use of probes, like Neuropixels, where the on-chip acquisition and digitization circuitry generally precludes direct connection to electrodes [1], [2]. The impedance of the multielectrode probe should not be an issue, due to the use of an op amp. We showed use with a Utah array (20-800 kΩ) and a U-Probe (1-1.5 MΩ). The specific op amp used here has a voltage range of ± 450 V, which assuming a desired output of 150 µA of current would limit electrode impedance to 6 MΩ. Though a different op amp could easily be used to accommodate a higher electrode impedance, it is unlikely that this would be necessary, since most electrodes have impedances between 100 kΩ and 1 MΩ [3].”
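As a rough check on the impedance limit quoted above, Ohm's law relates the voltage a constant-current source can supply to the largest load it can drive at a target current. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the full 900 V span between the ±450 V rails is available across the electrode pair, which is the assumption under which the quoted 6 MΩ figure is recovered. If only a single 450 V rail were usable, the same calculation would give 3 MΩ.

```python
def max_load_impedance_ohms(compliance_volts: float, current_amps: float) -> float:
    """Largest load a constant-current source can drive (Ohm's law: R = V / I)."""
    if current_amps <= 0:
        raise ValueError("current must be positive")
    return compliance_volts / current_amps


TARGET_CURRENT_A = 150e-6  # 150 µA lesioning current, as stated in the text

# Assumption: the full rail-to-rail span (2 * 450 V) is usable as compliance voltage.
full_span = max_load_impedance_ohms(900.0, TARGET_CURRENT_A)

# Conservative case (also an assumption): only one 450 V rail is usable.
single_rail = max_load_impedance_ohms(450.0, TARGET_CURRENT_A)

print(f"{full_span / 1e6:.1f} MΩ, {single_rail / 1e6:.1f} MΩ")
```

Either bound sits above the impedances reported for the Utah array and the U-Probe, so the conclusion that impedance is unlikely to be limiting holds under both assumptions.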

      Reviewer 2 (Public Weaknesses):

      In many of the figures, it is not clear what is shown and the analysis techniques are not well described.

      We thank the reviewer for this feedback. We hope that our edits to both the figures and the text have improved clarity for readers.

      The flexibility of lesioning/termination location is limited to the implantation site of the multielectrode array, and thus less flexible compared to some of the other termination methods outlined in Appendix 2.

      We thank the reviewer for this point. You are right that the lesioning location is limited to the multielectrode array’s implantation site, while other methods in Appendix 2 do not require proximity of the lesion location and the electrophysiology recording site. However, we believe that the closeness of the lesioning location to the microelectrode array is a strength - guaranteeing recordings from the perilesional area - even with the small negative of reduced flexibility. Multielectrode arrays can be implanted in many areas of cortex. If one wanted to study distal effects of a lesion, additional electrophysiology probes could be implanted to record from those areas. We have noted this on Page 3, Line 117.

      “While the link between the lesion location and the multielectrode location technically constrains the lesion to an area of cortex in which a multielectrode array could be implanted, we see the connection as a positive, because it ensures recording some neuroelectrophysiology from the perilesional area in which recovery is hypothesized to occur (see Appendix 1).”

      Although the extent of the damage created through the Utah array will vary based on anatomical structures, it is unclear what is the range of lesion volumes that can be created with this method, given a parameter set. It was also mentioned that they performed a non-exhaustive parameter search for the applied current amplitude and duration (Table S1/S2) to generate the most suitable lesion size but did not present the resulting lesion sizes from these parameter sets listed. Moreover, there’s a lack of histological data suggesting that the lesion size is precise and repeatable given the same current duration/amplitude, at the same location.

      We thank the reviewer for this thoughtful feedback. We have added figures (Figs. 4 and 5), where we show the relationship between estimated lesion volume and the current amplitude and duration parameters. These figures include more data from the tests in Supplementary File 1 and Supplementary File 2. While there is some variation in lesion volume for a given current amplitude and duration, there is still a clear relationship between the parameters and lesion volume.

      It is unclear what type of behavioral deficits can result from an electrolytic lesion this size and type (∼3 mm in diameter) in rhesus macaques, as the extent of the neuronal loss within the damaged parenchyma can be different from past lesioning studies.

      While we appreciate the reviewer’s interest in the behavioral deficits associated with our lesions in rhesus macaques, reporting these falls beyond the scope of this manuscript. Future work will explore the behavioral deficits associated with these lesions.

      The lesioning procedure was performed in Monkey F while sedated, but no data was presented for Monkey F in terms of lesioning parameters, lesion size, recorded electrophysiology, histological, or behavioral outcomes. It is also unclear if Monkey F was in a terminal study.

      We apologize for not being more explicit about the parameters used for the lesion in Monkey F. We have added this in Results on Page 5, Line 209 and in Methods on Page 19, Line 586.

      “After this validation and refinement, one proof-of-concept lesion (150 µA direct current passed through adjacent electrodes for 45 seconds) was performed in an in vivo sedated rhesus macaque (Monkey F) in order to validate the safety of the procedure.”

      “This lesion was created by applying 150 µA of direct current to two adjacent electrodes in the microelectrode array for 45 seconds.”

      We also clarified the parameters used for the other lesions in Monkeys H and U in Results on Page 7, Line 233 and in Methods on Page 19, Line 586.

      “In all of the fourteen lesions across two awake-behaving rhesus macaques (150 µA direct current passed through adjacent electrodes for 30 or 45 seconds (30s for Monkey U and 45s for Monkey H, except lesion H200120 which was for 50 seconds)), the current source worked as expected, providing a constant current throughout the duration of the procedure.”

      “In these lesions, 150 µA of direct current was applied to two adjacent electrodes in the microelectrode array for 30 or 45 seconds (30s for Monkey U, 45s for Monkey H), except in lesion H200120 where current was applied for 50 seconds.”

      Monkey F was euthanized shortly after the lesion, so we now mention this on Page 19, Line 583.

      “Based on this, and a lack of physiological signs of pain from the anaesthetized pig studies, a lesion was performed on a sedated rhesus macaque who was subsequently euthanized due to unrelated health complications (Monkey F; 16 year-old adult, male rhesus macaque) in order to further verify safety before use in awake-behaving rhesus.”

      Because Monkey F was sedated and then euthanized shortly after, there is no behavioral data. As the lesion in sedated Monkey F was used to validate the safety of the procedure, any further data and analysis fall beyond the scope of this manuscript.

      As an inactivation method, the electrophysiology recording in Figure 5 only showed a change in pairwise comparisons of clustered action potential waveforms at each electrode (%match) but not a direct measure of neuronal pre and post-lesioning. More evidence is needed to suggest robust neuronal inactivation or termination in rhesus macaques after electrolytic lesioning. Some examples of this can be showing the number of spike clusters identified each day, as well as analyzing local field potential and multi-unit activity.

      The reviewer has pointed out some shortcomings of the original analysis, which we believe have since been addressed with the revised analysis. LFP and spiking activity are functional measures that are more ambiguous in terms of loss and are also the subject of another manuscript currently under revision.

      The advantages over recently developed lesioning techniques are not clear and are not discussed.

      We thank the reviewer for noting this. We have added a section, also responding to their later request for us to compare our work to Khateeb et al. 2022, by adding a section to the Discussion on Page 16, Line 434.

      “Perhaps the most unique advantage of our technique in comparison with other existing inactivation methods lies in Design Consideration #1: stable electrophysiology pre- and post-inactivation (Appendix 1). While several methods exist that allow for localization and size control of the inactivation (Design Consideration #2) and cross compatibility across regions and species (Design Consideration #3), few have achieved compatibility with stable electrophysiology. For example, some studies record electrophysiology only after the creation of the lesion, preventing comparison with baseline neuronal activity [4]. One recent study, Khateeb, et al., 2022, developed an inactivation method that is effectively combined with stable electrophysiology by creating photothrombotic lesions through a chronic cranial window integrated with an electrocorticography (ECoG) array [5], which may be appropriate for applications where local field potential (LFP) recording is sufficient. This approach has trade-offs with regard to the three design considerations presented in Appendix 1.

      While Khateeb, et al., present a toolbox with integrated, stable electrophysiology from an ECoG array pre- and post-inactivation (Design Consideration #1), they demonstrated recordings from an ECoG array with limited spatial resolution. While a higher density ECoG array that would provide higher spatial resolution could be used, increasing the density of opaque electrodes might occlude optical penetration and constrain photothrombotic lesions. Further, ECoG arrays are limited to recording LFP, not electrophysiology at single neuron resolution, potentially missing meaningful changes in the neuronal population activity after lesioning. Khateeb, et al., demonstrated localization and control of the size of inactivation (Design Consideration #2). In this manuscript, we have shown that the amount and duration of direct current are significant determinants of lesion size and shape, while with photothrombotic lesions, light intensity and aperture diameter are the significantly relevant parameters. One potential advantage of photothrombotic approaches is the use of optical tools to monitor anatomical and physiological changes after lesioning through the cranial window, though the research utility of this monitoring remains to be demonstrated.

      Although the method presented by Khateeb, et al., shows some cross-compatibility (Design Consideration #3), it has greater limitations in comparison with the method presented here. For example, while Khateeb, et al., notes that the approach could be adapted for use in smaller organisms, no modification is needed for use in other species with this work’s approach, so long as a multielectrode probe is implantable. In this manuscript we demonstrate electrolytic lesioning spanning two multielectrode probes across rabbits, pigs, sheep, and rhesus macaques, and our same device could be easily used with other smaller species, like rats, in which multielectrode probes have been successfully implanted [6]. Further, the approach in Khateeb, et al., is limited to superficial brain structures, due to the need for optical accessibility. As noted, fiber optics could allow access to deeper structures, which would bring associated additional tissue damage, but deeper structure lesioning was not demonstrated. In contrast, the approach presented here can be used in any region of cortex in which a multielectrode probe can be implanted, which, depending on the probe used, does not limit it to surface structures. For example, we demonstrated use of our lesioning technique with a linear U-probe (Fig. 8), which could be used to reach deeper layers of cortex or specific deep cortical structures. In both techniques, the location of the lesion is tied to the location of the electrophysiology (for Khateeb et al., wherever the cranial window and ECoG array are; for this technique, wherever the multielectrode probe has been implanted), which ensures that the electrophysiology will include recordings from the perilesional area. Neither work addresses the potential of their technique to induce chronic post-lesion behavioral effects, which is a key goal for future work.”

      There is a lack of quantitative histological analysis of the change in neuronal morphology and loss.

      We appreciate the reviewer’s desire for a quantitative histological analysis; however, this falls outside of the scope of this manuscript. We are not attempting to make strong claims about the number of neurons lost through lesioning or thoroughly characterize morphological changes in the neurons. The histology is intended to show that lesioning did lead to a loss of neurons, but the precise number of neurons lost is neither in scope nor is likely to be highly conserved across lesions.

      There is a lack of histology data across animals and on the reliability of their lesioning techniques across animals and experiments.

      We thank the reviewer for this point. As stated above, we have now added Fig. 4 and Fig. 5, which includes volume estimates based on the histology from more of our ex vivo and in vivo testing across animals.

      There is a lack of data on changes in cortical layers and structures across the lesioning and non-lesioning electrodes.

      We acknowledge that the histology does not have the level of detail that is expected from many modern studies. However, the goal here was dramatically different: we sought to calibrate a novel lesion device, ensure its safe use in large mammals (specifically, non-human primates) and provide estimates of the lesion size to compare with the literature. The extent of histology that could be performed and the tools available to us prevent such an in-depth analysis. We can say based on shank length of the Utah arrays used and known anatomy that we have affected layer 2/3 and maybe a bit of layer 4.

      Reviewer 1 (Recommendations For The Authors):

      Figure 5b. It would be helpful if the author could plot the delta match separately for the lesion electrodes, near neighbor electrodes, and far neighbors. This would help understand the lesion effect, specifically whether the effect is selective (e.g., more potent for the lesion and adjacent electrodes.)

      The fact that neuron loss is not particularly selective can already be seen in the spike waveform plots, arranged spatially on the array. Plenty of clear change is observed far from the lesion electrodes (marked with black dots) as well as nearby. We have mentioned this localized non-specificity in the main text and have re-emphasized it in the figure legend. While this is a nice suggestion, we currently do not feel this result rises to the level of a figure, given that it is not highly specific spatially.

      Reviewer 2 (Recommendations For The Authors):

      Overall the quality of the paper, the figures and the analysis used could be significantly improved. There is a lack of scientific rigor in the presentation of figures and analysis techniques. It is not clear what the authors are trying to communicate through the figures and their choice of figures to show is confusing (see below).

      We thank the reviewer for their pointed critiques and believe we have addressed their concerns with many changes to the text, a revamped waveforms analysis, and both the expansion and addition of results.

      The neurophysiology data shown doesn’t suggest neuronal loss, it only shows change which needs strong control data to show it is due to a lesion.

      As detailed below, we have presented a revised analysis that provides this control. While the reviewer is right to point out that we cannot distinguish actual neuron loss from neuron silencing, we believe the new analysis rigorously indicates new rates of sample turnover beyond those expected from a healthy state.

      The histology figure should be replaced with a high-quality representation without folds.

      We understand the reviewer’s suggestion. While ideally we would have many histology slices from each lesion, due to cost, we were only able to collect one histology slice per lesion. The folds were introduced by the company that performed the H&E staining, and we unfortunately cannot remove the folds. Therefore, despite the folds, this is the best and only image from this lesion. We hope that the markings on the figure and the comment in the caption are sufficient to explain to readers that the folds are not a result of the lesion but instead a result of the histology process.

      The authors suggest that this lesioning method will be compatible with any available multielectrode probe theoretically. Since all testing was done with a Utah array, it will be helpful to add an explanation about potential constraints that will make a given array compatible with this method.

      We thank the reviewer for this suggestion. As stated above, we have both added a demonstration of use in a different multielectrode probe type (with a U-probe) in Fig. 8, and we have added a discussion about which types of multielectrode probes would be suitable on Page 15, Line 420.

      The authors should cite and discuss previous studies using electrolytic lesioning in awake-behaving animals to study the causal connection between the brain and behavior. (One example study: Morissette MC, Boye SM. Electrolytic lesions of the habenula attenuate brain stimulation reward. Behavioural brain research. 2008 Feb 11;187(1):17-26.)

      We thank the reviewers for this suggestion. We have added a mention of existing electrolytic lesioning studies on Page 2, Line 88.

      “Prior termination studies mostly measure behavioral output, with no simultaneous measures of neuronal activity during the behavior, impairing their ability to provide insight into the causal connection between the brain and behavior [7]–[11], or with no baseline (i.e., pre-lesion) measures of neuronal activity [4].”

      The authors should compare their technique with other recent lesioning studies in primates (e.g. Khateeb et al, 2022)

      We again thank the reviewer for this point. Specifically not mentioning Khateeb et al. 2022 was a submission error on our part; we cited the paper in Appendix 2 in the version uploaded to the eLife submission portal, but we had uploaded the version prior to citing it to bioRxiv. We have combined addressing this with addressing a previous comment, as mentioned above, with a section in the Discussion on Page 16, Line 434.

      In Appendix 2, the authors suggest that a major limitation of optogenetics and chemogenetic inactivation methods is the lack of rhesus-compatible constructs. However, several viral constructs have successful implementation in rhesus monkeys so far (e.g. Galvan A, Stauffer WR, Acker L, El-Shamayleh Y, Inoue KI, Ohayon S, Schmid MC. Nonhuman primate optogenetics: recent advances and future directions. Journal of Neuroscience. 2017 Nov 8;37(45):10894-903; Tremblay et al, Neuron 2020)

      We thank the reviewer for pointing us to these papers. We have added a more thorough description of what we meant by lack of rhesus-compatible constructs in that Appendix.

      “However, other challenges exist with using optogenetics as an inactivation method in nonhuman primates, including difficulty reliably affecting behavior [12]. While several constructs for rhesus macaques have been developed [13], [14], reports of successfully inducing behavioral effects have a small effect size and are less numerous than might be expected [12], and several null results have been published [15]–[17]. Other remaining challenges include the need to develop a head-mounted, battery powered light delivery system for multi-day delivery of light and difficulty integrating illumination with simultaneous chronic neuroelectrophysiology.”

      For Figure 5b, only pairwise comparison results from monkey U (L11-14) are shown. It is unclear why such results from monkey H were shown in Figure 5a but not in 5b.

      We thank the reviewer for pointing out this unconventional one-monkey result. As described in the original submission, we previously omitted Monkey H from the analysis in Figure 5b (now Figure 7) since some of the lesions were closely spaced together, preventing well-defined pre- and post-lesion rates of turnover. Nevertheless, we have included Monkey H in all the revised analysis and believe even the less cleanly separated data shows useful indications of neuron loss or silencing evoked by the lesion.

      Behavioral data (during a motor task) from the awake behaving monkeys (U and H) would greatly strengthen the claim that this lesioning method is capable of creating a behavioral effect and can be adopted to study the relationship between neural function and behavior outcomes.

      While we are grateful for the reviewer’s interest in the application of our lesioning technique to studies involving behavior, a behavioral analysis of the effects of our electrolytic lesions falls beyond the scope of this Tools and Resources manuscript. We would also like to point out that we do not claim that we have achieved a behavioral deficit in this manuscript.

      Figure 2 would benefit from an illustration of the Utah array placement and the location of the sites used for lesioning. The authors can either overlay the illustrations on the current ex-vivo and histology images or create a separate schematic to demonstrate that for the readers. Also, Figure 2B needs to be replaced with one without the folds to avoid confusion for the readers.

      We have added Figure 2 - figure supplement 1, which shows both the location within the Utah array of the two electrodes used to create the lesions as well as the relative size of the surface area of the lesion and the array. Unfortunately, as the lesion was created under the array, the exact location of the array relative to the lesion is unknown.

      As mentioned above, Figure 2B is the only histological image from that lesion. We hope that the markings in the image as well as the caption sufficiently explain that the folds are unrelated to the lesion itself.

      Figure 3, the conical region is not well delineated. Data across animals and lesion volume with respect to different parameters should be included.

      We have included a supplemental figure, Figure 3 - figure supplement 1, where we have used a dashed white line to clearly indicate the area of damaged parenchyma, in case it was not clear in Figure 3a. We have also added volume estimates from lesions across animals and different parameters. The ex vivo estimates are shown in Figure 4 and the in vivo estimates are shown in Figure 5.

      Figure 4: it is not clear what is being communicated, and where the voltage traces are from.

      We thank the reviewer for noting this confusion. We have added some lines in the text to explain what the voltage traces show, both in the caption to Fig. 6 and in the text on Page 7, Line 238.

      “Traces only capture the values while the lesioning device was turned on (45 seconds for most lesions and 50 seconds for lesion H200120). A) Voltage traces. Discontinuity at the beginning of the traces indicates transient voltages that were too rapid to be captured by the voltmeter, lasting between 0.13 and 0.33 s. The fluctuating voltages, especially the rapid increase in voltage at the beginning of lesioning, emphasize the importance of using a current source to deliver consistent amounts of current into the brain.”

      “The voltage across the microelectrode array fluctuated much more than the current did, emphasizing that we made the correct choice in using a current source to ensure delivery of consistent amounts of current into the brain (Fig. 6).”

      Figure 5: why did the authors choose to use matching units as a measure of the lesion? It is surprising that there are still units on the location that the authors claim to be a lesion. To clarify that it would be helpful to show the location of the lesion in Figure 4a. Also, what can we conclude about the lesion induction when we see units on the lesion electrode? The change in unit match shows that there is a change in the network (although the authors need to show control for that so we know those changes don’t happen due to natural dynamics). It is not clear what is the time duration for pre-pre and post-post (i.e. minutes, seconds, hours). Do these comparisons come from the same time frame or are they coming from two fragments of time for both pre- and post-conditions?

      Aside from post-mortem histology and tissue assays, there is no good way to confirm neuron loss with chronically implanted electrode arrays in nonhuman primates. Waveforms were chosen as they are the one readily isolated physical measure of the system we are injuring. Although functional measures of activity could indicate neuron loss (topic of following papers), there are many conceivable changes in firing rate patterns that could manifest spuriously as loss, making the estimation of loss even more ambiguous and challenging this way.

      We believe the new Figure 7 will make the procedure much clearer, while also providing the control requested by the reviewer, illustrating that new statistical categories of altered waveforms emerge during a lesion, beyond those associated with typical changes in waveform composition within multi-unit recordings seen during recording sample turnover from healthy animals. We further note that by confining this analysis to four day spans at most, we have limited the impact of daily sample turnover described in the literature (Gallego, 2020).

      The time separation for pre-session versus pre-session comparisons (and likewise for pre-post and post-post comparisons) is some multiple of the approximately 24 hours between each daily recording session. Because we restrict ourselves to a separation of at most four days, this is between 24 and 96 hours. Spikes are sampled from successful trial periods (so on the order of seconds, compiled into minutes across the whole recording session). Although already described in the main text, these points have been re-emphasized in the figure legend.
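The pairing scheme described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the manuscript: the function name, the day-numbering convention (sessions indexed in days relative to the lesion, with sessions on or after the lesion day treated as post-lesion), and the example schedule are all assumptions made for the example.

```python
from itertools import combinations


def label_session_pairs(session_days, lesion_day, max_gap_days=4):
    """Label each pair of recording days as pre-pre, pre-post, or post-post.

    Pairs separated by more than max_gap_days are dropped, so every kept
    pair is between 24 and 24 * max_gap_days hours apart for daily sessions.
    """
    pairs = []
    for earlier, later in combinations(sorted(session_days), 2):
        if later - earlier > max_gap_days:
            continue  # too far apart: normal day-to-day turnover would dominate
        if later < lesion_day:
            label = "pre-pre"
        elif earlier >= lesion_day:
            label = "post-post"
        else:
            label = "pre-post"
        pairs.append((earlier, later, label))
    return pairs


# Hypothetical schedule: the lesion is on day 0, with three daily sessions on each side.
pairs = label_session_pairs([-3, -2, -1, 1, 2, 3], lesion_day=0)
counts = {}
for _, _, label in pairs:
    counts[label] = counts.get(label, 0) + 1
print(counts)
```

Under this scheme the pre-pre and post-post pairs act as the control distribution for ordinary turnover, against which the pre-post pairs are compared.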

      CNO (line 931) needs to be explained.

      We thank the reviewer for this point. We have defined CNO and its relevance in Appendix 2.

      “Additionally, chronic inactivation over days may be logistically challenging, as the half-life of clozapine N-oxide (CNO, a ligand used to activate DREADD receptors) is on the order of hours.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study examines the spatial and temporal patterns of occurrence and the interspecific associations within a terrestrial mammalian community along human disturbance gradients. They conclude that human activity leads to a higher incidence of positive associations.

      Strengths:

      The theoretical framework of the study is brilliantly introduced. Solid data and sound methodology. This study is based on an extensive series of camera trap data. Good review of the literature on this topic.

      Weaknesses:

      The authors use the terms associations and interactions interchangeably.

      This is not the case. In fact, we state specifically that "... interspecific associations should not be directly interpreted as a signal of biotic interactions between pairs of species…" However, co-occurrence can be an important predictor of likely interactions, such as competition and predation. We stand by our original text.

      It is not clear what the authors mean by "associations". A brief clarification would be helpful.

      Our specific definition of what is meant here by spatial association can be found in the Methods section. To clarify, the calculation of the index of associations is based on the covariance for the two species of the residuals (epsilon) after consideration of all species-specific response to known environmental covariates. These covariances are modelled to allow them to vary with the level of human disturbance, measured as human presence and human modification. After normalization, the final index of association is a correlation value that varies between -1 (complete disassociation) and +1 (complete positive association).
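The normalization step in this definition can be sketched as below. This is a minimal illustration under stated assumptions, not the JSDM itself: the function name and the residual covariance values are hypothetical, and the sketch treats a single fixed covariance matrix, whereas in the model the covariances are allowed to vary with the level of human disturbance.

```python
import math


def association_index(cov):
    """Normalize a residual covariance matrix to correlations in [-1, 1].

    Entry (i, j) is cov[i][j] / (sd_i * sd_j), where sd_i = sqrt(cov[i][i]).
    """
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Hypothetical residual covariances for three species at one disturbance level.
residual_cov = [
    [4.0, 2.0, -1.0],
    [2.0, 9.0, 0.0],
    [-1.0, 0.0, 1.0],
]

index = association_index(residual_cov)
for row in index:
    print([round(value, 3) for value in row])
```

Repeating the normalization with covariances evaluated at several disturbance levels would show how a pairwise index moves toward +1 or -1 along the gradient.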

      Also, the authors do not delve into the different types of association found in the study. A more ecological perspective explaining why certain species tend to exhibit negative associations and why others show the opposite pattern (and thus, can be used as indicator species) is missing.

      Suggesting the ecological underpinnings of the associations observed here would mainly be speculation at this point, but the associations demonstrated in this analysis do suggest promising areas for the more detailed research suggested.

      Also, the authors do not distinguish between significant (true) non-random associations and random associations. In my opinion, associations are those in which two species co-occur more or less than expected by chance. This is not well addressed in the present version of the manuscript.

      Results were considered to be non-random if correlation coefficients (for spatial association) or overlap (for temporal association) fell outside of 95% Confidence Intervals. This is now stated clearly in the Methods section. In Figure 3—figure supplement 1-3 and Figure 4—figure supplement 1-3, p<0.01 levels are also presented.

      The obtained results support the conclusions of the study.

      Anthropogenic pressures can shape species associations by increasing spatial and temporal co-occurrence, but above a certain threshold, the positive influence of human activity in terms of species associations could be reverted. This study can stimulate further work in this direction.

      Reviewer #2 (Public Review):

      Summary:

      This study analyses camera trapping information on the occurrence of forest mammals along a gradient of human modification of the environment. The key hypotheses are that human disturbance squeezes wildlife into a smaller area or their activity into only part of the day, leading to increased co-occurrence under modification. The method used is joint species distribution modelling (JSDM).

      Strengths:

      The data source seems to be very nice, although since very little information is presented, this is hard to be sure of. Also, the JSDM approach is, in principle, a nice way of simultaneously analysing the data.

      Weaknesses:

      The manuscript suffers from a mismatch of hypotheses and methods at two different levels.

      (1) At the lower level, we first need to understand what the individual species do and "like" (their environmental niche). That information is not presented, and the methods suggest that the representation of each species in the JSDM is likely to be extremely poor.

      The response of each species to the environmental covariates provides a window into their environmental niche, encapsulated in the beta coefficients for each environmental covariate. This information is presented in Figure 2.

      (2) The hypothesis clearly asks for an analysis of the statistical interaction between human disturbance and co-occurrence. Yet, the model is not set up this way, and the authors thus do a lot of indirect exploration, rather than direct hypothesis testing.

      Our JSDM model is set up specifically to examine the effect of human disturbance on co-occurrence, after controlling for shared responses to environmental variables.  It directly tests the first hypothesis: if an increase in indices of human disturbance had not tended to increase the measured spatial correlations between species as detected by the model, we would have rejected our stated hypothesis that human modification of habitats results in increased positive spatial associations between species.

      Even when the focus is not the individual species, but rather their association, we need to formulate what the expectation is. The hypotheses point towards presenting the spatial and the temporal niche, and how it changes, species for species, under human disturbance. To this, one can then add the layer of interspecific associations.

      Examining each species one by one and how each one responds to human disturbance would miss the effects of any meaningful interactions between species.  The analysis presented provides a means to highlight associations that would have been overlooked.  Future research could go on to analyze the strongest associations in the community and the strongest effects of human disturbance so as to uncover the underlying interactions that give rise to them and the mechanisms of human impact.  We believe that this will prove to be a much more productive approach than trying to tackle this problem species by species and pair by pair.

      The change in activity and space use can be analysed much more simply, by looking at the activity times and spatial distribution directly. It remains unclear what the contribution of the JSDM is, unless it is able to represent this activity and spatial information, and put it in a testable interaction with human disturbance.

      The topic is actually rather complicated. If biotic interactions change along the disturbance gradient, then observed data are already the outcome of such changed interactions. We thus cannot use the data to infer them! But we can show, for each species, that the habitat preferences change along the disturbance gradient - or not, as the case may be.

      Then, in the next step, one would have to formulate specific hypotheses about which species are likely to change their associations more, and which less (based e.g. on predator-prey or competitive interactions). The data and analyses presented do not answer any of these issues.

      We suggest that the so-called “simpler” approach described above is anything but simple, and this is precisely what the Joint Species Distribution Model improves upon.  As pointed out in the Introduction, simply examining spatial overlap is not enough to detect a signal of meaningful biotic interaction, since overlap could be the result of similar responses to environmental variables.  With the JSDM approach, this would not be considered a positive association and would then not imply the possible existence of meaningful interaction.

      Another more substantial point is that, according to my understanding of the methods, the per-species models are very inappropriate: the predictors are only linear, and there are no statistical interactions (L374). There is no conceivable species in the world whose niche would be described by such an oversimplified model.

      While interaction terms can be included in the JSDM, this would considerably increase the complexity of the models.  In previous work, we have found no strong evidence for the importance of interaction terms and they do not improve the performance of the models.

      We have no idea of even the most basic characteristics of the per-species models: prevalences, coefficient estimates, D2 of the model, and analysis of the temporal and spatial autocorrelation of the residuals, although they form the basis for the association analysis!

      The coefficient estimates for response to environmental variables used in the JSDM are provided in Figure 2 and Figure 2—source data 1.

      Why are times of day and day of the year not included as predictors IN INTERACTION with niche predictors and human disturbance, since they represent the temporal dimension on which niches are hypothesised to change?

      Also, all correlations among species should be shown for the raw data and for the model residuals: how much does that actually change and can thus be explained by the niche models?

      The discussion has little to add to the results. The complexity of the challenge (understanding a community-level response after accounting for species-level responses) is not met, and instead substantial room is given to general statements of how important this line of research is. I failed to see any advance in ecological understanding at the community level.

      We agree that the community-level response to human disturbance is a complex topic, and we believe it is also a very important one.  This research and its support of the spatial compression hypothesis, while not providing definitive answers to detailed mechanisms, opens up new lines of inquiry that make it an important advance.  For example, the strong effects of human disturbance on certain associations that were detected here could now be examined with the kind of detailed species by species and pair by pair analysis that this reviewer appears to demand.

      Reviewer #1 (Recommendations For The Authors):

      L27 indicates instead of "idicates".

      We thank the reviewer for catching that error.

      L64 I would refer to potential interactions or just associations. It is always hard to provide evidence for the existence of true interactions.

      We have revised to “potential interactions” to qualify this statement.

      L69 Suggestion: distort instead of upset.

      We thank the reviewer for catching that error.

      L70-71 Here, authors use the term associations. Please, be consistent with the terminology throughout the manuscript.

      We thank the reviewer for raising this important point.  The term “co-occurrence” appears to be used inconsistently in the literature, so we have tried to refer to it only when referencing the work of us. For us, co-occurrence means “spatial overlap” without qualification as to whether it is caused by interaction or simply by similar responses to environmental factors (see Blanchet et al. 2020, Argument 1). In our view, interactions refer to biotic effects like predation, competition, commensalism, etc., while associations are the statistical footprint of these processes.   In keeping with this understanding, in Line 73, we changed "association" to the stronger word "interaction," but in Line 76, we keep the words "spatiotemporal association", which is presumed to be the result of those interactions. In Line 91, we have changed “interactions” to “associations,” as we do not believe interactions were demonstrated in that study. 

      L76 "Species associations are not necessarily fixed as positive or negative..." This sentence is misleading. I would say that species associations can vary across time and space, for instance along an environmental gradient.

      We thank the reviewer for pointing out the potential for confusion.  In Line 79, we have changed as suggested.

      L78 "Associations between free-ranging species are especially context-dependent" Loose sentence. Please, explain a bit further.

      We have changed the sentence to be more specific; ”Interactions are known to be context-dependent; for example, gradients in stress are associated with variation in the outcomes of pairwise species interactions.”

      L83-85 This would be a good place to introduce the 'stress gradient' hypothesis, which has also been applied to faunal communities in a few studies. According to this hypothesis, the incidence of positive associations should increase as environmental conditions harden.

      In our review of the literature, we find that the stress gradient hypothesis is somewhat controversial and does not receive strong support in vertebrates.  We have added the phrase “…the controversial stress-gradient hypothesis predicts that positive associations should increase as environmental conditions become more severe…”

      L86-88 Well, overall, the number of studies examining spatiotemporal associations in vertebrates is relatively small. That is, bird associations have not received much more attention than those of mammals. I find this introductory/appealing paragraph a bit rough. I think the authors can do better and find a better justification for their work.

      We thank the reviewer for the comments.  We have rewritten the paragraph extensively to make it clearer and to provide a stronger justification for the study.

      L106 "[...] resulting in increased positive spatial associations between species" I'd say that habitat shrinking would increase the level of species clustering or co-occurrence, but in my opinion, not necessarily the incidence of positive associations. It is not clear to me if the authors use positive associations as a term analogous to co-occurrence.

      We thank the reviewer for raising this very important distinction.  Habitat shrinking would increase levels of species co-occurrence, but this is not particularly interesting.  We wanted to test whether there were effects on species interactions, as revealed by associations.  We find that the terms association and co-occurrence are used somewhat loosely in the literature and so have made some new effort to clarify and systematize this in the manuscript.  For example, there appear to be differences in the way “co-occurrence” is used in Boron 2023 and in Blanchet 2020. We do not use the term "positive spatial association" as analogous to "spatial co-occurrence". Spatial co-occurrence, which for us has the meaning of spatial overlap, could simply be the result of similar reactions to environmental co-variates, not reflecting any biotic interaction. Joint Species Distribution Models enable the partitioning of spatial overlap and segregation into that which can be explained by responses to known environmental factors, and that which cannot be explained and thus might be the result of biotic interactions.  It is only the latter that we are calling spatial association, which can be positive or negative.   These associations may be the statistical footprint of biotic interactions.

      Results:

      Difference between random and non-random association patterns. It is not clear to me if the reported associations are significant or not. The authors only report the sign of the association (either positive or negative) but do not clarify if these associations indicate that two species coexist more or less than expected by chance. In my opinion, that is the difference between true ecological associations (e.g., via facilitation or competition effects) and random co-existence patterns. This is paramount and should be addressed in a new version of the manuscript.

      This information is provided in Figure 3—figure supplement 1,2,3 and Figure 4—figure supplement 1,2,3.  This is referenced in the text as follows, “… correlation coefficients for 18 species pairs were positive and had a 95 % CI that did not overlap zero, and the number increased to 65 in moderate modifications but dropped to 29 at higher modifications" and so on. This criterion for significance (i.e., greater than expected by chance) is now stated at the end of the Materials and methods.  In Figure 3—figure supplement 1,2,3 and Figure 4—figure supplement 1,2,3, those correlations that were significant at p<0.01 are also shown.
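
      The criterion quoted above (a 95% interval that does not overlap zero) can be illustrated with a minimal sketch. It assumes that a set of posterior draws of one pairwise correlation coefficient is available; the function names and the synthetic draws below are hypothetical and are not taken from the authors' scripts.

```python
from statistics import quantiles


def interval_95(draws):
    """Central 95% interval of a list of draws (2.5% and 97.5% cut points)."""
    cuts = quantiles(draws, n=40)  # 39 cut points: 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


def is_nonrandom(draws):
    """True when the 95% interval lies entirely above or entirely below zero."""
    lower, upper = interval_95(draws)
    return lower > 0 or upper < 0


# Synthetic, purely illustrative draws of a correlation coefficient.
positive_pair = [0.20 + 0.0005 * i for i in range(1000)]    # all draws > 0
uncertain_pair = [-0.25 + 0.0005 * i for i in range(1000)]  # straddles zero

print(is_nonrandom(positive_pair))
print(is_nonrandom(uncertain_pair))
```

      Under this rule a pair is counted as a positive or negative association only when the whole interval sits on one side of zero; pairs whose interval straddles zero are treated as indistinguishable from chance.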

      I am also missing a more ecological explanation for the observed findings. For instance, the top-ranked species in terms of negative associations is the red fox, whereas the muntjac seems to be the species whose presence can be used as an indicator for that of other species. What are the mechanisms underlying these patterns? Do red foxes compete for food with other species? Do the species that show positive associations (red goral, muntjac) have traits or a diet that are more different from those of other species? More discussion on these aspects (role of traits and the trophic niche) would be necessary to better understand the obtained results.

      The purpose of this paper was to test the compression hypotheses, and we have tried to keep that as the focus.  However, the analysis does open up interesting lines of inquiry for future research to decipher the details of the interactions between species and the mechanisms by which human disturbance facilitates or disrupts these interactions. The reviewer raises some interesting possibilities, but at this point, any discussion along these lines would be largely speculation and could lengthen the paper without great benefit. 

      Reviewer #2 (Recommendations For The Authors):

      The manuscript should be accompanied by all data and code of analysis.

      All data and RScripts have been made available in Science Data Bank: https://doi.org/10.57760/sciencedb.11804.

      The sentence "not much is known" is weak: it suggests the authors did not bother to quantify what IS known, and simply waved any previous knowledge aside. Surely we have some ideas about who preys on whom, and which species have overlapping resource requirements (e.g., due to jaw width). For those, we would expect a particularly strong signal, if the association is indeed indicative of interactions.

      We believe that the reviewer is referring to the statement in Line 90-92 about the lack of understanding of the resilience of terrestrial mammal associations to human disturbance.  We have added a reference to one very recent publication that addresses the issue (Boron et al., 2023), but otherwise we stand by our statement. We have, however, added a qualifier to make it clear that we did indeed look for previous knowledge; "However, a review of the literature indicates that ...."

      Figures:

      Fig. 1. This reviewer considers that this is too trivial and should be deleted.

      This is a graphical statement of the hypotheses and may be helpful to some readers.

      Fig. 2. Using points with error bars hides any potential information.

      Done as suggested.

      That only 4 predictors are presented is unacceptably oversimplified.

      Only 4 predictors are included because, in previous work, we found that adding additional predictors or interactions did little to improve the model’s performance (Li et al. 2018, 2021 and 2022) and could lead to over-fitting.

      Fig. 5. and 6. aggregate extremely strongly over species; it remains unclear which species contribute to the signal, and I guess most do not.

      The number of detection events presented in Table 1 should help to clarify the relative contribution of each species to the data presented in Figures 5 and 6.

      This reviewer considers that the introduction 'oversells' the paper.

      L55: can you give any such "unique ecological information"

      L60: Lyons et al. (Kathleen is the first name) has been challenged by Telford et al. (2016 Nature) as methodologically flawed.

      The first name has been deleted.  The methodological flaw has to do with interpretation of the fossil record and choice of samples, not with the need to partition shared environmental preferences and interactions.

      L61 contradicts line 64: Blanchet et al. (2022, specifying some arguments from Dormann et al. 2018 GEB) correctly point out that logically one cannot infer the existence or strength of interactions from co-occurrence data. It is thus wrong to then claim (citing Boron et al.) that such data "convey key information about interactions". The latter statement is incorrect. A tree and a beetle can have extremely high association and nothing to do with each other. Association does not mean anything in itself. When two species are spatially and temporally non-overlapping, they can exhibit perfect "anti-association", yet, by the authors' own definition, cannot interact.

      We believe that the reviewer’s concerns arise from a misunderstanding of how we use the term association.  In our usage, an association is not the same as co-occurrence or overlap, which may simply be the result of shared responses to environmental variables.  The co-occurring tree and beetle would not be found to have any association in our analysis, only shared environmental sensitivities.  In contrast, associations can be the statistical footprint of interactions, and would be overlaid onto any overlap due to similar responses to the environment.  In the case of negative associations, such as might be the result of competitive exclusion or avoidance of predators, the two species would share environmental responses but show lower than expected spatial overlap.  Even though they might be only rarely found in the same vicinity, they would indeed be interacting when they were together.

      Joint Species Distribution Models "allow the partitioning of the observed correlation into that which can be explained by species responses to environmental factors... and that which remains unexplained after controlling for environmental effects and which may reflect biotic interactions." (Garcia Navas et al. 2021). It is the latter that we are calling “associations.”
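
      The partitioning idea can be illustrated with a deliberately simplified sketch. It is not the authors' JSDM (which is a hierarchical latent-variable model); it assumes a single environmental covariate, uses ordinary least squares in place of the full model, and all names and simulated values are hypothetical. Two species that respond to the same covariate but do not interact show a high raw correlation, yet almost no residual correlation once the shared response is removed.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, x):
    """Residuals of y after a least-squares fit on the single covariate x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


rng = random.Random(42)  # fixed seed so the illustration is repeatable
env = [rng.gauss(0, 1) for _ in range(2000)]     # shared environmental covariate
tree = [2.0 * e + rng.gauss(0, 1) for e in env]    # species 1: responds to env only
beetle = [1.5 * e + rng.gauss(0, 1) for e in env]  # species 2: responds to env only

raw_corr = pearson(tree, beetle)  # overlap driven by the shared response
residual_corr = pearson(residuals(tree, env), residuals(beetle, env))

print(f"raw correlation:      {raw_corr:.2f}")
print(f"residual correlation: {residual_corr:.2f}")
```

      In this toy setting only the residual correlation would be called an association in the sense used above; the raw correlation reflects shared environmental sensitivity alone.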

      L63: Gilbert reference: Good to have a reference for this statement.

      This point is important, but the reviewer’s comments below have made it clear that it is even more important to point out that strong interactions should be expected to lead to significant associations.  We have added a statement to clarify this.

      L70-72: Incorrect, interactions play a role, not associations (which are merely statistical).

      In this, we agree, and we have revised the statement to refer to interactions, not associations. In our view, an interaction is a biological phenomenon, while an association is the resulting statistical signal that we can detect.

      L75: Associations tell us nothing, only interactions do. Since these can not be reliably inferred, this statement and this claim are wrong.

      We thank the reviewer for raising this point, but we beg to disagree. Strong interactions should be expected to lead to significant associations that can be detected in the data. Associations, which can be measured reliably, are the evidence of potential interactions, and hence associations can tell us a great deal.  We have added a note to this effect after the Gilbert reference above to clarify this point.

      However, we do accept that associations must be interpreted with caution. As Blanchet et al. 2020 explain, “…the co-occurrence signals (e.g. a significant positive or negative correlation value) estimated from these models could originate from any abiotic factors that impact species differently. Therefore, this correlation cannot be systematically interpreted as a signal of biotic interactions, as it could instead express potential non-measured environmental drivers (or combinations of them) that influence species distribution and co-distribution.”  Alternatively, an association could be the result of interaction with a third species.

      L87: Regarding your claim, how would you know you DO understand? For that, you need to formulate an expectation before looking at the data and then show you cannot show what you actually measure. (Jaynes called this the "mind-projection fallacy".)

      We are not sure if the reviewer is criticizing our paper or the entire field of community ecology.  Perhaps it is the statement that “….resilience of interspecific spatiotemporal associations of terrestrial mammals to human activity remains poorly understood….”  Since we are confident that the reviewer believes that mammals do interact, we guess that it is the term “association” that is questioned.  We have revised this to “…the impacts of human activity on interspecific interactions of terrestrial mammals remains poorly understood…” 

      In this particular case, we did formulate an expectation before looking at the data, in the form of the two formal hypotheses that are clearly stated in the Introduction and illustrated in Figure 1. If the hypotheses had not been supported, then we would have accepted that we do not understand. But as the data are consistent with the hypotheses, we submit that we do understand a bit more now.

    1. Author response:

      We thank the reviewers for their critical appraisal of our manuscript. We will address the points of confusion and/or lack of clarity in a revised manuscript. We agree with reviewer 1 that applying the best practice pipeline(s) on new experimental data and comparing this approach with current practices would be a useful demonstration of how this alters the biological interpretation. This is something we are in the process of completing but believe this is best addressed in a separate manuscript where we can focus on the associated biological findings, allowing this manuscript to remain focused on the accurate quantification of tRNA-Seq data.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Eaton et al. examine the regulation of transcription directionality using a powerful genomic approach (more about the methodology below). Their data challenge the notion that the polyadenylation signal-reading Cleavage and Polyadenylation (CPA) complex is responsible for controlling promoter directionality by terminating antisense transcription. Namely, depletion of the required CPA factor RBBP6 has little effect on antisense transcription measured by POINT. They find instead that initiation is intrinsically preferential in the sense direction and additionally maintained by the activities of an alternative processing complex called Integrator, together with the kinase CDK9. In the presence of CDK9 activity, depletion of Integrator endoribonuclease INTS11 leads to globally increased transcription in the antisense direction, and minor effects in the sense direction. However, CDK9 inhibition reveals that sense transcription is also sensitive to INTS11 depletion. The authors suggest that CDK9 activity is stronger in the sense direction, preventing INTS11-mediated premature termination of sense transcripts.

      Strengths:

      The combination of acute depletion of the studied factors using degron approaches (important to limit possible secondary effects), together with novel and very sensitive nascent transcriptomics methods POINT and sPOINT is very powerful. The applied spike-in normalization means the analysis is more rigorous than most. Using this methodology allowed the authors to revisit the interesting question of how promoter/transcription directionality is determined.

      The data quality appears very good and the fact that both global analysis as well as numerous gene-specific examples are shown makes it convincing.

      The manuscript is well written and hence a pleasure to read.

      We appreciate this positive assessment.

      Weaknesses:

      I am slightly worried about the reproducibility of the data - it is unclear to me from the manuscript if and which experiments were performed in replicate (lack of table with genomic experiments and GEO access, mentioned in more detail in below recommendations to authors), and the methods could be more detailed.

      All sequencing data was deposited with GEO. Multiple biological replicates were performed for each sequencing experiment.  Bigwig files are presented as a table in the GEO submissions. This data has now been made public.

      A separate discussion section would be useful, particularly since the data provided challenge some concepts in the field. How do the authors interpret U1 data from the Dreyfuss lab in light of their results? How about the known PAS-density directionality bias (more PAS present in antisense direction than in sense) - could the differential PAS density be still relevant to transcription directionality?

      As suggested, we have expanded our discussion to relate our findings to existing data. We think the results from the Dreyfuss lab are very important and highlight the role of U1 snRNA in enforcing transcriptional elongation.  It does this in part by shielding PAS sequences.  Recent work from our lab also shows that U1 snRNA opposes the Restrictor complex and PNUTS, which otherwise suppress transcription (Estell et al., Mol Cell 2023).  Most recently, the Adelman lab has demonstrated that U1 snRNA generally enhances transcription elongation (Mimoso and Adelman., Mol Cell 2023).  Our work does not challenge and is not inconsistent with these studies.

      The role of U1 in opposing PAS-dependent termination inspired the idea that antisense transcriptional termination may utilise PASs.  This was because such regions are rich in AAUAAA and comparatively poor in U1 binding sites. However, our RBBP6 depletion and POINT-seq data suggest that PAS-dependent termination is uncommon in the antisense direction. As such, other mechanisms suppress antisense transcription and influence promoter directionality. In our paper, we propose a major role for the Integrator complex.

      We do not completely rule out antisense PAS activity and discuss the prior work that identified polyadenylated antisense transcripts. Nevertheless, this was detected by oligo-dT primed RT-PCR/Northern blotting, which cannot determine the fraction of non-polyadenylated RNA that could result from PAS-independent termination (e.g. by Integrator).  To do that requires an analysis of total nascent transcription as achieved by our POINT-seq.  Based on these experiments, Integrator depletion has a greater impact on antisense transcription than RBBP6 depletion. 

      I find that the provided evidence for promoter directionality to be for the most part due to preferential initiation in the sense direction should be stressed more. This is in my eyes the strongest effect and is somehow brushed under the rug.

      We agree that this is an important finding and incorporated it into the title and abstract.  As the reviewer recommends, we now highlight it further in the new discussion.

      References 12-17 report an effect of Integrator on 5' of protein-coding genes, while data in Figure 2 appears contradictory. Then, experiments in Figure 4 show a global effect of INTS11 depletion on promoter-proximal sense transcription. In my opinion, data from the 2.5h time-point of depletion should be shown alongside 1.5h in Figure 2 so that it is clear that the authors found an effect similar to the above references. I find the current presentation somewhat misleading.

      We are grateful for this suggestion and present new analyses demonstrating that our experiment in Figure 2 concurs with previous findings (Supplemental Figures 2A and B). Our original heatmap (Figure 2E) shows a very strong and general antisense effect of INTS11 loss. On the same scale, the effects in the sense direction are not as apparent, which is also the case using metaplots.  New supplemental figure 2A now shows sense transcription from this experiment in isolation and on a lower scale, demonstrating that a subset of genes shows promoter-proximal increases in transcription following INTS11 depletion.  This is smaller and less general than the antisense effect but consistent with previous findings.  Indeed, our new analysis in supplemental figure 2B shows that affected protein-coding genes are lowly expressed, in line with Hu et al., Mol Cell 2023. This explains why a sense effect is not as apparent by metaplot, for which highly expressed genes contribute the most signal.

      As a result of our analyses, we are confident that the apparently larger effect at the 2.5hr timepoint (Figure 4) that we initially reported is due to experimental variability and not greater effects of extended INTS11 depletion. Overlaying the 1.5h and 2.5h datasets (Supplemental Figure 4B) revealed a similar number of affected protein-coding genes with a strong (83%) overlap between the affected genes.  To support this, we performed qPCR on four affected protein-coding transcripts which revealed no significant difference in the level of INTS11 effect after 2.5h vs 1.5h (Supplemental Figure 4C).

      We now present data for merged replicates in Figures 2 and 4 which reveal very similar average profiles for -INTS11 vs +INTS11 at both timepoints. Overall, we believe that we have resolved this discrepancy by showing that it amounts to experimental variability and because the most acutely affected protein-coding genes are lowly expressed. As detailed above, we show this in multiple ways (and validate by qPCR). We have revised the text accordingly and removed our original speculation that differences reflected the timeframe of INTS11 loss.

      Conclusion/assessment:

      This important work substantially advances our understanding of the mechanisms governing the directionality of human promoters. The evidence supporting the claims of the authors is compelling, with among others the use of advanced nascent transcriptomics including spike-in normalization controls and acute protein depletion using degron approaches.

      In my opinion, the authors' conclusions are in general well supported.

      Not only the manuscript but also the data generated will be useful to the wide community of researchers studying transcriptional regulation. Also, the POINT-derived novel sPOINT method described here is very valuable and can positively impact work in the field.

      We are grateful for the reviewers' positive assessment of our study.

      Reviewer #2 (Public Review):

      Summary:

      Eaton and colleagues use targeted protein degradation coupled with nascent transcription mapping to highlight a role for the Integrator component INTS11 in terminating antisense transcription. They find that upon inhibition of CDK9, INTS11 can terminate both antisense and sense transcription - leading to a model whereby INTS11 can terminate antisense transcription and the activity of CDK9 protects sense transcription from INTS11-mediated termination. They further develop a new method called sPOINT which selectively amplifies nascent 5' capped RNAs and find that transcription initiation is more efficient in the sense direction than in the antisense direction. This is an excellent paper that uses elegant experimental design and innovative technologies to uncover a novel regulatory step in the control of transcriptional directionality.

      Strengths:

      One of the major strengths of this work is that the authors endogenously tag two of their proteins of interest - RBBP6 and INTS11. This tag allows them to rapidly degrade these proteins - increasing the likelihood that any effects they see are primary effects of protein depletion rather than secondary effects. Another strength of this work is that the authors immunoprecipitate RNAPII and sequence extracted full-length RNA (POINT-seq) allowing them to map nascent transcription. A technical advance from this work is the development of sPOINT which allows the selective amplification of 5' capped RNAs < 150 nucleotides, allowing the direction of transcription initiation to be resolved.

      We appreciate this positive assessment.

      Weaknesses:

      While the authors provide strong evidence that INTS11 and CDK9 play important roles in determining promoter directionality, their data suggests that when INTS11 is degraded and CDK9 is inhibited there remains a bias in favour of sense transcription (Figures 4B and C). This suggests that there are other unknown factors that promote sense transcription over antisense transcription and future work could look to identify these.

      We agree that other (so far, unknown) factors promote sense transcription over antisense, which was demonstrated by our short POINT.  We have provided an expanded discussion on this in the revision. In our opinion, demonstrating that sense transcription is driven by preferential initiation in that direction is a key finding and we agree that the identification of the underlying mechanism constitutes an interesting avenue for future study.

      Reviewer #3 (Public Review):

      Summary:

      Using a protein degradation approach, Eaton et al show that INTS11 can terminate sense and anti-sense transcription, but higher activity of CDK9 in the sense direction protects it from INTS11-dependent termination. They developed sPOINT-seq, which detects nascent 5'-capped RNA. The technique allowed them to reveal robust transcription initiation of sense RNA as compared to anti-sense.

      Strengths:

      The strength of the paper is the acute degradation of proteins, eliminating the off-target effects. Further, the paper uses elegant approaches such as POINT and sPOINT-seq to measure nascent RNA and 5'-capped short RNA. Together, the combination of these three allowed the authors to make clean interpretations of data.

      We appreciate this positive assessment.

      Weaknesses:

      While the manuscript is well written, the details on the panel are not sufficient. The methods could be elaborated to aid understanding. Additional discussion on how the authors' findings contradict the existing model of anti-sense transcription termination should be added.

      We have added more detail to the figure panels, which we hope will help readers to navigate the paper more easily. Specifically, the assay employed for each experiment is indicated in each figure panel. As requested, we provide a new and separate discussion section in the revision.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Congratulations on this important piece of work!

      Some specific suggestions.

      MAJOR

      -The data are not available (Accession "GSE243266" is currently private and is scheduled to be released on Sep 01, 2026.) This should be corrected and as a minimum, the raw sequencing files as well as the spike-in scaled bigwig files should be provided in GEO.

      We have made the data public. Raw and bigwig files are provided as part of the GEO upload.

      MINOR

      - It would be useful for readers if you could include catalog numbers of the reagents used in the study.

      We have included this information in our revision.

      - A table in experimental procedures summarizing the genomic experiments performed in this study as well as published ones reanalyzed here would be helpful.

      This is now provided as part of the resources table.

      - It would be easier for reviewers to evaluate the manuscript if the figure legends were included together with the figures on one page. This is now allowed by most journals.

      We have used this formatting in the revision.

      - Providing some captions for the results sections would be helpful.

      We have included subheadings as suggested.

      Reviewer #2 (Recommendations For The Authors):

      Generally, I would suggest writing the experiment-type above panels where it is not immediately obvious what they are so a reader can appreciate the figures without referencing the legend. E.g. write POINT-seq on Figure 1B just to make it obvious to someone looking at the figures what methodology they are looking at. Likewise, you could write RNAPII ChIP-seq for Supplementary Figures 3D and 3E.

      We have carried out this recommendation.

      Can a y-axis be indicated on POINT-seq genome browser tracks? This could make them easier to interpret.

      Y-axis scales are provided as RPKM as stated in the figure legends.
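
      For readers unfamiliar with the unit, RPKM is reads per kilobase of region per million mapped reads. The snippet below is a minimal sketch of that standard definition; the function name and arguments are illustrative assumptions and do not describe the authors' actual track-generation pipeline.

```python
def rpkm(read_count: int, region_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of region per million mapped reads.

    Hypothetical helper for illustration only; the software used to scale
    the browser tracks in the manuscript is not assumed here.
    """
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and library size must be positive")
    # reads / (length in kb) / (library size in millions)
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)
```

      Under this definition, a 1 kb bin holding 100 reads in a library of one million mapped reads has an RPKM of 100.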

      The authors could address/speculate in the text why there is less POINT-seq signal for the antisense transcript in the treatment condition in Figure 1B? Or could consider including a different example locus where this is not the case for clarity.

      Acute depletion of poly(A) factors (like RBBP6) results in strong read-through beyond the poly(A) signal of protein-coding genes, as Figure 1 shows. However, it also causes a reduction in transcription levels, which can be seen in the figure and is correctly noted by the reviewer in this comment. We see this with other poly(A) factor depletions (e.g. CPSF73 and CPSF30 – Eaton et al., 2020 and Estell et al., 2021), and other labs have observed this too (e.g. for CPSF73-dTAG depletion (Cugusi et al., Mol Cell 2022)). Plausible reasons include a limited pool of free RNAPII due to impaired transcriptional termination or limited nucleotide availability due to their incorporation within long read-through transcripts. For these reasons, we have retained the example in Figure 1B as a typical representation of the effect. Moreover, the heatmap in Figure 1D fairly represents the spectrum of effects following RBBP6 loss – highlighting the strong read-through beyond poly(A) signals and the marginal antisense effects.

      "The established effect of INTS11 at snRNAs was detected in our POINT-seq data and demonstrates the efficacy of this approach (Figure 2B)." The authors could explain this point more clearly in the text and describe the data - e.g. As expected, depletion of INTS11 leads to increased POINT-seq signal at the 3' end of snRNAs, consistent with defects in transcriptional termination. This is highlighted by the RNU5A-1 and RNU5B-1 loci (Figure 2B).

      We agree and have added more context to clarify this.

      I would suggest adjusting the scale of the heatmap in Figure 2E - I think it would be easier to interpret if the value of 0 was white - with >0 a gradient of orange and <0 a gradient of blue (as is done in Figure 1C). I think making this change would make the point as written in the text clearer i.e. "heatmap analysis demonstrates the dominant impact of INTS11 on antisense versus sense transcription at most promoters (Figure 2E)." I'm assuming most of the sense transcription would be white (more clearly unchanging) when the scale is adjusted.

      We agree and have done this. The reviewer is correct that most sense transcription is unchanged by INTS11 loss.  However, as we alluded to in the original submission, a subset of transcripts shows a promoter-proximal increase after INTS11 depletion. We have expanded the analyses of this effect (see responses to other comments) but stress that it is neither as general nor as large as the antisense effect.

      The authors make the point that there is mildly increased transcription over the 5' end of some genes upon INTS11 depletion and show a track (Supplementary Fig 2A). It is not immediately obvious from the presentation of the meta-analysis in Figure 2D how generalisable this statement is. Perhaps the size of the panel or thickness of the lines in Figure 2D could be adjusted so that the peak of the control (in blue) could be seen. Perhaps an arrow indicating the peak could be added? I'm assuming the peak at the TSS is slightly lower in the control compared to INTS11 depletion based on the authors' statement.

      We have provided multiple new analyses of this data to highlight where there are promoter-proximal effects of INTS11 loss in the sense direction.  Please see our response to the public review of reviewer 1 and new supplemental figures 2A, 2B, 4A and 4B which highlight the sense transcription increased in the absence of INTS11.

      The authors label Figure 4 "Promoters lose their directionality when CDK9 is inhibited" - but in INTS11-depleted cells treated with CDK9i they find that there still is a bias towards sense transcription. Suggested edit "Some promoter directionality is lost when CDK9 is inhibited" or similar.

      We agree and have made this change.

      The authors conclude that INTS11-mediated effects are the result of perturbation of the catalytic activities of Integrator, the authors should perform rescue experiments with the catalytically dead E203Q-INTS11 mutant.

      This is a very good suggestion and something we had intended to pursue.  However, as we will describe below (and shown in Supplemental Figure 4G), there were confounding issues with this experiment.

      The E203Q mutant of INTS11 is widely used in the literature to test for catalytic functions of INTS11.  However, we have found that this mutation impairs the ability of INTS11 to bind other Integrator modules in cells. Based on co-immunoprecipitation of flag-tagged WT and E203Q derivatives, INTS1 (backbone module), 10 (tail module), and 8 (phosphatase module) all show reduced binding to E203Q vs. WT. Because E203Q INTS11 is defective in forming Integrator complexes, rescue experiments might not fully distinguish the effects of INTS11 activity from those caused by defects in complex assembly. While this may at first seem unexpected, in the analogous 3’ end processing complex, catalytic mutants of CPSF73 (which is highly related to INTS11) negatively affect its interaction with other complex members (Kolev and Steitz, EMBO Reports 2005).

      We hypothesise that INTS11 activity is most likely involved in attenuating promoter-proximal transcription, but we cannot formally rule out other explanations and discuss this in our revision. Regardless of how INTS11 attenuates transcription, our main conclusion is on its requirement to terminate antisense transcription whether this involves its cleavage activity or not.

      The authors suggest that CDK9 modulates INTS11 activity/assembly and suggest this may be related to SPT5. Is there an effect of CDK9 inhibition on the snRNAs highlighted in Figure 2B?

      We believe that snRNAs are different from protein-coding genes concerning CDK9 function. Shona Murphy’s lab previously showed that, unlike protein-coding genes, snRNA transcription is insensitive to CDK9 inhibition, and that snRNA processing is impaired by CDK9 inhibition (Medlin et al., EMBO 2003 and EMBO 2005). We reproduce these findings by meta-analysis of 15 highly expressed and well-separated snRNAs and by qRT-PCR of unprocessed RNU1-1, RNU5A-1 and RNU7-1 snRNA following CDK9 inhibition. We observe snRNA read-through by POINT-seq following INTS11 loss whether CDK9 is inhibited or not (left panel, below). Note the higher TES-proximal signal in CDK9i conditions, which likely reflects the accumulation of unprocessed snRNA as validated by qPCR for three example snRNAs (right panel, below).

      Author response image 1.

      For Figure 4, would similar results be observed using inhibitors targeting other transcriptional CDKs such as CDK7,12/13?

      In response to this suggestion, we analysed four selected protein-coding transcripts (the same four that we used to validate the CDK9i results) by qRT-PCR in a background of CDK7 inhibition using the THZ2 compound (new Supplemental Figure 4E). THZ2 suppresses transcription from these genes as expected. Interestingly, expression is restored by co-depleting Integrator, recapitulating our findings with CDK9 inhibition. There are two possible explanations. First, as CDK7 is the CDK-activating kinase for CDK9, its inhibition will also inhibit CDK9, so THZ2 may simply hit this pathway upstream of where CDK9 inhibitors act. Second, CDK7 may independently shield transcription from INTS11. We allude to both interesting possibilities.

      What happens to the phosphorylation state of anti-sense engaged RNAPII when INTS11 is acutely depleted and/or CDK9 is inhibited? This could be measured by including Ser5 and Ser2 antibodies in the sPOINT-seq assay and complemented with Western Blot analysis.

      We have performed the western blot for Ser5 and Ser2 phosphorylation as suggested.  Both signals are mildly enhanced by INTS11 loss, which is consistent with generally increased transcription.  Ser2p is strongly reduced by CDK9 inhibition, which is consistent with the loss of nascent transcription in this condition.  Interestingly, both modifications are partly recovered when INTS11 is depleted in conjunction with CDK9 inhibition. This is consistent with the effects that we see on POINT-seq and shows that the recovered transcription is associated with some phosphorylation of RNAPII CTD.  This presumably reflects the action(s) of kinases that can act redundantly with CDK9.

      We have not performed POINT-seq with Ser5p and Ser2p antibodies under these various conditions.  Our rationale is that our existing data uses an antibody that captures all RNAPII (regardless of its phosphorylation status), which we feel most comprehensively assays transcription in either direction. Moreover, the lab of Fei Chen (Hu et al., Mol Cell 2023) recently published Ser5p and Ser2p ChIP-seq following INTS11 loss. By ChIP-seq, they observe a bigger increase in antisense RNAPII occupancy vs. sense providing independent and orthogonal support for our POINT-seq data.  Interestingly, this antisense increase is not paralleled by proportional increases in Ser5p or Ser2p signals.  This suggests that the unattenuated antisense transcription resulting from INTS11 loss does not have high Ser5p or Ser2p.  Since CDK7 and 9 are major Ser5 and 2 kinases, this supports our model that their activity is less prevalent for antisense transcription.  We now discuss these data in our revision.   

      The HIV reporter RNA experiments should be performed with the CDK9 inhibitor added to the experimental conditions. Presumably CDK9 inhibition would result in no upregulation of the reporter upon addition of TAT and/or dTAG. Perhaps the amount of TAT should be reduced to still have a dynamic window in which changes can be detected. It is possible that reporter activation is simply at a maximum. Can anti-sense transcription be measured from the reporter?

      We have performed the requested CDK9 inhibitor experiment to confirm that TAT-activated transcription from the HIV promoter is CDK9-dependent (new supplemental figure 4F).  Consistent with previous literature on HIV transcription, CDK9 inhibition attenuates TAT-activated transcription.  Importantly, and in line with our other experiments, depletion of INTS11 results in significant restoration of transcription from the HIV promoter when CDK9 is inhibited. Thus, TAT-activated transcription is CDK9-dependent and, as for endogenous genes, CDK9 prevents attenuation by INTS11.

      While TAT-activated transcription is high, we do not think that the plasmid is saturated. When considering this question, we revisited previous experiments using this system to study RNA processing (Dye et al., Mol Cell 1999, Cell 2001, Mol Cell 2006). In these cases, mutations in splice sites or polyadenylation sites have a strong effect on RNA processing and transcription around HIV reporter plasmids. Effects on transcription and RNA processing are, therefore, apparent in the appropriate context. In contrast, we find that the complete elimination of INTS11 has no impact on RNA output from the HIV reporter. Our original experiment assessing the impact of INTS11 loss in +TAT conditions used total RNA. One possibility is that this allows non-nascent RNA to accumulate, which might confound our interpretation of INTS11 effects on ongoing transcription. However, the new experiment described in the paragraph above was performed on chromatin-associated (nascent) RNA to rule this out. This again shows no impact of INTS11 loss on HIV promoter-derived transcription in the presence of TAT.

      To our knowledge, antisense transcription is not routinely assayed from plasmids. They generally employ very strong promoters (e.g. CMV, HIV) to drive sense transcription.  Crucially, their circular nature means that RNAPII going around the plasmid could interfere with antisense transcription coming the other way which does not happen in a linear genomic context. This is why we restricted our use of plasmids to looking at the effects of stimulated CDK9 recruitment (via TAT) on transcription rather than promoter directionality.   

      The authors should clearly state how many replicates were performed for the genomics experiments. Ideally, a signal should be quantified and compared statistically rather than relying on average profiles only.

      We have stated the replicate numbers for sequencing experiments in the relevant figure legends. All sequencing experiments were performed in at least two biological replicates, but often three. In addition, we validated their key conclusions by qPCR or with orthogonal sequencing approaches.

      Reviewer #3 (Recommendations For The Authors):

      The authors provide strong evidence in support of their claims.

      ChIP-seq of pol2S5 and S2 upon INTS11 depletion and CDK9 inhibition will strengthen the observation that transcription in the sense direction is more efficient.

      We view the analysis of total RNAPII as the most unbiased way of establishing how much RNAPII is going one way or the other. Importantly, ChIP-seq was very recently performed for Ser2p and Ser5p RNAPII derivatives in the lab of Fei Chen (Hu et al., Mol Cell 2023). Their data shows that loss of INTS11 increases the occupancy of total RNAPII in the antisense direction more than in the sense direction, which is consistent with our finding. Interestingly, the increased antisense RNAPII was not paralleled with an increase in Ser2p or Ser5p. This suggests that, following INTS11 loss, the unattenuated antisense transcription is not associated with full/normal Ser2p or Ser5p. These modifications are normally established by CDK7 and 9; therefore, this published ChIP-seq suggests that they are not fully active on antisense transcription when INTS11 is lost. This supports our overall model that CDK9 (and potentially CDK7 as suggested for a small number of genes in new Supplemental Figure 4E) is more active in the sense direction to prevent INTS11-dependent attenuation. We now discuss these data in our revision.

      In Supplementary Figure 2, the eRNA expression increases upon INTS11 degradation. I wonder if the effects of this will be appreciated on cognate promoters? Can the authors test some enhancer:promoter pairs?

      We noticed that some genes (e.g. MYC) that are regulated by enhancers show reduced transcription in the absence of INTS11. Whilst this could suggest a correlation, the transcription of other genes (e.g. ACTB and GAPDH) is also reduced by INTS11 loss although they are not regulated by enhancers.  A detailed and extensive analysis would be required to establish any link between INTS11-regulated enhancer transcription and the transcription of genes from their cognate promoters.  We agree that this would be interesting, but it seems beyond the scope of our short report on promoter directionality.

      Line 111, meta plot was done of 1316 genes. Details on this number should be provided. Overall, the details of methods and analysis need improvement. The layout of panels and labelling on graphs can be improved.

      We have now explained the 1316 gene set. In essence, these are the genes separated from an expressed neighbour by at least 10kb. This distance was selected because depletion of RBBP6 induces extensive read-through transcription beyond the polyadenylation site of protein-coding genes. To avoid including genes affected by transcriptional read-through from nearby transcription units, we selected those with a 10kb gap between them. This was the only selection criterion, so it is unlikely to introduce any unintended biases. Finally, we have added more information to the figure panels and their legends, which we hope will make our manuscript more accessible.
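
      The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: the `Gene` record, the function names, and the choice to measure the gap between gene bodies irrespective of strand are hypothetical, since the response does not specify how the distance was computed.

```python
from typing import Iterable, List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def gap(a: Gene, b: Gene) -> int:
    """Distance in bp between two gene bodies; negative if they overlap."""
    return max(a.start, b.start) - min(a.end, b.end)


def isolated_genes(expressed: Iterable[Gene], min_gap: int = 10_000) -> List[str]:
    """Names of expressed genes at least `min_gap` bp from every other
    expressed gene on the same chromosome (strand is ignored here)."""
    by_chrom = {}
    for gene in expressed:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    kept = []
    for genes in by_chrom.values():
        for gene in genes:
            # Quadratic comparison per chromosome; adequate for a sketch.
            if all(gap(gene, other) >= min_gap for other in genes if other is not gene):
                kept.append(gene.name)
    return kept
```

      For example, two expressed genes lying 4 kb apart would both be excluded, whereas a gene whose nearest expressed neighbour is 14 kb away would be retained.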

    1. Author response:

      We thank the reviewers for their positive evaluation and constructive feedback on our study.

      We acknowledge the concern regarding the use of HEK293T cells. In the revised manuscript, we will provide a more detailed explanation of the role of the PKA pathway in the regulation of GSIS by PGE2. To validate this regulation through Kv2.2, we will overexpress the Kv2.2 mutant channel in beta cells and assess its impact. Additionally, we will verify the specificity of the antibodies for EP1-EP4 receptors by knockdown. To confirm the receptors involved in PGE2 function, we will use additional EP receptor blockers or perform receptor knockdown experiments.

      We will clarify that the described signaling pathway operates under normal physiological conditions and differs from pathological changes.

      We once again thank the reviewers for their positive evaluation and constructive suggestions.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1

      We modified the text regarding PRC1 according to the reviewer’s recommendation.

      Reviewer #2

      Following the reviewer’s advice, we introduced the holdup assay, as well as the native holdup assay, in more detail.

      This new part now also discusses the question of replicates in more detail. We do not agree with the eLife assessment on this matter, but we think that this assessment was made because analyzing holdup data requires a different approach compared to more conventional interactomic approaches and these differences were not introduced in sufficient depth. We hope that the inclusion of more background reasoning, together with a more detailed comparison of the independently measured BIN1 interactomes (now included in Figure S4), will eliminate any confusion for the reader.

      We thank the reviewer for guiding us to a previous work that was done on Grb2. Indeed, the finding of this earlier work aligns perfectly with our finding, suggesting general similarities in SH3 domain-mediated interactions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Benner et al. identify OVO as a transcriptional factor instrumental in promoting the expression of hundreds of genes essential for female germline identity and early embryo development. Prior data had identified both ovo and otu as genes activated by OVO binding to the promoters. By combining ChIP-seq, RNA-seq, and analysis of prior datasets, the authors extend these data to hundreds of genes and therefore propose that OVO is a master transcriptional regulator of oocyte development. They further speculate that OVO may function to promote chromatin accessibility to facilitate germline gene expression. Overall, the data compellingly demonstrate a much broader role for OVO in the activation of genes in the female germline than previously recognized. By contrast, the relationship between OVO, chromatin accessibility, and the timing of gene expression is only correlative, and more work will be needed to determine the mechanisms by which OVO promotes transcription.

      We fully agree with this summary.  

      Strengths:

      Here Benner et al. convincingly show that OVO is a transcriptional activator that promotes expression of hundreds of genes in the female germline. The ChIP-seq and RNA-seq data included in the manuscript are robust and the analysis is compelling.

      Importantly, the set of genes identified is essential for maternal processes, including egg production and patterning of the early embryo. Together, these data identify OVO as a major transcriptional activator of the numerous genes expressed in the female germline, deposited into the oocyte and required for early gene expression. This is an important finding as this is an essential process for development and prior to this study, the major drivers of this gene expression program were unknown.

      We are delighted that this aspect of the work came across clearly. Understanding the regulation of maternal effect genes has been something of a black-box, despite the importance of this class of genes in the history of developmental genetics. The repertoire of essential oogenesis/embryonic development genes that are bound by and respond to OVO are well characterized in the literature, but nothing is known about how they are transcriptionally regulated. We feel the manuscript will be of great interest to readers working on these genes.

      Weaknesses:

      The novelty of the manuscript is somewhat limited as the authors show that, like two prior, well-studied OVO target genes, OVO binds to promoters of germline genes and activates transcription. The fact that OVO performs this function more broadly is not particularly surprising.

      Clearly, transcription factors regulate more than one or two genes. Nevertheless, we were surprised at how many of the genes involved in oogenesis per se, and how many maternal effect genes, were OVO targets. It was our hypothesis that OVO would have a transcriptional effect genome-wide; however, it was less clear whether OVO would always bind at the core promoter, as is the case with ovo and otu. Our results strongly support the idea that core promoter-proximal binding is essential for OVO function, a conclusion of work done decades ago that has not been revisited using modern techniques. 

      A major challenge to understanding the impact of this manuscript is the fact that the experimental system for the RNA-seq, the tagged constructs, and the expression analysis that provides the rationale for the proposed pioneering function of OVO are all included in a separate manuscript.

      This is a case where we ended up with a very, very long manuscript which included a lot of revisiting of legacy data. It was a tough decision on how to break up all the work we had completed on ovo to date. In our opinion, it was too much to put everything into a single manuscript unless we wanted a manuscript length supplement (we were also worried that supplemental data is often overlooked and sometimes poorly reviewed). We therefore decided to split the work into a developmental localization/characterization paper and a functional genomics paper. As it stands both papers are long. Certainly, readers of this manuscript will benefit from reading our previous OVO paper, which we submitted before this one. The earlier manuscript is under revision at another journal and we hope that this improved manuscript will be published and accessible shortly.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Benner et al. interrogate the transcriptional regulator OVO to identify its targets in the Drosophila germline. The authors perform ChIP-seq in the adult ovary and identify established as well as novel OVO binding motifs in potential transcriptional targets of OVO. Through additional bioinformatic analysis of existing ATAC-seq, CAGE-seq, and histone methylation data, the authors confirm previous reports that OVO is enriched at transcription start sites and suggest that OVO does not act as part of the core RNA polymerase complex. Benner et al. then perform bulk RNA-seq in OVO mutant and "wildtype" (GAL4 mediated expression of OVO under the control of the ovo promoter in OVO mutants) ovaries to identify genes that are differentially expressed in the presence of OVO. This analysis supports previous reports that OVO likely acts at transcription start sites as a transcriptional activator. While the authors propose that OVO activates the expression of genes that are important for egg integrity, maturation, and for embryonic development (nanos, gcl, pgc, bicoid), this hypothesis is based on correlation and is not supported by in vivo analysis of the respective OVO binding sites in some of the key genes. A temporal resolution for OVO's role during germline development and egg chamber maturation in the ovary is also missing. Together, this manuscript contains relevant ChIP-seq and RNA-seq datasets of OVO targets in the Drosophila ovary alongside thorough bioinformatic analysis but lacks important in vivo experimental evidence that would validate the high-quality datasets.

      We thank reviewer 2 for the appreciation of the genomics data and analysis. Some of the suggested in vivo experiments are clear next steps, which are well underway. These are beyond the scope of the current manuscript. 

      Temporal analysis of ovo function in egg chamber development is not easy, as only the weakest ovo alleles have any egg chambers to examine. However, we will also point out the long-known phenotypes of some of those weak alleles in the text (e.g. ventralized chambers in ovoD3/+). We will need better tools for precise rescue/degradation during egg chamber maturation.     

      Strengths:

      The manuscript contains relevant ChIP-seq and RNA-seq datasets of OVO targets in the Drosophila ovary alongside thorough bioinformatic analysis

      Thank you. We went to great lengths to do our highly replicated experiments in multiple ways (e.g. independent pull-down tags) and spent considerable time coming up with an optimized and robust informatic analysis.

      Weaknesses:

      (1) The authors propose that OVO acts as a positive regulator of essential germline genes, such as those necessary for egg integrity/maturation and embryonic/germline development. Much of this hypothesis is based on GO term analysis (and supported by the authors' ChIP-seq data). However accurate interpretation of GO term enrichment is highly dependent on using the correct background gene set. What control gene set did the authors use to perform GO term analysis (the information was not in the materials and methods)? If a background gene set was not previously specified, it is essential to perform the analysis with the appropriate background gene set. For this analysis, the total set of genes that were identified in the authors' RNA-seq of OVO-positive ovaries would be an ideal control gene set for which to perform GO term analysis. Alternatively, the total set of genes identified in previous scRNA-seq analysis of ovaries (see Rust et al., 2020, Slaidina et al., 2021 among others) would also be an appropriate control gene set for which to perform GO term analysis. If indeed GO term analysis of the genes bound by OVO compared to all genes expressed in the ovary still produces an enrichment of genes essential for embryonic development and egg integrity, then this hypothesis can be considered.

      We feel that this work on OVO as a positive regulator of genes like bcd, osk, nos, png, gnu, plu, etc., is closer to a demonstration than a proposition. These are textbook examples of genes required for egg and early embryonic development. Hopefully, this is not lost on the readers by an over-reliance on GO term analysis, which is required but not always useful in genome-wide studies. 

      We used GO term enrichment analysis as a tool to help focus the story on some major pathways that OVO is regulating. Regarding the specific criticism of the reference gene set, the GO term enrichment analysis in this work is robust to the choice of background gene set. We will update the GO term enrichment analysis text to indicate this fact, add a table to the manuscript using the expressed genes in our RNA-seq dataset as the background, and clarify gene set robustness in greater detail in the methods of the revision. We will also try to focus the reader’s attention on the actual target genes rather than the GO terms in the revised text.

      We have updated the GO term analysis by including all the expressed genes in our RNA-seq datasets as a background control. Figure 6 has been updated to include the significant GO terms. We have outlined changes in the methods section below.

      Lines 794-801:

      “Gene ontology enrichment analysis was completed with g:Profiler’s g:GOSt software (Raudvere et al. 2019) on the set of genes overlapping OVO ChIP peaks over the TSS and significantly upregulated in the presence of ectopic OVO (525 genes in total). All genes that were considered to be expressed in our RNA-seq datasets were used as a background control (10,801 genes in total). Default parameters were used for the enrichment analysis except that ‘statistical domain scope’ was set to ‘custom’ (our control background genes were uploaded here), ‘significance threshold’ was set to ‘Bonferroni correction’, and only GO biological process terms were searched for enrichment with the gene list. The GO terms listed in Figure 6 represent the 24 smallest GO term sizes according to Table S5.”
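
      To illustrate why the background set matters, the sketch below runs a one-sided hypergeometric over-representation test against a custom background and applies a Bonferroni correction. It is not g:Profiler's implementation: the function names, the input dictionaries, and the use of the number of tested terms as the Bonferroni factor are assumptions made for illustration.

```python
from math import comb
from typing import Dict, Iterable, Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def go_enrichment(
    query: Iterable[str],
    background: Iterable[str],
    term_to_genes: Dict[str, Set[str]],
    alpha: float = 0.05,
) -> Dict[str, float]:
    """Bonferroni-adjusted p-values for terms enriched in `query`.

    Query genes and term annotations are both restricted to the custom
    background before testing (hypothetical helper, for illustration).
    """
    bg = set(background)
    q = set(query) & bg
    tested = {t: set(g) & bg for t, g in term_to_genes.items()}
    tested = {t: g for t, g in tested.items() if g}

    significant = {}
    for term, genes in tested.items():
        p = hypergeom_sf(len(genes & q), len(bg), len(genes), len(q))
        p_adj = min(1.0, p * len(tested))  # Bonferroni over tested terms
        if p_adj <= alpha:
            significant[term] = p_adj
    return significant
```

      In the setting quoted above, `query` would hold the 525 OVO-bound, OVO-responsive genes and `background` the 10,801 expressed genes; restricting the background to expressed genes shrinks `N` and generally makes the test more conservative than using the whole annotated genome.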

      (2) The authors provide important bioinformatic analysis of new and existing datasets that suggest OVO binds to specific motifs in the promoter regions of certain germline genes. While the bioinformatic analysis of these data is thorough and appropriate, the authors do not perform any in vivo validation of these datasets to support their hypotheses. The authors should choose a few important potential OVO targets based on their analysis, such as gcl, nanos, or bicoid (as these genes have well-studied phenotypes in embryogenesis), and perform functional analysis of the OVO binding site in their promoter regions. This may include creating CRISPR lines that do not contain the OVO binding site in the target gene promoter, or reporter lines with and without the OVO binding site, to test if OVO binding is essential for the transcription/function of the candidate genes.

      Exploring mechanism using in vivo phenotypic assays is awesome, so this is a very good suggestion. But, it is not essential for this work -- as has been pointed out in the reviews, in vivo validation of OVO binding sites has been comprehensively done for two target genes, ovo and otu. The “rules” appear similar for both genes. That said, we are already following up specific OVO target genes and the detailed mechanism of OVO function at the core promoter. We removed some of our preliminary in vivo figures from the already long current manuscript. We continue to work on OVO and expect to include this type of analysis in a new manuscript.

      (3) The authors perform de novo motif analysis to identify novel OVO binding motifs in their ChIP-seq dataset. Motif analysis can be significantly strengthened by comparing DNA sequences within peaks, to sequences that are just outside of peak regions, thereby generating motifs that are specific to peak regions compared to other regions of the promoter/genome. For example, taking the 200 nt sequence on either side of an OVO peak could be used as a negative control sequence set. What control sequence set did the authors use as for their de novo motif analysis? More detail on this is necessary in the materials and methods section. Re-analysis with an appropriate negative control sequence set is suggested if not previously performed.

      We apologize for being unclear on negative sequence controls in the methods. We used shuffled OVO ChIP-seq peak sequences as the background for the de novo motif analysis, which we will better outline in the methods of the revision. This is a superior background set of sequences as it exactly balances GC content in the query and background sequences. We are not fond of the idea of using adjacent DNA that won’t be controlled for GC content and shadow motifs. Furthermore, the de novo OVO DNA binding motifs are clear, statistically significant variants of the characterized in vitro OVO DNA binding motifs previously identified (Lu et al., 1998; Lee and Garfinkel, 2000; Bielinska et al., 2005), which lends considerable confidence. We also show that the OVO ChIP-seq read density is highly enriched for all our identified motifs, as well as the in vitro motifs. We provide multiple lines of evidence, through multiple methods, that the core OVO DNA binding motif is 5’-TAACNGT-3’. We have high confidence in the motif data.

      We have added the below text to the methods section for further clarity on motif analysis parameters.

      Lines 808-812

      “The default parameters were used for de novo motif enrichment analysis, including the use of shuffled input sequences as a control. After identifying ‘OVO Motif One’, OVO ChIP peaks that contained that sequence were removed and the resulting ChIP peaks were resubmitted for STREME analysis deriving derivative OVO DNA binding motifs like above.”
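
      To make the idea of a shuffled control concrete, the Python sketch below shuffles each peak sequence (which preserves its base composition, and therefore its GC content) and compares how often the core motif 5’-TAACNGT-3’ occurs on either strand in the real and shuffled sets. The sequences are hypothetical toy data, and the simple per-base shuffle is an assumption made for illustration; it is not necessarily the shuffling scheme STREME applies internally.

```python
import random
import re

# IUPAC nucleotide codes expanded to regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]", "M": "[AC]",
    "K": "[GT]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
}
COMPLEMENT = str.maketrans("ACGTNRYWSMKBDHV", "TGCANYRWSKMVHDB")


def reverse_complement(motif):
    """Reverse complement of a sequence or IUPAC motif."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile an IUPAC motif into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))


def has_motif(seq, motif="TAACNGT"):
    """True if the motif occurs on either strand of seq."""
    seq = seq.upper()
    forward = motif_regex(motif)
    reverse = motif_regex(reverse_complement(motif))
    return bool(forward.search(seq) or reverse.search(seq))


def shuffle_sequence(seq, rng):
    """Shuffle bases in place of order; composition is unchanged."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def fraction_with_motif(seqs, motif="TAACNGT"):
    return sum(has_motif(s, motif) for s in seqs) / len(seqs)


# Hypothetical peak sequences, for illustration only
peaks = ["GGCTAACAGTTCGA", "TTACTGTTAGGCCA", "CGCGATATCGCGAT"]
rng = random.Random(0)
controls = [shuffle_sequence(s, rng) for s in peaks]

print("peaks with motif:   ", fraction_with_motif(peaks))
print("controls with motif:", fraction_with_motif(controls))
```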

      (4) The authors mention that OVO binding (based on their ChIP-seq data) is highly associated with increased gene expression (lines 433-434). How many of the 3,094 peaks (conservative OVO binding sites), and what percentage of those peaks, are associated with a significant increase in gene expression from the RNA-seq data? How many are associated with a decrease in gene expression? This information should be added to the results section.

      Not including the numbers of the overlapping ChIP peaks and expression changes in the text was an oversight on our part. The numbers that relate to this (666 peaks overlapping genes that significantly increased in expression, a significant enrichment according to Fisher’s exact test; 564 peaks overlapping genes that significantly decreased in expression, a significant depletion according to Fisher’s exact test) are found in figure 4C and will be added to the text.

      We have modified the results section to include the overlap between the RNA-seq and ChIP-seq data.

      Lines 463-468

      “We found that 2,298 genes that were expressed in our RNA-seq data overlapped an OVO ChIP peak. 666 genes significantly increased in expression and were bound by OVO, which is a significant enrichment according to a Fisher’s exact test (Figure 4C, cyan dots, p < 0.01, odds ratio = 2.21). While conversely, 564 genes decreased in expression and were bound by OVO, indicating a significant depletion according to a Fisher’s exact test (Figure 4C, blue dots, p < 0.01, odds ratio = 0.85).”
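
      For readers unfamiliar with how such an enrichment is tested, the Python sketch below builds a 2x2 table (OVO-bound or not, by upregulated or not) and computes the odds ratio and a two-sided Fisher's exact p-value. The counts are a hypothetical toy table, not the manuscript's data, and the two-sided convention used here (summing all tables no more probable than the observed one) is an assumption about how the test was run.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the table [[a, b], [c, d]].
    Undefined (division by zero) when b or c is 0."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]],
    summing the probabilities of all tables with the same margins that
    are no more probable than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)  # small tolerance for float ties
    )
    return min(1.0, total)


# Hypothetical toy counts:
#                 upregulated   not upregulated
# OVO-bound            3               1
# not bound            1               3
a, b, c, d = 3, 1, 1, 3
print("odds ratio:", odds_ratio(a, b, c, d))
print("two-sided p:", fisher_exact_two_sided(a, b, c, d))
```

      An odds ratio above 1 corresponds to enrichment of bound genes among the responding set, and a ratio below 1 corresponds to depletion, matching the direction of the two results quoted above.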

      (5) The authors mention that a change in endogenous OVO expression cannot be determined from the RNA-seq data due to the expression of the OVO-B cDNA rescue construct. Can the authors see a change in endogenous OVO expression based on the presence/absence of OVO introns in their RNA-seq dataset? While intronic sequences are relatively rare in RNA-seq, even a 0.1% capture rate of intronic sequence is likely to be enough to determine the change in endogenous OVO expression in the rescue construct compared to the OVO null.

      This is a good point. The GAL4 transcript is downstream of ovo expression in the hypomorphic ovoovo-GAL4 allele. We state in the text that there is a nonsignificant increase in GAL4 expression with ectopic rescue OVO, although the trend is positive. We calculated the RPKM of RNA-seq reads mapping to the intron spanning exon 3 and exon 4 in ovo-RA and found that there is also a nonsignificant increase in intronic RPKM with ectopic rescue OVO (we will add to the results in the revision). We would expect OVO to be autoregulatory and potentially increase the expression of GAL4 and/or intronic reads, but the ovoovo-GAL4>UASp-OVOB is not directly autoregulatory like the endogenous locus. It is not clear to us how the intervening GAL4 activity would affect OVOB activity in the artificial circuit. Dampening? Feed-forward? Is there an effect on OVOA activity? Regardless, this result does not change our interpretation of the other OVO target genes.

      We have added the analysis of intronic ovo RNA-seq to the results as outlined below.

      Lines 512-520

      “Transcriptionally, ovo RNA-seq reads are likely derived from the UASp-3xFHA-OVO-B cDNA rescue or are indistinguishable between the genomic locus and rescuing cDNA transgene. We found a nonsignificant increase in exon 3 to exon 4 intronic ovo reads with the expression of ectopic rescue OVO (log2 fold change = 0.76, p-adj = 0.26). These intronic reads would be derived from the endogenous ovo locus, but it is difficult to conclusively determine if the endogenous ovo locus would respond transcriptionally to ectopic OVO downstream of UASp (for example, the pathway for ovo is no longer autoregulatory in ovoovo-GAL4/ovoΔBP; UASp-3xFHA-OVO-B germ cells, there is an additional GAL4>UASp activation step). So, we could not confidently assess whether ovo responded transcriptionally to ectopic rescue OVO.”
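
      The RPKM normalization mentioned in the response can be written out in a few lines of Python. The read counts, intron length, and library sizes below are hypothetical placeholders, and the simple log2 ratio with a pseudocount is only an illustration; it does not reproduce the statistical test behind the reported fold change and adjusted p-value.

```python
from math import log2


def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


def log2_fold_change(value_treated, value_control, pseudocount=1.0):
    """log2 ratio with a pseudocount (an assumption) to avoid division by zero."""
    return log2((value_treated + pseudocount) / (value_control + pseudocount))


# Hypothetical values for one intron in two libraries
intron_length_bp = 2_000
rescue = rpkm(read_count=90, feature_length_bp=intron_length_bp,
              total_mapped_reads=30_000_000)
control = rpkm(read_count=50, feature_length_bp=intron_length_bp,
               total_mapped_reads=28_000_000)

print(f"rescue RPKM = {rescue:.2f}, control RPKM = {control:.2f}")
print(f"log2 fold change = {log2_fold_change(rescue, control):.2f}")
```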

      (6) The authors conclude with a model of how OVO may participate in the activation of transcription in embryonic pole cells. However, the authors did not carry out any experiments with pole cells that would support/test such a model. It may be more useful to end with a model that describes OVO's role in oogenesis, which is the experimental focus of the manuscript.

      We did not complete any experiments in embryonic pole cells in this manuscript and base our discussion on the potential dynamics of OVO transcriptional control and our previous work showing maternal and zygotic OVO protein localization in the developing embryonic germline. Obviously, we are highly interested in this question and continue to work on the role of maternal OVO. We agree that we extended too far and will remove the embryonic germ cell model in the figure. We will instead focus on the possible mechanisms of OVO gene regulation in light of the evidence we have shown in the adult ovary, as suggested.

      We have removed figure 7 and have re-written the last two paragraphs of the discussion as below.

      Lines 645-663

      “The requirement for OVO at the TSS of target genes has been well characterized at its own locus as well as its downstream target otu. Our OVO ChIP and expression data confirm findings from previous work that OVO is binding to these target promoters, and in the case of otu, strongly responds transcriptionally to the presence of OVO. Although we did not test the requirement for OVO DNA binding motifs at other OVO bound genes in this work, this has been extensively explored before, showing that removal of OVO DNA binding sites overlapping the TSS results in a strong decrease in reporter expression (Lü et al. 1998; Bielinska et al. 2005; Lü and Oliver 2001). Removal of more distal upstream OVO DNA binding sites also reduces reporter expression to a lesser degree. However, for most cases tested, removal of OVO DNA binding sites while leaving the rest of the enhancer regions intact, never totally abolished reporter expression. These dynamics are highly similar to work that has been completed on the pioneer factor zelda (zld). Adding zld DNA binding motifs to a stochastically expressed transcriptional reporter increases the activity and response of the reporter (Dufourt et al. 2018). Distally located zld DNA binding motifs influenced reporter expression to a lesser degree than proximal sites. A single zld DNA binding site adjacent to the TSS produced the strongest reporter activity. Importantly, just like the activity of OVO transgenic reporters, there is not an absolute requirement for zld DNA binding to activate reporter expression, however, the addition of TSS adjacent zld DNA binding motifs does strongly influence reporter response. We know that zld achieves this reporter response through its pioneering activity (Xu et al. 2014; Harrison et al. 2011), whether OVO achieves this similar effect on gene expression through a shared mechanism, or in cooperation with other transcription factors needs to be further explored.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The Results section could be streamlined by limiting the discussion of analysis to only those details that are unusual or essential for understanding the science. For example, the fact that MACS3 was used to call peaks seems most suitable for the Methods section.

      We have removed the below excerpts from the results section to streamline the text.

      ‘We compared immuno-purified OVO associated DNA with input DNA as a control, for a total of 12 ChIP-seq libraries, which we sequenced using the Illumina system. After quality control and alignment to the Drosophila r6.46 genome (Gramates et al. 2022), we used MACS3 (Zhang et al. 2008)’

      The Supplemental Tables are referred to out of order. Table S2 is referred to on line 143 while Table S1 is not referred to until the Methods section.

      We have reorganized the order of the tables in the manuscript text.

      In the analysis of CAGE-seq data, it is unclear whether there is anything distinctive about the ~2000 regions that are bound by OVO but are not near a TSS in the ovary dataset. Are these TSS that are not active in the ovary or are these non-promoter bound OVO sites? If they are TSS of genes not in the CAGE-seq data set, are these genes expressed in other tissues or just expressed at lower levels in the ovary?

      This was a good point that prompted us to take a closer look at the characteristics of OVO binding and its relationships to promoters and other gene elements. 45% of OVO ChIP peaks overlapped the TSS while 55% were either non-overlapping downstream or upstream of the TSS. When plotting OVO ChIP read density, there was still a striking enrichment of OVO binding over the TSS, even when the ChIP peak was not overlapping the TSS (new figure 1K). This is possibly due to weaker direct OVO binding at the TSS that was not considered significant by the peak calling software, or to indirect interactions between the distal OVO binding and the TSS. We outline this in the below text added to the results section on the OVO ChIP. To showcase these results, we have included a new panel in figure 1K. We removed the panel showing the enrichment over the CAGE-seq TSS, but this same data remains in the heatmap shown in figure 1L, so no information is lost. To directly answer the CAGE-seq questions in light of the results for OVO bound over the annotated TSS, we found that 1,047 ChIP peaks overlapped CAGE-seq TSS, which is only 347 fewer than the annotated TSS overlap (1,394). Of the 1,394 genes that were bound by OVO over the TSS, all of them were considered to be expressed in our RNA-seq dataset, indicating that these might just be more lowly expressed genes that for whatever reason were not considered to be enriched TSSs in the CAGE-seq data. This difference is likely not significant.

      Lines 235-251

      “Although OVO ChIP peaks overlapping genes showed a strong read density enrichment over the TSS, we found that only 45% (1,394/3,094) of OVO ChIP peaks directly overlapped a TSS. 43% (1,339/3,094) of OVO ChIP peaks were found to overlap the gene body downstream of the TSS (intronic and exonic sequences) and 12% (366/3,094) did not overlap any gene elements, indicating that they were intergenic.

      We were interested in the differences between OVO binding directly over the TSS or at more distal upstream and downstream sites. We decided to plot the OVO ChIP read density of these different classes of OVO binding patterns and found that OVO bound over the TSS produced a sharp read density enrichment over the TSS which was consistent with what was found for all OVO bound genes (Figure 1K). OVO binding along the gene body surprisingly also showed a read density enrichment over the TSS, although the magnitude of read density enrichment was notably less than TSS OVO binding. Intergenic OVO binding also showed these same characteristics with a notable upstream read density enrichment possibly indicative of enhancer binding. This indicates that although the significantly called OVO ChIP peaks did not overlap the TSS, there was still a propensity for TSS sequences to be enriched with OVO ChIP over the input control. This could be due to weaker direct in vivo binding of OVO to these TSSs or indirect interactions between the upstream/downstream OVO bound sequences and the TSS, possibly through a looping enhancer-promoter interaction. However, regardless of the location of the OVO ChIP peak, OVO seemed to always be enriched at or in close proximity to TSSs.”
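
      The three binding classes described above (TSS, gene body, intergenic) can be assigned with simple interval overlap logic, sketched below in Python. The gene and peak coordinates are hypothetical, all features are assumed to lie on a single chromosome, and the priority order (TSS first, then gene body, then intergenic) is an assumption about how a peak overlapping several features would be assigned.

```python
from collections import Counter
from typing import NamedTuple


class Gene(NamedTuple):
    start: int   # leftmost coordinate, inclusive
    end: int     # rightmost coordinate, inclusive
    strand: str  # "+" or "-"

    @property
    def tss(self):
        # The TSS is the 5' end, which depends on strand
        return self.start if self.strand == "+" else self.end


def classify_peak(peak_start, peak_end, genes):
    """Assign a peak (closed interval) to 'TSS', 'gene body', or 'intergenic'."""
    if any(peak_start <= g.tss <= peak_end for g in genes):
        return "TSS"
    if any(peak_start <= g.end and g.start <= peak_end for g in genes):
        return "gene body"
    return "intergenic"


# Hypothetical annotation and peaks
genes = [Gene(1_000, 5_000, "+"), Gene(8_000, 12_000, "-")]
peaks = [(900, 1_100), (3_000, 3_200), (6_000, 6_200), (11_900, 12_100)]

counts = Counter(classify_peak(start, end, genes) for start, end in peaks)
for label, n in counts.items():
    print(f"{label}: {n} ({100 * n / len(peaks):.0f}%)")
```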

      It would be helpful for the authors to provide a bit more detailed analysis of chromatin states of OVO-bound regions in GSC, 8c NC, and 32c NC (or some more clarity in the current analysis). Are the regions that are bound by OVO accessible in all these cell types or specifically enriched for accessibility in a subset? The authors state that OVO binding is correlated with open chromatin, but whether these are regions that are open in all cell types analyzed or a subset is not clear from the data presented. Promoters are often accessible regardless of cell type, so it is unclear what exactly is to be concluded from this association. Also, is the proximity to open chromatin features for OVO-bound promoters (as shown in Figure 2C) different than non-OVO-bound promoters (the two classes shown in Figure 1L, for example)?

      We utilized previously published datasets of staged germ cell chromatin status to look at the association of chromatin status and OVO binding. Unfortunately, not all the same germ cell stages were profiled for each chromatin mark from the datasets derived for these two papers. For example, only H3K4me3 data exists for GSCs, and only GSC and 8c data exists for H3K9me3, while the other chromatin marks had more profiles, even including later stages. We focused specifically on GSC and 32c (essentially stage 5 egg chambers) for the other chromatin marks since that is when the ovo hypomorphic egg chambers arrest. A nice control would have been chromatin states in somatic follicle cells of the ovary, since we know germ cell genes such as ovo and otu are not expressed and presumably the chromatin states in somatic cell types would be different than germ cells. However, chromatin states for somatic follicle cells were not published in these two papers and we are not aware of any other existing datasets to compare to. Essentially, we need to determine the changes in chromatin states with and without OVO, which we are currently working on.

      We did further analyze chromatin states and differential OVO binding with respect to gene elements, and found that OVO-bound regions, regardless of their relationship to the gene element, are always open (GSC and 32c ATAC). OVO binding over the gene body shows the same enrichment for open chromatin and transcriptionally active histone marks. We compared the profiles of these chromatin marks at the promoters of OVO bound and not bound genes and, consistent with the suggestion that promoters are generally open, we found that this was the case. However, there is an enrichment for open chromatin and transcriptionally active histone marks for OVO bound genes compared to non-OVO bound genes. This could be a consequence of OVO binding or an indirect consequence of a downstream OVO target. Regardless, as has been suggested, future experiments directly measuring chromatin status and OVO need to be performed. The below excerpts have been added to the text to supplement the comments provided above.

      Lines 328-343

      “The association of OVO binding with active histone marks and open chromatin was striking, but open chromatin is likely a general phenomenon of promoters (Haines and Eisen, 2008). Indeed, when measuring the read density for GSC and 32C ATAC-seq for OVO bound and OVO non-bound promoters, there is an enrichment for open chromatin at the TSS regardless of OVO binding. However, we did notice an increase in enrichment for OVO bound promoters compared to OVO non-bound promoters (Figure S1G), possibly suggesting that OVO bound promoters are more open or have an increase in accessibility when compared to non-OVO bound promoters. This same relationship held true for the transcriptionally active histone mark H3K27ac in GSCs (Figure S1H). Since only 45% of OVO ChIP peaks overlapped TSSs, we plotted the read density of the above chromatin marks over OVO ChIP peak maximums for OVO bound over the TSS, gene body, or intergenic regions (Figure S2A-D). We found that OVO bound regions that were not overlapping the TSS still showed the same propensity for enrichment of open chromatin and active histone marks. Intergenic regions were especially enriched for open chromatin measured through ATAC-seq. Altogether suggesting that OVO binding genome-wide is tightly associated with open chromatin regardless of germ cell stage, and active transcription in GSCs. In other words, chromatin state data suggests OVO is acting positively on its target genes and raises the possibility that OVO-binding and open chromatin are related.”
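
      Read density plots of the kind referred to above are, at their core, an average of per-base coverage in fixed windows centered on a set of anchor positions (TSSs or peak maxima). The Python sketch below shows that averaging step on a hypothetical coverage vector; it ignores strand orientation and normalization against input, both of which a real analysis would need to handle.

```python
def mean_profile(coverage, anchors, flank):
    """Average per-base coverage in a window of +/- flank around each anchor.
    Anchors whose window runs off either end of the coverage vector are skipped."""
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for pos in anchors:
        lo, hi = pos - flank, pos + flank + 1
        if lo < 0 or hi > len(coverage):
            continue
        for i, value in enumerate(coverage[lo:hi]):
            totals[i] += value
        used += 1
    if used == 0:
        raise ValueError("no anchor had a complete window")
    return [t / used for t in totals]


# Hypothetical per-base coverage along one short region
coverage = [0, 0, 1, 5, 1, 0, 0, 2, 6, 2, 0]
bound_anchors = [3, 8]   # hypothetical anchors for bound promoters
other_anchors = [5]      # hypothetical anchor for a non-bound promoter

print("bound:    ", mean_profile(coverage, bound_anchors, flank=1))
print("non-bound:", mean_profile(coverage, other_anchors, flank=1))
```

      Comparing the two resulting profiles is the same kind of comparison as OVO bound versus OVO non-bound promoters in the excerpt above.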

      For clarity, it would help the reader if the authors mentioned the male-specific TATA-associated factors as a rationale for testing the role of OVO binding in core promoter function. This is currently mentioned in the Discussion on lines 575-577, but would help in understanding the motivation behind the detailed analysis of the promoter binding of OVO in the Results and make the negative result more clearly impactful.

      We have introduced the male-specific TATA-associated factors as suggested and have condensed the two intro paragraphs in this section into one, as shown below.

      Lines 347-363

      “Our data thus far clearly indicates that OVO binding occurs at or very near the core promoter, a region recognized by an enormous collection of factors that associate with RNA polymerase to initiate transcription (Aoyagi and Wassarman 2000; Vo Ngoc, Kassavetis, and Kadonaga 2019). The highly organized polymerase complex has sequence-specific DNA recognition sites with incredibly precise spacing between them, with an overall DNA footprint of a little less than 100bp (Rice, Chamberlin, and Kane 1993; FitzGerald et al. 2006; Ohler et al. 2002). There are upstream binding sites such as TATA, sites at transcription start, such as the initiator (INR), and downstream promoter elements (DPE) (Vo Ngoc, Kassavetis, and Kadonaga 2019). The combinations of these DNA motifs is not random in mammals and Drosophila (FitzGerald et al. 2006), and distinct combinations of different motifs at the TSS of genes expressed in Drosophila are conserved over tens of millions of years of evolution (Chen et al. 2014). The male germline expresses a number of TATA-associated factors that have been implicated in male-specific promoter usage for gene expression (M. Hiller et al. 2004; M. A. Hiller et al. 2001; Lu et al. 2020; V. C. Li et al. 2009). It is possible that OVO is a female germline specific TATA-associated factor, and if so, OVO binding sites at core promoters should share precise spacing with other core promoter elements, suggesting it is likely part of the complex. If not, then OVO is more likely to facilitate binding of the basal transcriptional machinery. Because of the extended footprint of engaged RNA polymerase, OVO and the basal machinery would not be likely to occupy the same region at the same time.”

      The description of the system used for the RNA-seq would benefit from additional clarity. It is not clear as written why it is "Lucky" that there is an mRNA isoform with extended exon 2 required for egg chamber development beyond stage 5. How does this requirement compare to the global requirement for OVO, which seems to be required for germ cell development even before stage 5? Understanding this system is essential for interpreting the RNA-seq results. Indeed, the authors have a separate manuscript (currently on bioRxiv) that explains the details of this system. As such, the current description requires that the reader refer to this additional pre-print. Could the authors include a diagram to better illustrate this system? Furthermore, since this RNA-seq is being performed on tissue that includes nurse cells, follicle cells, and germ cells from multiple stages of development, it is important for the authors to clearly state in which cell types OVO is expressed and likely functional. (While this is well beyond this manuscript, this analysis is the type that might benefit from the use of single-cell sequencing as a means to deconvolute the phenotypic effects of OVO loss.)

      We have rewritten the text to better describe the system for RNA-seq. We have also included a figure (Figure S1A) showing the alleles used that should help provide clarity for the readers. We agree that moving forward single cell experiments will be critical to have a better understanding of the transcriptional changes and chromatin dynamics with and without OVO. We have included the below changes to the text.

      Lines 409-423

      “Previous work from our lab has identified a transheterozygous ovo allelic combination (ovoovo-GAL4/ovoΔBP) that greatly reduces OVO activity resulting in sterility, however, female germ cells are able to survive up until at least stage 5 of oogenesis (Benner et al. 2023). ovoovo-GAL4 is a CRISPR/Cas9 derived T2A-GAL4-3xSTOP insertion upstream of the splice junction of exon 3 in the ovo-RA transcript (Figure S1A). Importantly, this insertion in the extended exon 3 would disrupt roughly 90% of the ovo-B transcripts. However, since about 10% of ovo-B transcripts utilize an upstream splice junction in exon 3, these transcripts would not be disrupted with the T2A-GAL4-3xSTOP insertion and thus allow for enough OVO activity for germ cell survival (Benner et al. 2023). Since ovoovo-GAL4 expresses GAL4 in place of full length OVO due to the T2A sequences, we can drive expression of a rescuing OVO-B construct downstream of UASp to generate OVO+ female germ cells, which in fact does rescue the arrested germ cell phenotype of ovoovo-GAL4/ovoΔBP ovaries. Therefore, in order to determine genes that are transcriptionally responsive to OVO, we compared the gene expression profiles in sets of ovaries that had the ovo hypomorphic phenotype with a negative control rescue construct (ovoovo-GAL4/ovoΔBP; UASp-GFP)(Figure 4A) versus those that drive expression of the rescue construct expressing OVO-B (ovoovo-GAL4/ovoΔBP; UASp-3xFHA-OVO-B)(Figure 4B).”

      Lines 427-432

      “The adult female ovary contains somatic cells, germline stem cells, and germline derived nurse cells that would be profiled in a bulk ovary tissue RNA-seq experiment. Although OVO is only required and expressed in germline derived cell types, we chose to dissect one day old post-eclosion ovoovo-GAL4/ovoΔBP; UASp-3xFHA-OVO-B female ovaries to enrich for early stages of oogenesis and collected only ovarioles containing the germarium through previtellogenic egg chambers.”

      On lines 526-532, it is unclear why the genes fs(1)N, fs(1)M3, and closca are particularly sensitive to the ovoD3 allele. What is this allele trans heterozygous with in the assay that allows development through egg laying? Why might these genes be unique in their sensitivity?

      These genes are not particularly sensitive; rather, the transheterozygous hypomorphic ovo ovaries are weak enough to reveal the role of OVO for these genes. We rewrote this paragraph to try to provide more clarity on the relationship between OVO+ binding at these vitelline membrane genes and the phenotype of OVOD3 expressing females.

      Lines 562-577

      “We also found that the genes fs(1)N, fs(1)M3, and closca, were all bound by OVO and responded transcriptionally to the presence of ectopic rescue OVO. These genes are significant because they constitute a set of genes that are expressed in the germline and the encoded proteins are eventually incorporated into the vitelline membrane providing the structural integrity and impermeability of the egg (Mineo, Furriols, and Casanova 2017; Ventura et al. 2010). Loss-of-function of these three genes results in flaccid eggs that are permeable to dye and fail to develop. The loss-of-function phenotype of fs(1)N, fs(1)M3, and closca closely resembles the dominant antimorph ovoD3 phenotype. The ovoD3 allele is the weakest of the original dominant-negative ovo alleles and produces defective eggs allowing us to explore the role of OVO in late stages (Busson et al. 1983; Komitopoulou et al. 1983). ovoD3/ovo+ transheterozygous females express a repressive form of OVO that results in dominant sterility, and importantly, these females lay flaccid eggs with compromised vitelline membranes that are permeable to the dye neutral red (Oliver, Pauli, and Mahowald 1990). Since OVO+ is bound at the TSS of fs(1)N, fs(1)M3, and closca, and these three genes respond transcriptionally to OVO+, then it is plausible that the repressive OVOD3 is negatively regulating these three genes that are required for vitelline membrane formation. This is evidence that OVO is not only involved in regulating the expression of numerous essential maternal pathways for embryonic development, but it is also essential for regulating genes that are required for egg integrity and maturation.”

      The Discussion of OVO as a pioneer factor is highly speculative and based only on correlative data. In fact, the expression data in the embryonic germline is not included in this manuscript, but rather in a separate bioRxiv preprint. This makes it challenging to understand why this is extensively discussed here. However, there are experiments that could begin to test this proposal. OVO could be expressed in an exogenous tissue and test whether it promotes accessibility. Also, mutations could be made (using gene editing) to identify previously known OVO binding sites in the otu and/or other promoters and these could be assayed for accessibility. By selecting promoters of genes that are not essential for germline development, the authors could directly test the role of OVO in promoting chromatin accessibility. Alternatively, are there reasons that the system used for RNA-seq couldn't be similarly used for ATAC-seq? It is imperfect but could provide insights into chromatin accessibility in the absence of OVO.

      We have largely removed the speculation on pioneering activity, reference to embryonic germline OVO dynamics included in the previous work, and Figure 7. These are excellent suggestions for experiments and ones we are currently pursuing. Below is the modified discussion. 

      Lines 645-663

      “The requirement for OVO at the TSS of target genes has been well characterized at its own locus as well as its downstream target otu. Our OVO ChIP and expression data confirm findings from previous work that OVO is binding to these target promoters, and in the case of otu, strongly responds transcriptionally to the presence of OVO. Although we did not test the requirement for OVO DNA binding motifs at other OVO bound genes in this work, this has been extensively explored before, showing that removal of OVO DNA binding sites overlapping the TSS results in a strong decrease in reporter expression (Lü et al. 1998; Bielinska et al. 2005; Lü and Oliver 2001). Removal of more distal upstream OVO DNA binding sites also reduces reporter expression to a lesser degree. However, for most cases tested, removal of OVO DNA binding sites while leaving the rest of the enhancer regions intact, never totally abolished reporter expression. These dynamics are highly similar to work that has been completed on the pioneer factor zelda (zld). Adding zld DNA binding motifs to a stochastically expressed transcriptional reporter increases the activity and response of the reporter (Dufourt et al. 2018). Distally located zld DNA binding motifs influenced reporter expression to a lesser degree than proximal sites. A single zld DNA binding site adjacent to the TSS produced the strongest reporter activity. Importantly, just like the activity of OVO transgenic reporters, there is not an absolute requirement for zld DNA binding to activate reporter expression, however, the addition of TSS adjacent zld DNA binding motifs does strongly influence reporter response. We know that zld achieves this reporter response through its pioneering activity (Xu et al. 2014; Harrison et al. 2011), whether OVO achieves this similar effect on gene expression through a shared mechanism, or in cooperation with other transcription factors needs to be further explored.”

      The authors suggest that OVO binding is essential for transcriptional activation, but that this may be indirect and that expression of other transcription factors might be necessary for activating gene expression. Did the motif analysis of the OVO-bound regions suggest additional transcription factors that might provide this function?

      We did find other motifs significantly enriched in OVO ChIP peaks. We performed XSTREME analysis on the same set of OVO ChIP peaks, which allowed us to determine if any of these motifs were significant matches to DNA binding motifs of known transcription factors. Notably, the DNA binding motifs of GAF and CLAMP were enriched in OVO ChIP peaks. GAF is required in the germline (Trl germline clones are not viable), so co-regulation of genes by OVO and GAF is possible. Other enriched motifs did not match any known binding motifs of other transcription factors, but we reported some of the most significantly enriched motifs found alongside the OVO motif in Figure S1C-F. The below text outlines changes made to the text incorporating these findings.

      Lines 170-182

      “Along with the OVO DNA binding motif, other motifs were also significantly enriched in OVO ChIP peaks. The motif 5’-GWGMGAGMGAGABRG-3’ (Figure S1C) was found in 18% of OVO ChIP peaks and is a significant match to the DNA binding motifs of the transcription factors GAF (Trl) (Omelina et al. 2011) and CLAMP (Soruco et al. 2013). Trl germline clones are not viable, indicating that GAF activity is required in the germline during oogenesis (Chen et al. 2009). The possibility that OVO binds with and regulates genes alongside of GAF given the enrichment of both transcription factors DNA binding motifs is intriguing. Other significantly enriched motifs 5’-ACACACACACACACA-3’ (29% of peaks, Figure S1D), 5’-RCAACAACAACAACA-3’ (26% of peaks, Figure S1E), and 5’-GAAGAAGAAGAAGAR-3’ (17% of peaks, Figure S1F) were present in OVO ChIP peaks, however, these motifs did not significantly match known DNA binding motifs of other transcription factors. Determining the factors that bind to these sequences will certainly help elucidate our understanding of transcriptional control with relationship to OVO in the female germline.”
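      The "percentage of peaks containing a motif" figures quoted above can be illustrated with a minimal sketch. It assumes plain exact matching of an IUPAC-degenerate motif on both strands; it is not the XSTREME enrichment procedure used in the manuscript, and the function names and the toy sequences in the usage note are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC-degenerate motif into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence (other letters are kept)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def fraction_with_motif(peak_seqs, motif):
    """Fraction of peak sequences with at least one match on either strand."""
    pattern = motif_to_regex(motif)
    hits = sum(
        1
        for seq in peak_seqs
        if pattern.search(seq.upper()) or pattern.search(reverse_complement(seq))
    )
    return hits / len(peak_seqs) if peak_seqs else 0.0
```

      For example, `fraction_with_motif(["TTGAGAGAGAGAGAGAGTT", "ACGTACGTACGT"], "GWGMGAGMGAGABRG")` counts only the first toy sequence as a hit. A real analysis would also need a background model to judge whether such a fraction is enriched.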

      The figures would benefit from a bit more detail in the legends (see comments below).

      Minor comments:

      In multiple places throughout the document, the citations are inadvertently italicized (see lines 57-59, 91, and 327 as examples.)

      We have changed this in these locations and other instances in the text.

      On line 76, when discussing OVO as a transcription factor this is referencing the protein and not the gene. Thus, should be written OVO and not ovo.

      We have made the correction ovo to OVO.

      On line 349, "core" promoters is likely what is meant rather than "care" promoters.

      We have corrected ‘care’ to ‘core’ in the text.

      On line 404, the authors state that they wanted to use a "less conservative log2 fold change" but it is not clear what they are comparing to. This is important to understand the motivation.

      We are talking about the gene expression comparison between the ectopic ovo rescue and ovo hypomorphic ovaries. “less conservative” was an unfortunate phrasing. We have rewritten the text to state this directly to the reader.

      Lines 435-444

      “We then performed RNA-seq in quadruplicate and measured the changes in gene expression between ectopic rescue OVO and hypomorphic OVO ovaries. We used a significance level of p-adj < 0.05 and a log2 fold change cutoff of >|0.5| to call differential expression between these two sets of ovaries. We utilized these log2 fold change cutoffs for two reasons. Our control ovary genotype (ovoovo-GAL4/ovoΔBP; UASp-GFP) has hypomorphic OVO activity, hence germ cells can survive but are arrested. With the addition of ectopic rescue OVO in ovoovo-GAL4/ovoΔBP; UASp-3xFHA-OVO-B ovaries, we predicted that genes that were directly regulated by OVO would transcriptionally respond, however, we were unsure as to what degree the response would be in comparison to hypomorphic OVO. We reasoned that if the changes were not significant between genotypes, then minor changes in gene expression would not matter.”
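      The thresholds described in this passage (p-adj < 0.05 and an absolute log2 fold change above 0.5) can be written out as a short sketch. It assumes a results table already reduced to (gene, log2 fold change, adjusted p-value) tuples; the function names and the placeholder gene labels in the usage note are hypothetical, and the upstream statistical testing is not shown.

```python
def is_differentially_expressed(log2_fold_change, padj, lfc_cutoff=0.5, alpha=0.05):
    """Apply the two cutoffs quoted in the text: padj < 0.05 and |log2FC| > 0.5."""
    if padj is None:  # genes without an adjusted p-value are never called
        return False
    return padj < alpha and abs(log2_fold_change) > lfc_cutoff


def call_de(results, lfc_cutoff=0.5, alpha=0.05):
    """Split (gene, log2FC, padj) rows into up- and down-regulated gene lists."""
    up, down = [], []
    for gene, lfc, padj in results:
        if not is_differentially_expressed(lfc, padj, lfc_cutoff, alpha):
            continue
        (up if lfc > 0 else down).append(gene)
    return up, down
```

      With toy rows such as `[("geneA", 1.2, 0.001), ("geneB", -0.8, 0.01), ("geneC", 0.3, 0.0001), ("geneD", 2.0, 0.2)]`, only `geneA` (up) and `geneB` (down) pass both cutoffs: `geneC` fails the fold-change cutoff and `geneD` fails the significance cutoff.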

      On line 615, it is unclear what is meant by "showing expression with only 10s of bp of sequence in reporters."

      This is in reference to some of the previously studied ovo reporter deletion lines, however, we have decided to remove the below text in the revised discussion.

      “, despite being remarkably compact. The OVO-dependent ovo core promoter is very compact; showing expression with only 10s of bp of sequence in reporters.” 

      It would be useful to cite and discuss Dufourt et al. Nature Communications 2018 (PMID30518940) regarding the role of Zelda in potentiating transcriptional activation when mentioned on line 624.

      We have added this and the relationship to previous similar work on OVO in the discussion.

      Lines 645-663

      “The requirement for OVO at the TSS of target genes has been well characterized at its own locus as well as its downstream target otu. Our OVO ChIP and expression data confirm findings from previous work that OVO is binding to these target promoters, and in the case of otu, strongly responds transcriptionally to the presence of OVO. Although we did not test the requirement for OVO DNA binding motifs at other OVO bound genes in this work, this has been extensively explored before, showing that removal of OVO DNA binding sites overlapping the TSS results in a strong decrease in reporter expression (Lü et al. 1998; Bielinska et al. 2005; Lü and Oliver 2001). Removal of more distal upstream OVO DNA binding sites also reduces reporter expression to a lesser degree. However, for most cases tested, removal of OVO DNA binding sites while leaving the rest of the enhancer regions intact, never totally abolished reporter expression. These dynamics are highly similar to work that has been completed on the pioneer factor zelda (zld). Adding zld DNA binding motifs to a stochastically expressed transcriptional reporter increases the activity and response of the reporter (Dufourt et al. 2018). Distally located zld DNA binding motifs influenced reporter expression to a lesser degree than proximal sites. A single zld DNA binding site adjacent to the TSS produced the strongest reporter activity. Importantly, just like the activity of OVO transgenic reporters, there is not an absolute requirement for zld DNA binding to activate reporter expression, however, the addition of TSS adjacent zld DNA binding motifs does strongly influence reporter response. We know that zld achieves this reporter response through its pioneering activity (Xu et al. 2014; Harrison et al. 2011), whether OVO achieves this similar effect on gene expression through a shared mechanism, or in cooperation with other transcription factors needs to be further explored.”

      On line 1006 (Figure 1 legend), it is unclear what is meant by "The percentage of OVO ChIP peaks each motif was found". Is a word missing?

      This was unclear; we have revised the sentence as shown below.

      Lines 1035-1036

      “The percentage of OVO ChIP peaks containing each motif and their corresponding p-value are indicated to the right.”

      In the Figure 1 legend, please include citations for the Garfinkel motif and Oliver motif.

      Included, as below.

      Lines 1036-1039

      “H) OVO ChIP minus input control ChIP-seq read coverage density centered on the location of the four de novo OVO DNA binding motifs and previously defined in vitro OVO DNA binding motifs (Lü et al. 1998, Bielinska et al. 2005, Lee and Garfinkel 2000).”

      In Figure 2 legend, it is unclear if B is all instances of a given motif or the DNA motifs that are bound by ChIP. Please clarify.

      We meant only the OVO DNA binding motifs that were within significant OVO ChIP peaks. We have revised the legend below.

      Lines 1049-1052

      “A, B) OVO ChIP minus input control, GSC and 32c ATAC-seq, GSC H3K27ac, H3K4me3, H3K27me3, H3K9me3, 8c NC H3K9me3, 32c NC H3K27ac, and H3K27me3 ChIP-seq read coverage density centered on each OVO peak maximum or OVO DNA binding motif located within a significant OVO ChIP peak.”

      The Figure legend for 2D could use more explanation. What do the lines and circles indicate?

      The lines and solid circles indicate the number of overlapping peaks between the two datasets they connect. We have included a better description of what these indicate in the figure legend.

      Lines 1054-1058

      “D) Total number of significant peaks (left) and the total number of overlapping peaks (top) between OVO ChIP and GSC and 32c ATAC-seq, GSC H3K27ac, H3K4me3, H3K27me3, H3K9me3, 8c NC H3K9me3, 32c NC H3K27ac, and H3K27me3 ChIP-seq. Lines connecting solid dots indicate the number of overlapping peaks between those two corresponding datasets.”

      In Figure 4C, bring the 564 blue dots forward so they are not masked by the yellow dots.

      We have brought the colored dots forward in both figure 4C and 4D.

      In Figure 4E, what is the order of the heatmaps?

      The order is genes with the highest to lowest OVO read density enrichment. We have included this in the figure 4 legend.

      Lines 1086-1087

      “The order of the heatmap is genes with the highest to lowest amount of OVO ChIP read density.”

      In Figure 5, the order of the tracks is not immediately obvious. It appears to be those chromatin features most associated with OVO ChIP and those less correlated. Additional clarity could be provided by showing these tracks (and in Supplemental Figure S2) in different colors with a reference to the figure legend about what the colors might indicate.

      We have changed the colors and order of the tracks to be more similar and consistent in both figures.

      Lines 1090-1093

      “ovo gene level read coverage tracks for OVO ChIP minus input (black), GSC and 32c ATAC-seq (light blue), GSC and 32C H3K27ac (green), H3K4me3 (dark blue), GSC and 32c H3K27me3 (orange), and GSC and 8c H3K9me3 (pink) ChIP-seq, and ovoΔBP/ovoovo-GAL4; UASp-3xFHA-OVO-B minus ovoΔBP/ovoovo-GAL4; UASp-GFP RNA-seq (red).”

      In Figure S1 legend, what is the reference to the da-GAL4 X UAS transgene in the title?

      This was an error on our part and we have removed it.

      Reviewer #2 (Recommendations For The Authors):

      Overall, the manuscript would benefit from revisions of the writing style. At times it is difficult to distinguish between hypothesis and results. The use of colloquial phrases/prose was distracting while reading, which the authors may consider revising. Some sentences were confusing or extraneous, and the authors may consider revising those. Occasionally sentences within the results sections seem more appropriate for the materials and methods.

      (1) The manuscript is generally clear; however, it is at times difficult to distinguish between hypothesis and results. The use of colloquial phrases/prose was distracting while reading, which the authors may consider revising. Examples include:

      a)  Lines 48-49 "While thematic elements of this complex orchestration have been well studied, coordinate regulation of the symphony has not."

      We have edited this sentence below.

      Lines 48-50

      “While the complex interactions between maternally supplied mRNAs and proteins have been well studied, transcriptional regulation driving the expression of these pathways is less well understood.”

      b)  Lines 232-233 "In other words, where exactly does transcription start at these genes."

      We have removed this sentence.

      c)  Line 385, the word "sham" could be changed to "negative control" or "GFP control"

      We have rewritten this sentence below.

      Lines 419-423

      “Therefore, in order to determine genes that are transcriptionally responsive to OVO, we compared the gene expression profiles in sets of ovaries that had the ovo hypomorphic phenotype with a negative control rescue construct (ovoovo-GAL4/ovoΔBP; UASp-GFP)(Figure 4A) versus those that drive expression of the rescue construct expressing OVO-B (ovoovo-GAL4/ovoΔBP; UASp-3xFHA-OVO-B)(Figure 4B)”

      d)  Line 490 "For the big picture"

      We have removed this and revised with the below sentence.

      Lines 530-531

      “To do this, we performed Gene Ontology enrichment analysis with gProfiler software (Raudvere et al. 2019).”

      (2) Some sentences were confusing or extraneous, and the authors may consider revising them. Examples include:

      a)  Lines 195-196 "Therefore, we plotted the significant ChIP (minus input) read density peaks centered on the location of the motif itself."

      We have removed the word ‘peaks’ and ‘itself’, as below.

      Lines 200-201

      “Therefore, we plotted the significant ChIP (minus input) read density centered on the location of the motif.”

      b)  Lines 201-203 "... over the location of the motifs, strongly reinforces the idea that our dataset contains regions centered on sequence-specifically bound OVO transcription factor in the ovary."

      We have edited this sentence to clarify below.

      Lines 204-208

      “While it is possible that OVO comes into contact with regions of DNA in three-dimensional nuclear space non-specifically, the presence of OVO motifs within a large percentage of significant ChIP peaks in vivo and enrichment of OVO ChIP read density at the location of the motifs, strongly reinforces the idea that our OVO ChIP dataset contains regions centered on sequences specifically bound by OVO in the ovary.”

      c)  Lines 326-328 "The combinations of these elements...tens of millions of years of evolution."

      We have revised this sentence below.

      Lines 354-357

      “The combinations of these DNA motifs are not random in mammals and Drosophila (FitzGerald et al. 2006), and distinct combinations of different motifs at the TSS of genes expressed in Drosophila are conserved over tens of millions of years of evolution (Chen et al. 2014).”

      d)  Lines 444-446 "To address this directly, we tested the idea that genes with... and thus downstream of OVO."

      We have removed this sentence in its entirety.

      e)  Line 579-580 "Where OVO binding in close proximity, in any ...activates transcription"

      We have removed this sentence in its entirety.

      (3)    Occasionally sentences within the results sections seem more appropriate for the materials and methods. For example, lines 213-218.

      (4)    At the end of line 375, do the authors mean "only" instead of "also"?

      We have modified this sentence below.

      Lines 411-414

      “ovoovo-GAL4 is a CRISPR/Cas9 derived T2A-GAL4-3xSTOP insertion upstream of the splice junction of exon 3 in the ovo-RA transcript (Figure S1A). Importantly, this insertion in the extended exon 3 would disrupt roughly 90% of the ovo-B transcripts. However, since about 10% of ovo-B transcripts utilize an upstream splice junction in exon 3, these transcripts would not be disrupted with the T2A-GAL4-3xSTOP insertion and thus allow for enough OVO activity for germ cell survival (Benner et al. 2023).”

      (5)    In line 392 the authors say that they dissected ovaries "one day post-eclosion" but the methods section says that ovaries were 3-5 days old. Please clarify.

      We meant one day old for the RNAseq experiments. We have changed this in the text.

      Lines 679-681

      “Twenty, one day old post-eclosion ovoΔBP/ovoovo-GAL4; UASp-GFP and ovoΔBP/ovoovo-GAL4; UASp-3xFHA-OVO-B ovaries were dissected and germariums through previtellogenic egg chambers were removed with microdissection scissors and placed in ice cold PBS making up one biological replicate.”

      (6)    In line 668 the authors mention CRISPR/Cas9 in the methods, but no such experiment was described.

      We have removed this from the Methods header.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors engineer the endogenous left boundary of the Drosophila eve TAD, replacing the endogenous Nhomie boundary by either a neutral DNA, a wildtype Nhomie boundary, an inverted Nhomie boundary, or a second copy of the Homie boundary. They perform Micro-C on young embryos and conclude that endogenous Nhomie and Homie boundaries flanking eve pair with head-to-tail directionality to form a chromosomal stem loop. Abrogating the Nhomie boundary leads to ectopic activation of genes in the former neighboring TAD by eve embryonic stripe enhancers. Replacing Nhomie by an inverted version or by Homie (which pairs with itself head-to-head) transformed the stem loop into a circle loop. An important finding was that stem and circle loops differentially impact endogenous gene regulation both within the eve TAD and in the TADs bracketing eve. Intriguingly, an eve TAD with a circle loop configuration leads to ectopic activation of flanking genes by eve enhancers - indicating compromised regulatory boundary activity despite the presence of an eve TAD with intact left and right boundaries.

      Strengths:

      Overall, the results obtained are of high-quality and are meticulously discussed. This work advances our fundamental understanding of how 3D genome topologies affect enhancer-promoter communication.

      Weaknesses:

      Though convincingly demonstrated at eve, the generalizability of TAD formation by directional boundary pairing remains unclear, though the authors propose this mechanism could underly the formation of all TADs in Drosophila and possibly even in mammals. Strong and ample evidence has been obtained to date that cohesin-mediated chromosomal loop extrusion explains the formation of a large fraction of TADs in mammals. 

      (1.1) The difficulty with almost all of the studies on mammalian TADs, cohesin and CTCF roadblocks is that the sequencing depth is not sufficient, and large bin sizes (>1 kb) are needed to visualize chromosome architecture.  The resulting contact profiles show TAD neighborhoods, not actual TADs.

      The problem with these studies is illustrated by comparing the contact profiles of mammalian MicroC data sets at different bin sizes in Author response image 1.  In this figure, the darkness of the “pixels” in panels E, F, G and H was enhanced by reducing brightness in Photoshop.

      Author response image 1.

      Mammalian MicroC profiles at different bin sizes

      Panels A and C show “TADs” using bin sizes typical of most mammalian studies (see Krietenstein et al. (2023) (Krietenstein et al. 2020)).  At this level of resolution, TADs, the “trees” that are the building blocks of chromosomes, are not visible.  Instead, what is seen are TAD neighborhoods or “forests”.  Each neighborhood consists of several dozen individual TADs.  The large bins in these panels also artificially accentuated TAD:TAD interactions, generating a series of “stripes” and “dots” that correspond to TADs bumping into each other and sequences getting crosslinked.  For example, in panel A there is a prominent stripe on the edge of a “TAD” (blue arrow).  In panel C, this stripe resolves into a series of dots arranged as parallel, but interrupted “stripes” (green and blue arrows).  At the next level of resolution, it can be seen that the stripe marked by the blue arrow and magenta asterisk is generated by contacts between the left boundary of the TAD indicated by the magenta bar with sequences in a TAD (blue bar) ~180 kb away.  While dots and stripes are prominent features in contact profiles visualized with larger bin sizes (A and C), the actual TADs that are observed with a bin size of 200 bp (examples are underlined by black bars in panel G) are not bordered by stripes, nor are they topped by obvious dots.  The one possible exception is the dot that appears at the top of the volcano triangle underlined with magenta.
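      The effect of bin size on what a contact map can resolve can be shown with a toy sketch. It assumes contacts are given as pairs of base-pair coordinates on one chromosome; the function name and the coordinates in the usage note are hypothetical and are not taken from any of the data sets discussed here.

```python
from collections import Counter


def bin_contacts(pairs, bin_size):
    """Count ligation pairs per (bin_i, bin_j) cell of an upper-triangle contact map."""
    counts = Counter()
    for pos1, pos2 in pairs:
        # Integer division assigns each coordinate to a fixed-width bin.
        i, j = sorted((pos1 // bin_size, pos2 // bin_size))
        counts[(i, j)] += 1
    return counts
```

      Three toy contacts, `[(100, 900), (1100, 1900), (100, 1900)]`, fall into three separate cells at a 200 bp bin size but collapse into the single cell `(0, 0)` at a 5 kb bin size, which is the sense in which large bins merge distinct features into one "pixel".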

      The chromosome 1 DNA segment from the MicroC data of Hsieh et al. (2023) (Hsieh et al. 2020) shows a putative volcano triangle with a plume (indicated by a V in Author response image 1 panels D, F and H).  Sequences in the V TAD don’t crosslink with their immediate neighbors, and this gives a “plume” above the volcano triangle, as indicated by the light blue asterisk in panels D, F and H.  Interestingly the V TAD does contact two distant TADs, U on the left and W on the right. The U TAD is ~550 kb from V, and the region of contact is indicated by the black arrow.  The W TAD is ~585 kb from V, and the region of contact is indicated by the magenta arrow.  While the plume still seems to be visible with a bin size of 400 bp (light blue asterisk), it is hard to discern when the bin size is 200 bp, as there are not enough reads.

      The evidence demonstrating that cohesin is required for TAD formation/maintenance is based on low resolution Hi-C data, and the effects that are observed are on TAD neighborhoods (forests) and not TADs (trees).  In fact, there is published evidence that cohesin is not required in mammals for TAD formation/maintenance.  In an experiment from Goel et al. 2023 the authors depleted the cohesin component Rad21 and then visualized the effects on TAD organization using the high resolution region capture MicroC (RCMC) protocol.  The MicroC contact map in this figure visualizes a ~250 kb DNA segment around the Ppm1g locus at 250 bp resolution.  On the right side of the diagonal is the untreated control, while the left side shows the MicroC profile of the same region after Rad21 depletion.  The authors indicated that there was a 97% depletion of Rad21 in their experiment.  However, as is evident from a comparison of the experimental and control profiles, loss of Rad21 has no apparent effect on the TAD organization of this mammalian DNA segment.

      Several other features are worth noting.  First, unlike the MicroC experiments shown in Author response image 1, there are dots at the apex of the TADs in this chromosomal segment.  In the MicroC protocol, fixed chromatin is digested to mononucleosomes by extensive MNase digestion.  The resulting DNA fragments are then ligated, and dinucleosome-length fragments are isolated and sequenced. 

      DNA sequences that are nucleosome free in chromatin (which would be promoters, enhancers, silencers and boundary elements) are typically digested to oligonucleotides in this procedure and won’t be recovered. This means that the dots shown here must correspond to mononucleosome-length elements that are MNase resistant.  This is also true for the dots in the MicroC contact profiles of the Drosophila Abd-B regulatory domain (see Fig. 2B in the paper).  Second, the TADs are connected to each other by 45° stripes (see blue and green arrowheads).  While it is not clear from this experiment whether the stripes are generated by an active mechanism (enzyme) or by some “passive” mechanism (e.g., sliding), the stripes in this chromosomal segment are not generated by cohesin, as they are unperturbed by Rad21 depletion.  Third, there are no volcano triangles with plumes in this chromosomal DNA segment.  Instead, the contact patterns (purple and green asterisks) between neighboring TADs closely resemble those seen for the Abd-B regulatory domains (compare Goel et al. 2023 with Fig. 2B in the paper).  This similarity suggests that the TADs in and around Ppm1g may be circle-loops, not stem-loops.  As volcano triangles with plumes also seem to be rare in the MicroC data sets of Krietenstein et al. (Krietenstein et al. 2020) and Hsieh et al. (Hsieh et al. 2020) (with the caveat that these data sets are low resolution: see Author response image 1), it is possible that much of the mammalian genome is assembled into circle-loop TADs, a topology that can’t be generated by the cohesin loop extrusion (bolo tie clip) /CTCF roadblock model.

      While Rad21 depletion has no apparent effect on TADs, it does appear to impact TAD neighborhoods.  This is in a supplemental figure in Goel et al. (Goel et al. 2023).  In this figure, TADs in the Ppm1g region of chromosome 5 are visualized with bin sizes of 5 kb and 1 kb.  A 1.2 Mb DNA segment is shown for the 5 kb bin size, while an 800 kb DNA segment is shown for the 1 kb bin size.  As can be seen from comparing the MicroC profiles in Author response image 2 with that in Goel et al. 2023, individual TADs are not visible.  Instead, the individual TADs are binned into large TAD “neighborhoods” that consist of several dozen or more TADs.

      Unlike the individual TADs shown in Goel et al. 2023, the TAD neighborhoods in Author response image 2 are sensitive to Rad21 depletion.  The effects of Rad21 depletion can be seen by comparing the relative pixel density inside the blue lines before (above the diagonal) and after (below the diagonal) auxin-induced Rad21 degradation.  The reduction in pixel density is greatest for more distant TAD:TAD contacts (farthest from the diagonal).  By contrast, the TADs themselves are unaffected (Goel et al. 2023), as are contacts between individual TADs and their immediate neighbors.  In addition, contacts between partially overlapping TAD neighborhoods are also lost.  At this point it isn’t clear why contacts between distant TADs in the same neighborhood are lost when Rad21 is depleted; however, a plausible speculation is that it is related to the functioning of cohesin in holding newly replicated DNAs together until mitosis and whatever other role it might have in chromosome condensation.

      Author response image 2.

      Ppm1g full locus chr5

      Moreover, given the unique specificity with which Nhomie and Homie are known to pair (and exhibit "homing" activity), it is conceivable that formation of the eve TAD by boundary pairing represents a phenomenon observed at exceptional loci rather than a universal rule of TAD formation. Indeed, characteristic Micro-C features of the eve TAD are only observed at a restricted number of loci in the fly genome…..

      (1.2) The available evidence does not support the claim that nhomie and homie are “exceptional.”  To begin with, nhomie and homie rely on precisely the same set of factors that have been implicated in the functioning of other boundaries in the fly genome.  For example, homie requires (among other factors) the generic boundary protein Su(Hw) for insulation and long-distance interactions (Fujioka et al. 2024).  (This is also true of nhomie: unpublished data.)  The Su(Hw) protein (like other fly polydactyl zinc finger proteins) can engage in distant interactions.  This was first shown by Sigrist and Pirrotta (Sigrist and Pirrotta 1997), who found that the su(Hw) element from the gypsy transposon can mediate long-distance regulatory interactions (PRE dependent silencing) between transgenes inserted at different sites on homologous chromosomes (trans interactions) and at sites on different chromosomes.

      The ability to mediate long-distance interactions is not unique to the su(Hw) element, or homie and nhomie.  Muller et al. (Muller et al. 1999) found that the Mcp boundary from the Drosophila BX-C is also able to engage in long-distance regulatory interactions: both PRE-dependent silencing of mini-white and enhancer activation of mini-white and yellow.  The functioning of the Mcp boundary depends upon two other generic insulator proteins, Pita and the fly CTCF homolog (Kyrchanova et al. 2017).  Like Su(Hw) both are polydactyl zinc finger proteins, and they resemble the mammalian CTCF protein in that their N-terminal domain mediates multimerization (Bonchuk et al. 2020; Zolotarev et al. 2016).  Figure 6 from Muller et al. 1999 shows PRE-dependent “pairing sensitive silencing” interactions between transgenes carrying a mini-white reporter, the Mcp and scs’ (Beaf dependent)(Hart et al. 1997) boundary elements, and a PRE closely linked to Mcp.  In this experiment flies homozygous for different transgene inserts were mated and the eye color was examined in their trans-heterozygous progeny.  As indicated in the figure, the strongest trans-silencing interactions were observed for inserts on the same chromosomal arm; however, transgenes inserted on the left arm of chromosome 3 can interact across the centromere with transgenes inserted on the right arm of chromosome 3.

      Figure 5C (left) from Muller et al. 1999 shows a trans-silencing interaction between w#11.102 at 84D and w#11.16 approximately 5.8 Mb away, at 87D.  Figure 5C (right) shows a trans-silencing interaction across the centromere between w#14.29 on the left arm of chromosome 3 at 78F and w#11.102 on the right arm of chromosome 3 at 84D. The eye color phenotype of mini-white-containing transgenes is usually additive: homozygous inserts have twice as dark eye color as the corresponding hemizygous inserts.  Likewise, in flies trans-heterozygous for mini-white transgenes inserted at different sites, the eye color is equivalent to the sum of the two transgenes.  This is not true when mini-white transgenes are silenced by PREs.  In the combination shown in panel A, the trans-heterozygous fly has a lighter eye color than either of the parents.  In the combination in panel B, the trans-heterozygous fly is slightly lighter than either parent.

      As evident from the diagram in Figure 6 from Muller et al. 1999, all of the transgenes inserted on the 3rd chromosome that were tested were able to participate in long distance (>Mbs) regulatory interactions.  On the other hand, not all possible pairwise interactions are observed.  This would suggest that potential interactions depend upon the large scale (Mb) 3D folding of the 3rd chromosome.

      When the scs boundary (Zw5 dependent) (Gaszner et al. 1999) was added to the transgene to give sMws’, it further enhanced the ability of distant transgenes to find each other and pair.  All eight of the sMws’ inserts that were tested were able to interact with at least one other sMws’ insert on a different chromosome and silence mini-white.  Vazquez et al. (2006) subsequently tagged the sMws’ transgene with LacO sequences (ps0Mws’) and visualized pairing interactions in imaginal discs.  Trans-heterozygous combinations on the same chromosome were found paired in 94-99% of the disc nuclei, while a trans-heterozygous combination on different chromosomes was found paired in 96% of the nuclei (Table 3 from Vazquez et al. 2006).  Vazquez et al. also examined a combination of four transgenes inserted on the same chromosome (two at the same insertion site, and two at different insertion sites).  In this case, all four transgenes were clustered together in 94% of the nuclei (Table 3 from Vazquez et al. 2006).  Their studies also suggest that the distant transgenes remain paired for at least several hours.  A similar experiment was done by Li et al. (Li et al. 2011), except that the transgene contained only a single boundary, Mcp or Fab-7.  While pairing was still observed in trans-heterozygotes, the frequency was reduced without scs and scs’.

      It is worth pointing out that there is no plausible mechanism in which cohesin could extrude a loop through hundreds of intervening TADs, across the centromere (ff#13.101↔w#11.102: Figure 6 from Muller et al. 1999; w#14.29↔w#11.02: Figures 5 and 6 from Muller et al. 1999) and come to a halt when it “encounters” Mcp-containing transgenes on different homologs.  The same is true for Mcp-dependent pairing interactions in cis (Fig. 7 in Muller et al. (Muller et al. 1999)) or Mcp-dependent pairing interactions between transgenes inserted on different chromosomes (Fig. 8 in Muller et al. (Muller et al. 1999); Line 8 in Table 3 from Vazquez et al. 2006).

      These are not the only boundaries that can engage in long-distance pairing.  Mohana et al. (Mohana et al. 2023) identified nearly 60 meta-loops, many of which appear to be formed by the pairing of TAD boundary elements.  Two examples (at 200 bp resolution from 12-16 hr embryos) are shown in Author response image 3.

      Author response image 3.

      Metaloops on the 2nd and 3rd chromosomes: circle-loops and multiple stem-loops

      One of these meta-loops (panel A) is generated by the pairing of two TAD boundaries on the 2nd chromosome.  The first boundary, blue, (indicated by blue arrow) is located at ~2,006,500 bp between a small TAD containing the Nplp4 and CG15353 genes and a larger TAD containing 3 genes, CG33543, Obp22a and Npc2a.  Nplp4 encodes a neuropeptide.  The functions of CG15353 and CG33543 are unknown.  Obp22a encodes an odorant binding protein, while Npc2a encodes the Niemann-Pick type C-2a protein which is involved in sterol homeostasis.  The other boundary (purple: indicated by purple arrow) is located between two TADs 2.8 Mb away at 4,794,250 bp.  The upstream TAD contains the fipi gene (CG15630) which has neuronal functions in male courtship, while the downstream TAD contains CG3294, which is thought to be a spliceosome component, and schlaff (slf) which encodes a chitin binding protein.  As illustrated in the accompanying diagram, the blue boundary pairs with the purple boundary in a head-to-head orientation, generating a ~2.8 Mb loop with a circle-loop topology.  As a result of this pairing, the multi-gene (CG33543, Obp22a and Npc2a) TAD upstream of the blue boundary interacts with the CG15630 TAD upstream of the purple boundary.  Conversely the small Nplp4:CG15353 TAD downstream of the blue boundary interacts with the CG3294:slf TAD downstream of the purple boundary.  Even if one imagined that the cohesin bolo tie clip was somehow able to extrude 2.8 Mb of chromatin and then know to stop when it encountered the blue and purple boundaries, it would’ve generated a stem-loop, not a circle-loop.

      The second meta-loop (panel B) is more complicated as it is generated by pairing interactions between four boundary elements.  The blue boundary (blue arrow) located at ~4,801,800 bp (3L) separates a large TAD containing the RhoGEF64C gene from a small TAD containing CG7509, which encodes a predicted subunit of an extracellular carboxypeptidase.  As can be seen in the MicroC contact profile and the accompanying diagram, the blue boundary pairs with the purple boundary (purple arrow), which is located at ~7,013,500 bp (3L) just upstream of the 2nd internal promoter (indicated by black arrowhead) of the Mp (Multiplexin) gene.  This pairing interaction is head-to-tail and generates a large stem-loop that spans ~2.2 Mb.  The stem-loop brings sequences upstream of the blue boundary and downstream of the purple boundary into contact (the strings below a bolo tie clip), just as was observed in the boundary bypass experiments of Muravyova et al. (Muravyova et al. 2001) and Kyrchanova et al. (Kyrchanova et al. 2008).  The physical interactions result in a box of contacts (right top) between sequences in the large RhoGEF64C TAD and sequences in a large TAD that contains an internal Mp promoter.  The second pairing interaction is between the brown boundary (brown arrow) and the green boundary (green arrow).  The brown boundary is located at ~4,805,600 bp (3L) and separates the TAD containing CG7590 from a large TAD containing CG1808 (predicted to encode an oxidoreductase) and the Dhc64C (Dynein heavy chain 64C) gene.  The green boundary is located at ~6,995,500 bp (3L), and it separates a TAD containing CG32388 and the biniou (bin) transcription factor from a TAD that contains the most distal promoter of the Mp (Multiplexin) gene (blue arrowhead).  As indicated in the diagram, the brown and green boundaries pair with each other head-to-tail, and this generates a small internal loop (and the final configuration would resemble a bolo tie with two tie clips).  
This small internal loop brings the CG7590 TAD into contact with the TAD that extends from the distal Mp promoter to the 2nd internal Mp promoter.  The resulting contact profile is a rectangular box with diagonal endpoints corresponding to the paired blue:purple and brown:green boundaries.  The pairing of the brown:green boundaries also brings the TADs immediately downstream of the brown boundary and upstream of the green boundary into contact with each other, and this gives a rectangular box of interactions between the Dhc64C TAD, and sequences in the bin/CG3238 TAD.  This box is located on the lower left side of the contact map.
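
      The bookkeeping in these two examples can be sketched in a few lines.  This is only an illustration: the function names and the table of boundary pairs are hypothetical, the coordinates are the approximate positions quoted above, and the orientation-to-contact rules simply restate what is described for panels A and B.  It is not part of any analysis pipeline used for these data.

```python
# Illustrative sketch only; helper names and the table layout are hypothetical.

def loop_span_bp(first_bp, second_bp):
    """Distance in bp between two paired boundaries on the same chromosome."""
    return abs(second_bp - first_bp)

def predicted_contacts(orientation):
    """Return (topology, flank pairs expected to come into contact).

    Each flank pair is (side of first boundary, side of second boundary),
    following the descriptions of panels A and B in the text.
    """
    if orientation == "head-to-head":
        return "circle-loop", [("upstream", "upstream"),
                               ("downstream", "downstream")]
    if orientation == "head-to-tail":
        return "stem-loop", [("upstream", "downstream"),
                             ("downstream", "upstream")]
    raise ValueError(f"unknown pairing orientation: {orientation!r}")

# Approximate boundary positions (bp) and pairing orientations quoted above.
boundary_pairs = {
    "panel A, blue:purple": (2_006_500, 4_794_250, "head-to-head"),
    "panel B, blue:purple": (4_801_800, 7_013_500, "head-to-tail"),
    "panel B, brown:green": (4_805_600, 6_995_500, "head-to-tail"),
}

for name, (first, second, orientation) in boundary_pairs.items():
    topology, contacts = predicted_contacts(orientation)
    span_mb = loop_span_bp(first, second) / 1e6
    print(f"{name}: {topology}, ~{span_mb:.1f} Mb, contacts: {contacts}")
```

      Note that inverting one boundary in such a table changes the orientation entry, and with it the predicted flank contacts, without any change in the sequences of the flanking TADs themselves.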

      Since the bin and Mp meta-loops in Author response image 3B are stem-loops, they could have been generated by “sequential” cohesin loop extrusion events.  Besides the fact that cohesin extrusion of 2 Mb of chromatin and breaking through multiple intervening TAD boundaries challenges the imagination, there is no mechanism in the cohesin loop extrusion/CTCF roadblock model to explain why cohesin complex 1 would come to a halt at the purple boundary on one side and the blue boundary on the other, while cohesin complex 2 would instead stop when it hits the brown and green boundaries.  This highlights another problem with the cohesin loop extrusion/CTCF roadblock model, namely that the roadblocks are functionally autonomous: they have an intrinsic ability to block cohesin that is entirely independent of the intrinsic ability of other roadblocks in the neighborhood.  As a result, there is no mechanism for generating specificity in loop formation.  By contrast, boundary pairing interactions are by definition non-autonomous and depend on the ability of individual boundaries to pair with other boundaries: specificity is built into the model.  The mechanism for pairing, and accordingly the basis for partner preferences/specificity, are reasonably well understood.  Probably the most common mechanism in flies is based on shared binding sites for architectural proteins that can form dimers or multimers (Bonchuk et al. 2021; Fedotova et al. 2017).  Flies have a large family of polydactyl zinc finger DNA binding proteins, and as noted above, many of these form dimers or multimers and also function as TAD boundary proteins.  This pairing principle was first discovered by Kyrchanova et al. (Kyrchanova et al. 2008).  This paper also showed that orientation-dependent pairing interactions are a common feature of endogenous fly boundaries.  Another mechanism for pairing is specific protein:protein interactions between different DNA binding factors (Blanton et al. 2003).  
Yet a third mechanism would be proteins that bridge different DNA binding proteins together.  The boundaries that use these different mechanisms (BX-C boundaries, scs, scs’) depend upon the same sorts of proteins that are used by homie and nhomie.  Likewise, this same set of factors reappears in one combination or another in most other TAD boundaries.  As for the orientation of pairing interactions, this is most likely determined by the order of binding sites for chromosome architectural proteins in the partner boundaries.

      …and many TADs lack focal 3D interactions between their boundaries.

      (1.3) The idea that flies differ from mammals in that they “lack” focal 3D interactions is simply mistaken.  One of the problems with drawing this distinction is that almost all of the “focal 3D interactions” seen in mammalian Hi-C experiments are a consequence of binning large DNA segments in low resolution restriction enzyme-dependent experiments.  This is even true in the two “high” resolution MicroC experiments that have been published (Hsieh et al. 2020; Krietenstein et al. 2020).  As illustrated above in Author response image 1, most of the “focal 3D interactions” (the dots at the apex of TAD triangles) seen with large bin sizes (1 kb and greater) disappear when the bin size is 200 bp and TADs rather than TAD neighborhoods are being visualized.
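
      The effect of bin size can be illustrated with a toy calculation.  The data below are invented for illustration and are not taken from any of the cited experiments, and `coarsen` is a hypothetical helper rather than a function from any Hi-C/MicroC package.  Five isolated single contacts between two 1 kb segments are scattered over different 200 bp bins; once pooled into 1 kb bins, they collapse into a single pixel with five counts, which reads as a focal “dot.”

```python
# Toy illustration with invented data; `coarsen` is a hypothetical helper.

def coarsen(matrix, factor):
    """Sum a square contact matrix into bins `factor` times larger."""
    n = len(matrix)
    if n % factor != 0:
        raise ValueError("matrix size must be a multiple of the factor")
    m = n // factor
    pooled = [[0] * m for _ in range(m)]
    for i in range(n):
        for j in range(n):
            pooled[i // factor][j // factor] += matrix[i][j]
    return pooled

# 10 x 10 map at 200 bp resolution (2 kb of sequence in total).
fine = [[0] * 10 for _ in range(10)]

# Scattered single contacts between the first and second 1 kb segments.
for i, j in [(0, 6), (1, 8), (2, 5), (3, 9), (4, 7)]:
    fine[i][j] = 1
    fine[j][i] = 1  # contact maps are symmetric

coarse = coarsen(fine, 5)  # 200 bp bins -> 1 kb bins

print("max pixel at 200 bp:", max(max(row) for row in fine))
print("max pixel at 1 kb:  ", max(max(row) for row in coarse))
```

      In this toy case no 200 bp pixel holds more than one contact, while the corresponding 1 kb pixel holds all five.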

      As described in point #1.1, in the MicroC protocol, fixed chromatin is first digested to mononucleosomes by extensive MNase digestion, processed/biotinylated, and ligated to give dinucleosome-length fragments, which are then sequenced.  Regions of chromatin that are nucleosome free (promoters, enhancers, silencers, boundary elements) will typically be reduced to oligonucleotides in this procedure and will not be recovered when dinucleosome-length fragments are sequenced.  The loss of sequences from typical paired boundary elements is illustrated by the Lar meta-loop shown in Author response image 4 (at 200 bp resolution).  Panels A and B show the contact profiles generated when the blue boundary (which separates two TADs that span the Lar (Leukocyte-antigen-related-like) transcription unit) interacts with the purple boundary (which separates two TADs in a gene-poor region ~620 kb away).  The blue and purple boundaries pair with each other head-to-head, and this pairing orientation generates yet another circle-loop.  In the circle-loop topology, sequences in the TADs upstream of both boundaries come into contact with each other, and this gives the small dark rectangular box to the upper left of the paired boundaries (Author response image 4A).  (Note that this small box corresponds to the two small TADs upstream of the blue and purple boundaries, respectively. See panel B.)  Sequences in the TADs downstream of the two boundaries also come into contact with each other, and this gives the large box to the lower right of the paired boundaries.  While this meta-loop is clearly generated by pairing interactions between the blue and purple boundaries, the interacting sequences are degraded in the MicroC protocol, and sequences corresponding to the blue and purple boundaries aren’t recovered.  This can be seen in panel B (red arrow and red arrowheads).  
When a different Hi-C procedure is used (dHS-C) that captures nucleosome-free regions of chromatin that are physically linked to each other (Author response image 4C & D), the sequences in the interacting blue and purple boundaries are recovered and generate a prominent “dot” at their physical intersection (blue arrow in panel D).

      Author response image 4.

      Lar metaloop. Panels A & B: MicroC. Panels C & D: dHS-C

      While sequences corresponding to the blue and purple boundaries are lost in the MicroC procedure, there is at least one class of elements that engage in physical pairing interactions whose sequences are (comparatively) resistant to MNase digestion.  This class of elements includes many PREs (Kyrchanova et al. 2018; unpublished data), the boundary bypass elements in the Abd-B region of BX-C (Kyrchanova et al. 2023; Kyrchanova et al. 2019a; Kyrchanova et al. 2019b; Postika et al. 2018), and “tethering” elements (Batut et al. 2022; Li et al. 2023).  In all of the cases tested, these elements are bound in nuclear extracts by a large (>1000 kD) GAGA factor-containing multiprotein complex called LBC.  LBC also binds to the hsp70 and eve promoters (unpublished data).  Indirect end-labeling experiments (Galloni et al. 1993; Samal et al. 1981; Udvardy and Schedl 1984) indicate that the LBC protects a ~120-180 bp DNA segment from MNase digestion.  It is likely that this is the reason why LBC-bound sequences can be recovered in MicroC experiments as dots when they are physically linked to each other.  One such example (based on the ChIP signatures of the paired elements) is indicated by the green arrow in panels B and D of Author response image 4.  Note that there are no dots corresponding to these two LBC elements within either of the TADs immediately downstream of the blue and purple boundaries.  Instead, the sequences corresponding to the two LBC elements are only recovered when the two elements pair with each other over a distance of ~620 kb.  The fact that these two elements pair with each other is consistent with other findings which indicate that, like classical boundaries, LBC elements exhibit partner preferences.  In fact, LBC elements can sometimes function as TAD boundaries.  For example, the Fab-7 boundary has two LBC elements, and full Fab-7 boundary function can be reconstituted with just these two elements (Kyrchanova et al. 2018).

      Reviewer #2 (Public Review):

      "Chromatin Structure II: Stem-loops and circle-loops" by Ke*, Fujioka*, Schedl, and Jaynes reports a set of experiments and subsequent analyses focusing on the role of Drosophila boundary elements in shaping 3D genome structure and regulating gene expression. The authors primarily focus on the region of the fly genome containing the even skipped (eve) gene; eve is expressed in a canonical spatial pattern in fly embryos and its locus is flanked by the well-characterized neighbor of homie (nhomie) and homie boundary elements. The main focus of investigation is the orientation dependence of these boundary elements, which had been observed previously using reporter assays. In this study, the authors use Crispr/Cas9 editing followed by recombination-mediated cassette exchange to create a series of recombinant fly lines in which the nhomie boundary element is either replaced with exongenous sequence from phage 𝝀, an inversion of nhomie, or a copy of homie that has the same orientation as the endogenous homie sequence. The nhomie sequence is also regenerated in its native orientation to control for effects introduced by the transgenesis process.

      The authors then perform high-resolution Micro-C to analyze 3D structure and couple this with fluorescent and colorimetric RNA in situ hybridization experiments to measure the expression of eve and nearby genes during different stages of fly development. The major findings of these experiments are that total loss of boundary sequence (replacement with 𝝀 DNA) results in major 3D structure changes and the most prominent observed gene changes, while inversion of the nhomie boundary or replacement with homie resulted in more modest effects in terms of 3D structure and gene expression changes and a distinct pattern of gene expression change from the 𝝀 DNA replacement. As the samples in which the nhomie boundary is inverted or replaced with homie have similar Micro-C profiles at the eve locus and show similar patterns of a spurious gene activation relative to the control, the observed effects appear to be driven by the relative orientation of the nhomie and homie boundary elements to one another.

      Collectively, the findings reported in the manuscript are of broad interest to the 3D genome field. Although extensive work has gone into characterizing the patterns of 3D genome organization in a whole host of species, the underlying mechanisms that structure genomes and their functional consequences are still poorly understood. The perhaps best understood system, mechanistically, is the coordinated action of CTCF with the cohesin complex, which in vertebrates appears to shape 3D contact maps through a loop extrusion-pausing mechanism that relies on orientation-dependent sequence elements found at the boundaries of interacting chromatin loops.

      (2.1) The notion that the mammalian genome is shaped in 3D by the coordinated action of cohesin and CTCF has achieved the status of dogma in the field of chromosome structure in vertebrates.  However, as we have pointed out in #1.1, the evidence supporting this dogma is far from convincing.  To begin with, it is based on low resolution Hi-C experiments that rely on large bin sizes to visualize so-called “TADs.”  In fact, the notion that cohesin/CTCF are responsible on their own for shaping the mammalian 3D genome appears to be a result of mistaking a series of forests for the actual trees that populate each of the forests.

      As illustrated in Author response image 1 above, the “TADs” that are visualized in these low resolution data sets are not TADs at all, but rather TAD neighborhoods consisting of several dozen or more individual TADs.  Moreover, the “interesting” features that are evident at low resolution (>1 kb)—the dots and stripes—largely disappear at resolutions appropriate for visualizing individual TADs (~200 bp).

      Above, we presented data from one of the key experiments in Goel et al. (Goel et al. 2023).  In this experiment, the authors used RCMC to generate high resolution (~250 bp) MicroC contact maps before and after Rad21 depletion.  Contrary to dogma, Rad21 depletion has absolutely no effect on TADs in a ~250 kb DNA segment, and these TADs look very much like the TADs we observe in the Drosophila genome, in particular in the Abd-B region of BX-C that is thought to be assembled into a series of circle-loops (see Fig. 2B).

      While Goel et al. (Goel et al. 2023) observed no effect of Rad21 depletion on TADs, they found that loss of Rad21 disturbs long-distance (but not short-distance) contacts in large TAD neighborhoods when their RCMC data set is visualized using bin sizes of 5 kb and 1 kb.  This is shown in Author response image 2.  The significance of this finding is, however, uncertain.  It could mean that the 3D organization of large TAD neighborhoods has a special requirement for cohesin activity.  On the other hand, since cohesin functions to hold sister chromosomes together after replication until they separate during mitosis (and might also participate in mitotic condensation), it is also possible that the loss of long-range contacts in large TAD neighborhoods when Rad21 is depleted is simply a reflection of this particular activity.  Further studies will be required to address these possibilities.

      As for CTCF: a careful inspection of the ChIP data in Goel et al. 2023 indicates that CTCF is not found at each and every TAD boundary.  In fact, the notion that CTCF is the be-all and end-all of TAD boundaries in mammals is truly hard to fathom.  For one, the demands for specificity in TAD formation (and in regulatory interactions) are likely much greater than those in flies, and specificity can’t be generated by a single DNA binding protein.  For another, several dozen chromosomal architectural proteins have already been identified in flies.  This means that (unlike what is thought to be true in mammals) it is possible to use a combinatorial mechanism to generate specificity in, for example, the long distance interactions in RFig 6 and 7.  As noted in #2.1 above, many of the known chromosomal architectural proteins in flies are polydactyl zinc finger proteins (just like CTCF).  There are some 200 different polydactyl zinc finger proteins in flies, and the function of only a handful of these is known at present.  However, it seems likely that a reasonable fraction of this class of DNA binding proteins will ultimately turn out to have an architectural function of some type (Bonchuk et al. 2021; Fedotova et al. 2017).  The number of different polydactyl zinc finger protein genes in mammals is nearly 3 times that of flies.  Is it really possible that, of these, only CTCF is involved in shaping the 3D structure of the mammalian genome?

      Despite having a CTCF paralog and cohesin, the Drosophila genome does not appear to be structured by loop extrusion-pausing. The identification of orientation-dependent elements with pronounced structural effects on genome folding thus may shed light on alternative mechanisms used to regulate genome structure, which in turn may yield insights into the significance of particular folding patterns.

      (2.2) Here we would like to draw the reviewer’s and reader’s attention to Author response image 3, which shows that orientation-dependent pairing interactions have a significant impact on physical interactions between different sequences.  We would also refer the reader to two other publications.  One of these is Kyrchanova et al. (Kyrchanova et al. 2008), which was the first to demonstrate that orientation of pairing interactions matters.  The second is Fujioka et al. (Fujioka et al. 2016), which describes experiments indicating that nhomie and homie pair with each other head-to-tail and with themselves head-to-head.

      On the whole, this study is comprehensive and represents a useful contribution to the 3D genome field. The transgenic lines and Micro-C datasets generated in the course of the work will be valuable resources for the research community. Moreover, the manuscript, while dense in places, is generally clearly written and comprehensive in its description of the work. However, I have a number of comments and critiques of the manuscript, mainly centering on the framing of the experiments and presentation of the Micro-C results and on the manner in which the data are analyzed and reported. They are as follows:

      Major Points:

      (1) The authors motivate much of the introduction and results with hypothetical "stem loop" and "circle loop" models of chromosome conformation, which they argue are reflected in the Micro-C data and help to explain the observed ISH patterns. While such structures may possibly form, the support for these specific models vs. the many alternatives is not in any way justified. For instance, no consideration is given to important biophysical properties such as persistence length, packing/scaling, and conformational entropy. As the biophysical properties of chromatin are a very trafficked topic both in terms of experimentation and computational modeling and generally considered in the analysis of chromosome conformation data, the study would be strengthened by acknowledgement of this body of work and more direct integration of its findings.

      (2.3) The reviewer is not correct in claiming that “stem-loops” and “circle-loops” are “hypothetical.”  There is ample evidence that both types of loops are present in eukaryotic genomes, and that loop conformation has significant readouts in terms of not only the physical properties of TADs but also their functional properties.  Here we would draw the reviewer’s attention to Author response image 3 and Author response image 4 for examples of loops formed by the orientation-dependent pairing of yet other TAD boundary elements.  As evident from the MicroC data in these figures, circle-loops and stem-loops have readily distinguishable contact patterns.  The experiments in Fujioka et al. (Fujioka et al. 2016) demonstrate that homie and nhomie pair with each other head-to-tail, while they pair with themselves head-to-head.  The accompanying paper (Bing et al. 2024) also provides evidence that loop topology is reflected both in the pattern of activation of reporters and in the MicroC contact profiles.  We would also mention again Kyrchanova et al. (Kyrchanova et al. 2008), who were the first to report orientation-dependent pairing of endogenous fly boundaries.

      At this juncture it would be premature to try to incorporate computational modeling of chromosome conformation in our studies.  The reason is that the experimental foundations that would be essential for building accurate models are lacking.  As should be evident from RFigs. 1-3 above, studies on mammalian chromosomes are simply not of high enough resolution to draw firm conclusions about chromosome conformation: in most studies only the forests are visible.  While the situation is better in flies, there are still too many unknowns.  As just one example, it would be important to know the orientation of the boundary pairing interactions that generate each TAD.  While it is possible to infer loop topology from how TADs interact with their neighbors (a plume versus clouds), a conclusive identification of stem- and circle-loops will require a method to unambiguously determine whether a TAD boundary pairs with its neighbor head-to-head or head-to-tail.

      (2) Similar to Point 1, while there is a fair amount of discussion of how the observed results are or are not consistent with loop extrusion, there is no discussion of the biophysical forces that are thought to underlie compartmentalization such as block-polymer co-segregation and their potential influence. I found this absence surprising, as it is generally accepted that A/B compartmentalization essentially can explain the contact maps observed in Drosophila and other non-vertebrate eukaryotes (Rowley, ..., Corces 2017; PMID 28826674). The manuscript would be strengthened by consideration of this phenomenon.

      (2.4) Compartments in mammals have typically been identified and characterized using low-resolution data sets, and these studies have relied on visualizing compartments using quite large bin sizes (>>1 kb).  Our experiments have nothing to do with the large-scale compartments seen in these Hi-C experiments.  Instead, we are studying the properties of individual TADs: how TADs are formed, the relationship between TAD topology and boundary:boundary pairing, and the impact of TAD topology on interactions between TADs in the immediate neighborhood.  There is no evidence to date that these large compartments or “block polymer co-segregation” a) have any impact on the properties of individual boundary elements, b) have a role in determining which boundary elements actually come together to form a given TAD, c) impact the orientation of the interactions between boundaries that generate the TAD or d) determine how TADs tend to interact with their immediate neighbors.

      In more recent publications (c.f., Harris et al. 2023) compartments have shrunk in size and instead of being units of several hundred kb, the median length of the “compartmental” unit in mammalian cells is about 12 kb. This is not too much different from the size of fly TADs.  However, the available evidence does not support the idea that block polymer co-segregation/co-repulsion drive the TAD:TAD interactions seen in MicroC experiments.  For example, according to this “micro-compartment” model, the specific patterns of interaction between TADs in the CG3294 meta-loop in Author response image 3 would be driven by block polymer co-segregation and co-repulsion. In this model, the TAD upstream of the blue boundary (which contains CG33543, the odorant binding protein gene Obp22a and the Npc2a gene which encodes a protein involved in sterol homeostasis) would share the same chromatin state/biophysical properties as the TAD upstream of the purple boundary, which has the fipi gene. While it is true that CG33543, Obp22a and also the fipi gene are not expressed in embryos, Npc2a is expressed at high levels during embryogenesis, yet it is part of the TAD that interacts with the fipi TAD.  The TAD downstream of the blue boundary contains CG15353 and Nplp4, and it interacts with the TAD downstream of the purple boundary, which contains CG3294 and slf.  CG15353 and Nplp4 are not expressed in the embryo and as such should share a compartment with a TAD that is also silent. However, slf is expressed at a high level in 12-16 hr embryos, while CG3294 is expressed at a low level.  In neither case would one conclude that the TADs upstream and downstream of the blue and purple boundaries, respectively, interact because of shared chromatin/biophysical states that drive block polymer co-segregation/co-repulsion.

      One might also consider several gedanken experiments involving the long-range interactions that generate the CG3294 meta-loop in Author response image 3.  According to the micro-compartment model, the patchwork pattern of crosslinking evident in the CG3294 meta-loop arises because the interacting TADs share the same biochemical/biophysical properties, and this drives block polymer co-segregation and co-repulsion.  If this model is correct, then this patchwork pattern of TAD:TAD interactions would remain unchanged if we were to delete the blue or the purple boundary.  However, given what we know about how boundaries can find and pair with distant boundaries (c.f., Figure 6 from Muller et al. 1999 and the discussion in #1.2), the results of these gedanken experiments seem clear: the patchwork pattern shown in Author response image 3A will disappear.  What would happen if we inverted the blue or the purple boundary? Would the TAD containing CG33543, Obp22a and Npc2a still interact with fipi as would be expected from the compartment model?  Or would the pattern of interactions flip so that the CG33543, Obp22a and Npc2a TAD interacts with the TAD containing CG3294 and slf?  Again we can anticipate the results based on previous studies: the interacting TADs will switch when the CG3294 meta-loop is converted into a stem-loop.  If this happened, the only explanation possible in the compartment model is that the chromatin states change when the boundary is inverted so that the TAD upstream of the blue boundary now shares the same chromatin state as the TAD downstream of the purple boundary, while the TAD downstream of the blue boundary shares the same state as the TAD upstream of the purple boundary.  However, there is no evidence that boundary orientation per se can induce a complete switch in “chromatin states” as would be required in the compartment model.

      While we have not done these experimental manipulations with the CG3294 meta-loop, an equivalent experiment was done in Bing et al. (Bing et al. 2024).  However, instead of deleting a boundary element, we inserted a homie boundary element together with two reporters (gfp and LacZ) 142 kb away from the eve TAD.  The result of this gedanken “reverse boundary deletion” experiment is shown in Author response image 5.  Panel A shows the MicroC contact profile in the region spanning the transgene insertion site and the eve TAD in wild type (read “deletion”) NC14 embryos.  Panel B shows the MicroC contact profile from 12-16 hr embryos carrying the homie dual reporter transgene inserted at -142 kb.  Prior to the “deletion”, the homie element in the transgene pairs with nhomie and homie in the eve TAD and this generates a “mini-metaloop.”  In this particular insert, the homie boundary in the transgene (red arrow) is “pointing” in the opposite orientation from the homie boundary in the eve TAD (red arrow).  In this orientation, the pairing of the transgene homie with eve nhomie/homie brings the LacZ reporter into contact with sequences in the eve TAD.  Since a mini-metaloop is formed by homie → nhomie/homie pairing, sequences in TADs upstream and downstream of the transgene insert interact with sequences in TADs close to the eve TAD (Author response image 5B).  Taken together these interactions correspond to the interaction patchwork that is typically seen in “compartments” (see boxed region and inset).  If this patchwork is driven, as per the model, by block polymer co-segregation and co-repulsion, then it should still be present when the transgene is deleted.  However, panel A shows that the interactions linking the transgene and the sequences in TADs next to the transgene to eve and TADs next to eve disappear when the homie boundary (plus transgene) is “deleted” in wild type flies.

      Author response image 5.

      Boundary deletion and compartments

      A second experiment would be to invert the homie boundary so that instead of pointing away from eve it points towards eve.  Again, if the compartmental patchwork is driven by block polymer co-segregation and co-repulsion, inverting the homie boundary in the transgene should have no effect on the compartmental contact profile.  Inspection of Fig. 7 in Bing et al. (Bing et al. 2024) will show that this prediction doesn’t hold either.  When homie is inverted, sequences in the eve TAD interact with the gfp reporter not the LacZ reporter.  In addition, there are corresponding changes in how sequences in TADs to either side of eve interact with sequences to either side of the transgene insert.  

      Yet another “test” of compartments generated by block polymer co-segregation/co-repulsion is provided by the plume above the eve volcano triangle.  According to the compartment model, sequences in TADs flanking the eve locus form the plume above the eve volcano triangle because their chromatin shares properties that drive block polymer co-segregation.  These same properties result in repulsive interactions with chromatin in the eve TAD, and this would explain why the eve TAD doesn’t crosslink with its neighbors.  If the distinctive chromatin properties of eve and the neighboring TADs drive block polymer co-segregation and co-repulsion, then inverting the nhomie boundary or introducing homie in the forward orientation should have absolutely no effect on the physical interactions between chromatin in the eve TAD and chromatin in the neighboring TADs.  However, Figures 4 and 6 in this paper indicate that boundary pairing orientation, not block polymer co-segregation/co-repulsion, is responsible for forming the plume above the eve TAD.  Other findings also appear to be inconsistent with the compartment model.  (A) The plume topping the eve volcano triangle is present in NC14 embryos when eve is broadly expressed (and potentially active throughout the embryo).  It is also present in 12-16 hr embryos when eve is only expressed in a very small subset of cells and is subject to PcG silencing everywhere else in the embryo.  (B) According to the compartment model, the precise patchwork pattern of physical interactions should depend upon the transcriptional program/chromatin state that is characteristic of a particular developmental stage or cell type.  As cell fate decisions are just being made during NC14, one might expect that most nuclei will share similar chromatin states throughout much of the genome.  This would not be true for 12-16 hr embryos.  At this stage the compartmental patchwork would be generated by a complex mixture of interactions in cells that have quite different transcriptional programs and chromatin states.  In this case, the patchwork pattern would be expected to become fuzzy as a given chromosomal segment would be in compartment A in one group of cells and in compartment B in another.  Unlike 12-16 hr embryos, larval wing discs would be much more homogeneous and likely give a distinct and relatively well resolved compartmental pattern.  We’ve examined the compartment patchwork of the same chromosomal segments in NC14 embryos, 12-16 hr embryos and larval wing disc cells.  While there are some differences (e.g., changes in some of the BX-C TADs in the wing disc sample), the compartmental patchwork patterns are surprisingly similar in all three cases.  Nor is there any “fuzziness” in the compartmental patterns evident in 12-16 hr embryos, despite the fact that there are many different cell types at this stage of development.  (C) TAD interactions with their neighbors and compartmental patchworks are substantially suppressed in salivary gland polytene chromosomes.  This would suggest that features of chromosome structure might be the driving force behind many of the “compartmental” interactions, as opposed to distinct biochemical/biophysical properties of small chromosomal segments that drive polymer co-segregation/co-repulsion.

      (3) The contact maps presented in the study represent many cells and distinct cell types. It is clear from single-cell Hi-C and multiplexed FISH experiments that chromosome conformation is highly variable even within populations of the same cell, let alone between cell types, with structures such as TADs being entirely absent at the single cell level and only appearing upon pseudobulking. It is difficult to square these observations with the models of relatively static structures depicted here. The authors should provide commentary on this point.

      (2.5) As should be evident from Author response image 1, single-cell Hi-C experiments would not provide useful information about the physical organization of individual TADs, TAD boundaries or how individual TADs interact with their immediate neighbors.  In addition, since they capture only a very small fraction of the possible contacts within and between TADs, we suspect that these single-cell studies aren’t likely to be useful for making solid conclusions about TAD neighborhoods like those shown in Author response image 1 panels A, B, C and D, or Author response image 2.  While it might be possible to discern relatively stable contacts between pairs of insulators in single cells with the right experimental protocol, the stabilities/dynamics of these interactions may be better judged by the length of time that physical interactions are seen to persist in live imaging studies such as Chen et al. (2018), Vazquez et al. (2006) and Li et al. (2011).

      The in situ FISH data we’ve seen also seems problematic in that probe hybridization results in a significant decondensation of chromatin.  For two probe sets complementary to adjacent ~1.2 kb DNA sequences, the measured center-to-center distance that we’ve seen was ~110 nm.  This is about 1/3rd the length that is expected for a 1.2 kb naked DNA fragment, and about 1.7 times larger than that expected for a beads-on-a-string nucleosome array (~60 nm).  However, chromatin is thought to be compacted into a 30 nm fiber, which is estimated to reduce the length of DNA by at least another ~6 fold.  If this estimate is correct, FISH hybridization would appear to result in a ~10 fold decompaction of chromatin.  A decompaction of this magnitude would necessarily be followed by a significant distortion in the actual conformation of chromatin loops.
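
      The length estimates in the preceding paragraph can be retraced with a short calculation. The sketch below is only illustrative: the probe size, the measured distance, the ~60 nm beads-on-a-string length and the ~6 fold fiber compaction are taken from the text, while the rise of 0.34 nm per base pair for B-form DNA is an assumption introduced here, and all variable names are hypothetical.

```python
# Back-of-the-envelope check of the FISH decompaction estimate.
# Values marked "from the text" are quoted in the response; the rise per
# base pair is an assumption (standard B-form DNA), not stated in the text.

BP_RISE_NM = 0.34            # assumed rise per base pair of B-form DNA, in nm
PROBE_BP = 1200              # from the text: ~1.2 kb probe target
MEASURED_NM = 110.0          # from the text: measured center-to-center distance
BEADS_ON_STRING_NM = 60.0    # from the text: expected nucleosome-array length
FIBER_COMPACTION = 6.0       # from the text: further ~6 fold compaction


def naked_dna_length_nm(n_bp, rise_nm=BP_RISE_NM):
    """Contour length of naked DNA of n_bp base pairs, in nm."""
    return n_bp * rise_nm


naked_nm = naked_dna_length_nm(PROBE_BP)               # naked DNA length
fiber_nm = BEADS_ON_STRING_NM / FIBER_COMPACTION       # expected compacted length
fraction_of_naked = MEASURED_NM / naked_nm             # measured vs naked DNA
fold_vs_beads = MEASURED_NM / BEADS_ON_STRING_NM       # measured vs nucleosome array
fold_decompaction = MEASURED_NM / fiber_nm             # measured vs compacted fiber

print(f"naked DNA length:        {naked_nm:.0f} nm")
print(f"measured / naked:        {fraction_of_naked:.2f}")
print(f"measured / beads:        {fold_vs_beads:.2f}")
print(f"expected fiber length:   {fiber_nm:.0f} nm")
print(f"apparent decompaction:   {fold_decompaction:.0f} fold")
```

      Under these assumptions the ratios fall in the same range as the rounded values quoted above, with the apparent decompaction coming out at roughly 10 fold.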

      (4) The analysis of the Micro-C data appears to be largely qualitative. Key information about the number of reads sequenced, reads mapped, and data quality are not presented. No quantitative framework for identifying features such as the "plumes" is described. The study and its findings would be strengthened by a more rigorous analysis of these rich datasets, including the use of systematic thresholds for calling patterns of organization in the data.

      Additional information on the number of reads and data quality has been included in the methods section.

      (5) Related to Point 4, the lack of quantitative details about the Micro-C data makes it difficult to evaluate if the changes observed are due to biological or technical factors. It is essential that the authors provide quantitative means of controlling for factors like sampling depth, normalization, and data quality between the samples.

      In our view the changes in the MicroC contact patterns for the eve locus and its neighbors when the nhomie boundary is manipulated are not only clear cut and unambiguous but are also readily evident in the Figs that are presented in the manuscript.  If the reviewer believes that there aren’t significant differences between the MicroC contact patterns for the four different nhomie replacements, it seems certain that they would also remain unconvinced by a quantitative analysis.

      The reviewer also suggests that biological and/or technical differences between the four samples could account for the observed changes in the MicroC patterns for the eve TAD and its neighbors.  If this were the case, then similar changes in MicroC patterns should be observed elsewhere in the genome.  Since much of the genome is analyzed in these MicroC experiments there is an abundance of internal controls for each experimental manipulation of the nhomie boundary.  For two of the nhomie replacements, nhomie reverse and homie forward, the plume above the eve volcano triangle is replaced by clouds surrounding the eve volcano triangle.  If these changes in the eve MicroC contact patterns are due to significant technical (or biological) factors, we should observe precisely the same sorts of changes in TADs elsewhere in the genome that are volcano triangles with plumes.   Author response image 6 shows the MicroC contact pattern for several genes in the Antennapedia complex.  The deformed gene is included in a TAD which, like eve, is a volcano triangle topped by a plume.  A comparison of the deformed MicroC contact patterns for nhomie forward (panel B) with the MicroC patterns for nhomie reverse (panel C) and homie forward (panel D) indicates that while there are clearly technical differences between the samples, these differences do not result in the conversion of the deformed plume into clouds as is observed for the eve TAD.  The MicroC patterns elsewhere in the Antennapedia complex are also very similar in all four samples.  Likewise, comparisons of regions elsewhere in the fly genome indicate that the basic contact patterns are similar in all four samples.   So while there are technical differences which are reflected in the relative pixel density in the TAD triangles and the LDC domains, these differences do not result in converting plumes into clouds nor do they alter the basic patterns of TAD triangles and LDC domains.  
As for biological differences, the embryos in each sample are at roughly the same developmental stage and were collected and processed using the same procedures. Thus, the biological factors that could reasonably be expected to impact the organization of specific TADs (e.g., cell type specific differences) are not going to impact the patterns we see in our experiments. 

      Author response image 6.

      (6) The ISH effects reported are modest, especially in the case of the HCR. The details provided for how the imaging data were acquired and analyzed are minimal, which makes evaluating them challenging. It would strengthen the study to provide much more detail about the acquisition and analysis and to include depiction of intermediates in the analysis process, e.g. showing the segmentation of stripes.

      The imaging analysis presented in Fig. 5 is just standard confocal microscopy.  Individual embryos were visualized and scored.  An embryo in which stripes could be readily detected was scored as ‘positive’ while an embryo in which stripes couldn’t be detected was scored as ‘negative.’

      Recommendations for the authors:

      Editor comments:

      It was noted that the Jaynes lab previously published extensive genetic evidence to support the stem loop and circle loop models of Homie-Nhomie interactions (Fujioka 2016 Plos Genetics) that were more convincing than the Micro-C data presented here in proof of their prior model. Maybe the authors could more clearly summarize their prior genetic results to further try to convince the reader about the validity of their model.

      Reviewer #1 (Recommendations For The Authors):

      Below, I list specific comments to further improve the manuscript for publication. Most importantly, I recommend the authors tone down their proposal that boundary pairing is a universal TAD forming mechanism.

      (1) The title is cryptic.

      (2) The second sentence in the abstract is an overstatement: "In flies, TADs are formed by physical interactions between neighboring boundaries". Hi-C and Micro-C studies have not provided evidence that most TADs in Drosophila show focal interactions between their bracketing boundaries. The authors rely too strongly on prior studies that used artificial reporter transgenes to show that multimerized insulator protein binding sites or some endogenous fly boundaries can mediate boundary bypass, as evidence that endogenous boundaries pair.

      Please see responses #1.1 and #1.3 and figures Author response image 1 and Author response image 3.  Note that using dHS-C, most TADs that we’ve looked at so far are topped by a “dot” at their apex.

      (3) Line 64: the references do not cite the stated "studies dating back to the '90's'".

      The papers cited for that sentence are reviews which discussed the earlier findings.  The relevant publications are cited at the appropriate places in the same paragraph.  

      (4) Line 93: "On the other hand, while boundaries have partner preferences, they are also promiscuous in their ability to establish functional interactions with other boundaries." It was unclear what is meant here.

      Boundaries that a) share binding sites for proteins that multimerize, b) have binding sites for proteins that interact with each other, or c) have binding sites for proteins that can be bridged by a third protein can potentially pair with each other.  However, while these mechanisms enable promiscuous pairing interactions, they will also generate partner preferences (through a greater number of a, b and/or c).

      (5) It could be interesting to discuss the fact that it remains unclear whether Nhomie and Homie pair in cis or in trans, given that homologous chromosomes are paired in Drosophila.

      The studies in Fujioka et al. (Fujioka et al. 2016) show that nhomie and homie can pair both in cis and in trans.  Given the results described in #1.2, we imagine that they are paired in both cis and trans in our experiments.

      (6) Line 321: Could the authors further explain why they think that "the nhomie reverse circle-loop also differs from the nhomie deletion (λ DNA) in that there is not such an obvious preference for which eve enhancers activate expression"?

      The likely explanation is that the topology/folding of the altered TADs impacts the probability of interactions between the various eve enhancers and the promoters of the flanking genes.  

      (7) The manuscript would benefit from shortening the long Discussion by avoiding repeating points described previously in the Results.

      (8) Line 495: "If, as seems likely, a significant fraction of the TADs genome-wide are circle loops, this would effectively exclude cohesin-based loop extrusion as a general mechanism for TAD formation in flies". The evidence provided in this manuscript appears insufficient to discard ample evidence from multiple laboratories that TADs form by compartmentalization or loop extrusion. Multiple laboratories have, for example, demonstrated that cohesin depletion disrupts a large fraction of mammalian TADs. 

      Points made here and in #9 have been responded to in #1.1, #2.1 and #2.4 above.  We would suggest that the evidence for loop extrusion falls short of compelling (as it is based on the analysis of TAD neighborhoods, not TADs, that is, forests, not trees) and given the results reported in Goel et al. (in particular Fig. 4 and Sup Fig. 8) is clearly suspect. This is not to mention the fact that cohesin loop-extrusion can’t generate circle-loop TADs, yet circle-loops clearly exist.  Likewise, as discussed in #2.4, it is not clear to us that the shared chromatin states, polymer co-segregation and co-repulsion account for the compartmental patchwork patterns of TAD:TAD interactions. The results from the experimental manipulations in this paper and the accompanying paper, together with studies by others (e.g., Kyrchanova et al. (Kyrchanova et al. 2008), Mohana et al. (Mohana et al. 2023)) would also seem to be at odds with the model for compartments as currently formulated.  

      The unique properties of Nhomie and Homie, namely the remarkable specificity with which they physically pair over large distances (Fujioka et al. 2016) may rather suggest that boundary pairing is a phenomenon restricted to special loci. Moreover, it has not yet been demonstrated that Nhomie or Homie are also able to pair with the TAD boundaries on their left or right, respectively.

      Points made here were discussed in detail in #1.2.  As described in detail in #1.2, it is not the case that nhomie and homie are “unique” or “special.”  Other fly boundaries can do the same things.  As for whether nhomie and homie pair with their neighbors:  We haven’t done transgene experiments (e.g., testing by transvection or boundary bypass).  Likewise, in MicroC experiments there are no obvious dots at the apex of the neighboring TADs that would correspond to nhomie pairing with the neighboring boundary to the left and homie pairing with the neighboring boundary to the right. However, this is to be expected. As we discussed in #1.3 above, only MNase resistant elements will generate dots in standard MicroC experiments.  On the other hand, when boundary:boundary interactions are analyzed by dHS-C (c.f., Author response image 4), there are dots at the apex of both neighboring TADs.  This would be direct evidence that nhomie pairs with the neighboring boundary to the left and homie pairs with the neighboring boundary to the right.

      (9) The comment in point 8 also applies to the concluding 2 sentences (lines 519-524) of the Discussion.

      See response to 8 above. Otherwise, the concluding sentences are completely accurate. Validation of the cohesin loop extrusion/CTCF roadblock model will require demonstrating a) that all TADs are either stem-loops or unanchored loops and b) that TAD endpoints are always marked by CTCF. 

      The likely presence of circle-loops and evidence for TAD boundaries that don’t have CTCF (c.f., Goel et al. 2023) already suggest that this model can’t (either fully or at all) account for TAD formation in mammals. 

      (10) Figs. 3 and 6: It would be helpful to add the WT screenshot in the same figure, for direct comparison.

      It is easy enough to scroll between Figs-especially since nhomie forward looks just like WT.

      (11) Fig. 6: It would be helpful to show a cartoon view of a circle loop to the right of the Micro-C screenshot, as was done in Fig. 3.

      Good idea.   Added to the Fig.

      (12) Fig. 5: It would be helpful to standardize the labelling of the different genotypes throughout the figures and panels ("inverted" versus "reverse" versus an arrow indicating the direction).

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      Minor Points:

      (1) The Micro-C data does not appear to be deposited in an appropriate repository. It would be beneficial to the community to make these data available in this way.

      This has been done.

      (2) Readers not familiar with Drosophila development would benefit from a gentle introduction to the stages analyzed and some brief discussion on how the phenomenon of somatic homolog pairing might influence the study, if at all.

      We included a rough description of the stages that were analyzed for both the in situs and MicroC. We thought that an actual description of what is going on at each of the stages wasn’t necessary as the process of development is not a focus of this manuscript.  In other studies, we’ve found that there are only minor differences in MicroC patterns between the blastoderm stage and stage 12-16 embryos.  While these minor differences are clearly interesting, we didn’t discuss them in the text.   In all of our experiments chromosomes are likely to be paired.  In NC14 embryos (the stage for visualizing eve stripes and the MicroC contact profiles in Fig. 2) replication of euchromatic sequences is thought to be quite rapid.  While homolog pairing is incomplete at this stage, sister chromosomes are paired.  In stage 12-16 embryos, homologs will be paired and if the cells are arrested in G2, then sister chromosomes will also be paired.  So in all of our experiments, chromosomes (sisters and/or homologs) are paired. However, since we don’t have examples of unpaired chromosomes, our experiments don’t provide any info on how chromosome pairing might impact MicroC/expression patterns.

      (3) "P > 0.01" appears several times. I believe the authors mean to report "P < 0.01".

      Fixed.  

      References for Response

      Batut PJ, Bing XY, Sisco Z, Raimundo J, Levo M, Levine MS. 2022. Genome organization controls transcriptional dynamics during development. Science. 375(6580):566-570.

      Bing X, Ke W, Fujioka M, Kurbidaeva A, Levitt S, Levine M, Schedl P, Jaynes JB. 2024. Chromosome structure i: Loop extrusion or boundary:Boundary pairing? eLife.

      Blanton J, Gaszner M, Schedl P. 2003. Protein:Protein interactions and the pairing of boundary elements in vivo. Genes Dev. 17(5):664-675.

      Bonchuk A, Boyko K, Fedotova A, Nikolaeva A, Lushchekina S, Khrustaleva A, Popov V, Georgiev P. 2021. Structural basis of diversity and homodimerization specificity of zinc finger-associated domains in drosophila. Nucleic Acids Res. 49(4):2375-2389.

      Bonchuk A, Kamalyan S, Mariasina S, Boyko K, Popov V, Maksimenko O, Georgiev P. 2020. N-terminal domain of the architectural protein ctcf has similar structural organization and ability to self-association in bilaterian organisms. Sci Rep. 10(1):2677.

      Chen H, Levo M, Barinov L, Fujioka M, Jaynes JB, Gregor T. 2018. Dynamic interplay between enhancer–promoter topology and gene activity. Nat Genet. 50(9):1296.

      Fedotova AA, Bonchuk AN, Mogila VA, Georgiev PG. 2017. C2h2 zinc finger proteins: The largest but poorly explored family of higher eukaryotic transcription factors. Acta Naturae. 9(2):47-58.

      Fujioka M, Ke W, Schedl P, Jaynes JB. 2024. The homie insulator has sub-elements with different insulating and long-range pairing properties. bioRxiv. 2024.02.01.578481.

      Fujioka M, Mistry H, Schedl P, Jaynes JB. 2016. Determinants of chromosome architecture: Insulator pairing in cis and in trans. PLoS Genet. 12(2):e1005889.

      Galloni M, Gyurkovics H, Schedl P, Karch F. 1993. The bluetail transposon: Evidence for independent cis‐regulatory domains and domain boundaries in the bithorax complex. The EMBO Journal. 12(3):1087-1097.

      Gaszner M, Vazquez J, Schedl P. 1999. The zw5 protein, a component of the scs chromatin domain boundary, is able to block enhancer-promoter interaction. Genes Dev. 13(16):2098-2107.

      Goel VY, Huseyin MK, Hansen AS. 2023. Region capture micro-c reveals coalescence of enhancers and promoters into nested microcompartments. Nat Genet. 55(6):1048-1056.

      Harris HL, Gu H, Olshansky M, Wang A, Farabella I, Eliaz Y, Kalluchi A, Krishna A, Jacobs M, Cauer G et al. 2023. Chromatin alternates between a and b compartments at kilobase scale for subgenic organization. Nat Commun. 14(1):3303.

      Hart CM, Zhao K, Laemmli UK. 1997. The scs' boundary element: Characterization of boundary element-associated factors. Mol Cell Biol. 17(2):999-1009.

      Hsieh TS, Cattoglio C, Slobodyanyuk E, Hansen AS, Rando OJ, Tjian R, Darzacq X. 2020. Resolving the 3d landscape of transcription-linked mammalian chromatin folding. Mol Cell. 78(3):539-553.e538.

      Krietenstein N, Abraham S, Venev SV, Abdennur N, Gibcus J, Hsieh TS, Parsi KM, Yang L, Maehr R, Mirny LA et al. 2020. Ultrastructural details of mammalian chromosome architecture. Mol Cell. 78(3):554-565.e557.

      Kyrchanova O, Chetverina D, Maksimenko O, Kullyev A, Georgiev P. 2008. Orientation-dependent interaction between drosophila insulators is a property of this class of regulatory elements. Nucleic Acids Res. 36(22):7019-7028.

      Kyrchanova O, Ibragimov A, Postika N, Georgiev P, Schedl P. 2023. Boundary bypass activity in the abdominal-b region of the drosophila bithorax complex is position dependent and regulated. Open Biol. 13(8):230035.

      Kyrchanova O, Kurbidaeva A, Sabirov M, Postika N, Wolle D, Aoki T, Maksimenko O, Mogila V, Schedl P, Georgiev P. 2018. The bithorax complex iab-7 polycomb response element has a novel role in the functioning of the fab-7 chromatin boundary. PLoS Genet. 14(8):e1007442.

      Kyrchanova O, Sabirov M, Mogila V, Kurbidaeva A, Postika N, Maksimenko O, Schedl P, Georgiev P. 2019a. Complete reconstitution of bypass and blocking functions in a minimal artificial fab-7 insulator from drosophila bithorax complex. Proceedings of the National Academy of Sciences.201907190.

      Kyrchanova O, Wolle D, Sabirov M, Kurbidaeva A, Aoki T, Maksimenko O, Kyrchanova M, Georgiev P, Schedl P. 2019b. Distinct elements confer the blocking and bypass functions of the bithorax fab-8 boundary. Genetics.genetics. 302694.302019.

      Kyrchanova O, Zolotarev N, Mogila V, Maksimenko O, Schedl P, Georgiev P. 2017. Architectural protein pita cooperates with dctcf in organization of functional boundaries in bithorax complex. Development. 144(14):2663-2672.

      Li H-B, Muller M, Bahechar IA, Kyrchanova O, Ohno K, Georgiev P, Pirrotta V. 2011. Insulators, not polycomb response elements, are required for long-range interactions between polycomb targets in drosophila melanogaster. Mol Cell Biol. 31(4):616-625.

      Li X, Tang X, Bing X, Catalano C, Li T, Dolsten G, Wu C, Levine M. 2023. Gaga-associated factor fosters loop formation in the drosophila genome. Mol Cell. 83(9):1519-1526.e1514.

      Mohana G, Dorier J, Li X, Mouginot M, Smith RC, Malek H, Leleu M, Rodriguez D, Khadka J, Rosa P et al. 2023. Chromosome-level organization of the regulatory genome in the drosophila nervous system. Cell. 186(18):3826-3844.e3826.

      Muller M, Hagstrom K, Gyurkovics H, Pirrotta V, Schedl P. 1999. The mcp element from the drosophila melanogaster bithorax complex mediates long-distance regulatory interactions. Genetics. 153(3):1333-1356.

      Muravyova E, Golovnin A, Gracheva E, Parshikov A, Belenkaya T, Pirrotta V, Georgiev P. 2001. Loss of insulator activity by paired su(hw) chromatin insulators. Science. 291(5503):495-498.

      Postika N, Metzler M, Affolter M, Müller M, Schedl P, Georgiev P, Kyrchanova O. 2018. Boundaries mediate long-distance interactions between enhancers and promoters in the drosophila bithorax complex. PLoS Genet. 14(12):e1007702.

      Samal B, Worcel A, Louis C, Schedl P. 1981. Chromatin structure of the histone genes of d. Melanogaster. Cell. 23(2):401-409.

      Sigrist CJ, Pirrotta V. 1997. Chromatin insulator elements block the silencing of a target gene by the drosophila polycomb response element (pre) but allow trans interactions between pres on different chromosomes. Genetics. 147(1):209-221.

      Udvardy A, Schedl P. 1984. Chromatin organization of the 87a7 heat shock locus of drosophila melanogaster. J Mol Biol. 172(4):385-403.

      Vazquez J, Muller M, Pirrotta V, Sedat JW. 2006. The mcp element mediates stable long-range chromosome-chromosome interactions in drosophila. Molecular Biology of the Cell. 17(5):2158-2165.

      Zolotarev N, Fedotova A, Kyrchanova O, Bonchuk A, Penin AA, Lando AS, Eliseeva IA, Kulakovskiy IV, Maksimenko O, Georgiev P. 2016. Architectural proteins pita, zw5, and zipic contain homodimerization domain and support specific long-range interactions in drosophila. Nucleic Acids Res. 44(15):7228-7241.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      We thank the reviewer for their careful reading of our manuscript and have taken all of their grammatical corrections into account.

      Reviewer #2 (Public Review):

      Weaknesses: 

      The paper contains multiple instances of non-scientific language, as indicated below. It would also benefit from additional details on the cryo-EM structure determination in the Methods and inclusion of commonly accepted requirements for cryo-EM structures, like examples of 2D class averages, raw micrographs, and FSC curves (between half-maps as well as between rigid-body fitted (or refined) atomic models of the different polymorphs and their corresponding maps). In addition, cryo-EM maps for the control experiments F1 and F2 should be presented in Figure 9.

      We tried to correct the non-scientific language and have included the suggested data on the Cryo-EM analyses including new Figures 11-17.  We did not collect data on the sample used for the seeds in the cross seeding experiments because we had already confirmed in multiple datasets that the conditions in F1 and F2 reproducibly produce fibrils of Type 1 and Type 3, respectively. We have now analyzed cryo-EM data for 6 more samples at pH 7.0 and found that several kinds of polymorphs (Types 1A, 1M, 2A, 2B and 5) are accessible at this pH, however the Type 3 polymorphs are not formed at pH 7.0 under the conditions that we used for aggregation.

      Reviewer #2 (Recommendations For The Authors):

      Remove unscientific language: "it seems that there are about as many unique atomic-resolution structures of these aggregates as there are publications describing them"

      We have rephrased this sentence.

      For same reason, remove "Obviously, " 

      Done

      What does this mean? “polymorph-unspecific” 

      Rephrased as non-polymorph-specific

      What does this mean? "shallow amyloid energy hypersurface"  

      By “shallow hypersurface” we mean that the minimum of the multi-dimensional function that describes the energy of the amyloid is not so deep that subtle changes to the environment will not favor another fold/energy minimum. We have left the sentence because while it may not be perfect, it is concise and seems to get the point across.

      "The results also confirm the possibility of producing disease-relevant structure in vitro." -> This is incorrect as no disease-relevant structure was replicated in this work. Use another word like “suggest”.

      We have changed to “suggest” as suggested.

      Remove "historically" 

      Done

      Rephrase “It has long been understood that all amyloids contain a common structural scaffold” 

      Changed to “It has long been established that all amyloids contain a common structural scaffold..” 

      "Amyloid polymorphs whose differences lie in both their tertiary structure (the arrangement of the beta-strands) and the quaternary structure (protofilament-protofilament assembly) have been found to display distinct biological activities [8]" -> I don't think this is true, different biological activities of amyloids have never been linked to their distinct structures.  

      We have added 5 new references (8-12) to support this sentence.

      Reference 10 is a comment on reference 9; it should be removed. Instead, as for alpha-synuclein, all papers describing the tau structures should be included.  

      We have removed the reference, but feel that the addition of all Tau structure references is not merited in this manuscript since we are not comparing them.

      Rephrase: "is not always 100% faithful"

      Removed “100%”

      What is pseudo-C2 symmetry? Do the authors mean pseudo 2_1 symmetry (ie a 2-start helical symmetry)?

      Thank you for pointing this out.  We did indeed mean pseudo-2_1 helical symmetry.  

      Re-phrase: "alpha-Syn's chameleon-like behavior" 

      We have removed this phrase.

      "In the case of alpha-Syn, the secondary nucleation mechanism is based on the interaction of the positively charged N-terminal region of monomeric alpha-Syn and the disordered, negatively charged C-terminal region of the alpha-Syn amyloid fibrils [54]" -> I would say the mechanisms of secondary nucleation are not that well understood yet, so one may want to tune this down a bit. 

      We have changed this to “mechanism has been proposed to be”

      The paragraphs describing experiments by others are better suited for a Discussion rather than a Results section. Perhaps re-organize this part? 

      We have left the text intact as we are using a Results and Discussion format.

      A lot of information about Image processing seems to be missing: what steps were performed after initial model generation? 

      We have added more details in the methods section on the EM data processing and model analysis.

      Figure 1: Where is Type 4 on the pH scale?

      We have adjusted the Fig 1 legend to clarify that pH scale is only applicable to the structures presented in this manuscript. 

      Figure 2: This might be better incorporated as a subpanel of Figure 1.

      We agree that this figure is somewhat of a loner on its own and we only added it in order to avoid confusion with the somewhat inconsistent naming scheme used for the Type 1B structure. However, we prefer to leave it as a separate figure so that it does not dilute the impact of figure 1.

      Figure 3: What is the extra density at the bottom of Type 3B from pH 5.8 samples 1 and 2. pH 5.8 + 50mM NaCl (but not pH 5.8 + 100 mM NaCl)? Could this be an indication of a local minimum and the pH 5.8 + 100 mM NaCl structure is correct? Or is this a real difference between 0/50mM NaCl and 100 mM NaCl? 

      We did not see the extra density to which the reviewer is referring; however, the images used in this panel are based on the output of 3D-classification, which is more likely to produce artifacts than a 3D refinement. With this in mind, we did not see any significant differences in the refined structures and therefore only deposited the better quality map and model for each of the polymorph types.

      Figure 3: To what extent is Type 3B of pH 6.5 still a mixture of different types? The density looks poor. In general, in the absence of more details about the cryo-EM maps, it is hard to assess the quality of the structures presented.

      In order to improve the quality of the images in this panel, a more complete separation of the particles from each polymorph was achieved via the filament subset selection tool in RELION 5. In each case, an unbiased initial model could be created from the 2D classes via the relion_helix_inimodel2D program, further supporting the coexistence of 4 polymorphs in the pH 6.5 sample. The particles were individually refined to produce the respective maps that are now used in this figure.

      Many references are incorrect, containing "Preprint at (20xx)" statements.  

      This has been corrected.

      Reviewer #3 (Public Review):

      Weaknesses: 

      (1) The authors reveal that both Type 1 monofilament fibril polymorph (reminiscent of JOS-like polymorph) and Type 5 polymorph (akin to tissue-amplified-like polymorph) can both form under the same condition. Additionally, this condition also fosters the formation of flat ribbon-like fibril across different batches. Notably, at pH 5.8, variations in experimental groups yield disparate abundance ratios between polymorph 3B and 3C, indicating a degree of instability in fibrillar formation. The variability would potentially pose challenges for replicability in subsequent research. In light of these situations, I propose the following recommendations: 

      (a) An explicit elucidation of the factors contributing to these divergent outcomes under similar experimental conditions is warranted. This should include an exploration of whether variations in purified protein batches are contributing factors to the observed heterogeneity.

      We are in complete agreement that understanding the factors that lead to polymorph variability is of utmost importance (and was the impetus for the manuscript itself). However, the number of variables to explore is overwhelming and we will continue to investigate this in our future research. Regarding the variability between batches of purified protein, we also think that this could be a factor in the polymorph variability observed for otherwise “identical” aggregation conditions, particularly at pH 7 where the largest variety of polymorphs have been observed. However, even variation between identical replicates (samples created from the same protein solution and simply aggregated simultaneously in separate tubes) can lead to different outcomes (see datasets 15 and 16 in the revised Table 1), suggesting that there are stochastic processes that can determine the outcome of an individual aggregation experiment. While our data still indicate that Type 1, 2 and 3 polymorphs are strongly selected by pH, the selection between interface variants 3B vs. 3C and 2A vs. 2B might also be affected by protein purity. Our standard purification protocol produces a single band by Coomassie-stained SDS-PAGE; however, minor truncations and other impurities below a few percent would go undetected and, given the proposed roles of the N and C-termini in secondary nucleation, could have a large effect on polymorph selection and seeding. In line with the reviewer’s comments we now include a batch number for each EM dataset. While no new conclusions can be drawn from the inclusion of this additional data, we feel that it is important to acknowledge the possible role of batch to batch variability. 

      (b) To enhance the robustness of the conclusions, additional replicates of the experiments under the same condition should be conducted, ideally a minimum of three times.  

      The pH 5.8 conditions that yield Type 3 fibrils have already been repeated several times in the original manuscript. Since the pH 7.4 conditions produce the most common α-Syn polymorph (Type 1A) and were run twice in this manuscript (once as an unseeded and once as a cross-seeded fibrilization), we decided to focus on the intermediate condition where the most variability had been seen (pH 7.0). The revised Table 1 now has 6 new datasets (11-16) representing 6 independent aggregations at pH 7.0 starting from two different protein purification batches. The result is that we now produce the Type 2A/B polymorphs in three samples, and in two of these samples we once again observed the Type 1M polymorph. The other samples produced Type 1A or non-twisted fibrils.

      (c) Further investigation into whether different polymorphs formed under the same buffer condition could lead to distinct toxicological and pathology effects would be a valuable addition to the study.  

      The correlation of toxicity with structure would in principle be interesting. However, the Type 1 and Type 3 polymorphs formed at pH 5.8 and 7.4 are not likely to be biologically relevant. The pH 7 polymorphs (Type 5 and 1M) would be more interesting because they form under the same conditions and might be related to some disease-relevant structures. Still, it is rare that a single polymorph appears at pH 7.0 (the Type 5 represented only 10-20% of the fibrils in the sample, and the Type 1M sample also contained unidentified double-filament fibrils). We plan to pursue this line of research and hope to include it in a future publication.

      (2) The cross-seeding study presented in the manuscript demonstrates the pivotal role of pH conditions in dictating conformation. However, an intriguing aspect that emerges is the potential role of seed concentration in determining the resultant product structure. This raises a critical question: at what specific seed concentration does the determining factor for polymorph selection shift from pH condition to seed concentration? A methodologically robust way to address this would be a series of experiments across a range of seed concentrations. Such an approach could delineate a clear boundary at which seed concentration begins to predominantly dictate the conformation, as opposed to pH conditions. Incorporating this aspect into the study would not only clarify the interplay between seed concentration and pH conditions, but also add a fascinating dimension to the understanding of polymorph selection mechanisms.

      A more complete analysis of the mechanisms of aggregation, including the effect of seed concentration and the resulting polymorph specificity of the process, is very important for our understanding of the aggregation pathways of alpha-synuclein and is currently the topic of ongoing investigations in our lab.

      Furthermore, the study prompts additional queries regarding the behavior of cross-seeding production under the same pH conditions when employing seeds of distinct conformation. Evidence from various studies, such as those involving E46K and G51D cross-seeding, suggests that seed structure plays a crucial role in dictating polymorph selection. A key question is whether these products consistently mirror the structure of their respective seeds. 

      We thank the reviewer for reminding us to cite these studies as a clear example of polymorph selection by cross-seeding. Unfortunately, it is not entirely clear from the G51D cross-seeding manuscript (https://doi.org/10.1038/s41467-021-26433-2) what conditions were used in the cross-seeding, since different conditions were used for the seedless wild-type and mutant aggregations. However, it appears that the wild-type without seeds was in Tris at pH 7.5 (although at 37 °C the pH could have dropped to approximately 7) and the cross-seeded wild-type was in phosphate buffer at pH 7.0. In the E46K cross-seeding manuscript, it appears that pH 7.5 Tris was used for all fibrilizations (https://doi.org/10.1073/pnas.2012435118). In any event, both results point to the fact that at pH 7.0-7.5 under low-seed conditions (0.5%) the Type 4 polymorph can propagate in a seed-specific manner.
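      The remark that a pH 7.5 Tris buffer could drift toward 7 at 37 °C can be checked with a rough calculation. The sketch below is only an illustration: it assumes the buffer was adjusted to pH 7.5 at 25 °C (the cited manuscripts may not state this), that the shift is linear in temperature, and that the temperature coefficient of Tris is about -0.028 pH units per °C, which is an approximate textbook value. The function name and default arguments are hypothetical.

```python
def tris_ph_at(temp_c, ph_ref, ref_temp_c=25.0, dpka_per_degc=-0.028):
    """Estimate the pH of a Tris buffer at temp_c.

    ph_ref        : pH measured at ref_temp_c
    ref_temp_c    : temperature at which the buffer was adjusted (assumed 25 C)
    dpka_per_degc : approximate temperature coefficient of Tris (assumed value)
    """
    # Linear approximation: the pH follows the pKa shift with temperature.
    return ph_ref + dpka_per_degc * (temp_c - ref_temp_c)


if __name__ == "__main__":
    # A buffer set to pH 7.5 at 25 C, then incubated at 37 C.
    print(round(tris_ph_at(37.0, 7.5), 2))
```

      Under these assumptions the estimated pH at 37 °C lies between 7.1 and 7.2, which is in line with the authors' point that the nominally pH 7.5 Tris condition may have been closer to the pH 7.0 phosphate condition than it appears.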

      (3) In the Results section of "The buffer environment can dictate polymorph during seeded nucleation", the authors reference previous cell biological and biochemical assays to support the polymorph-specific seeding of MSA and PD patients under the same buffer conditions. This discussion is juxtaposed with recent research that compares the in vivo biological activities of hPFF, ampLB as well as LB, particularly in terms of seeding activity and pathology. Notably, this research suggests that ampLB, rather than hPFF, can accurately model the key aspects of Lewy Body Diseases (LBD) (refer to: https://doi.org/10.1038/s41467-023-42705-5). The critical issue here is the need to reconcile the phenomena observed in vitro with those in in-vivo or in-cell models. Given the low seed concentration reported in these studies, it is imperative for the authors to provide a more detailed explanation as to why the possible similar conformation could lead to divergent pathologies, including differences in cell-type preference and seeding capability.  

      We thank the reviewer for bringing this recent report to our attention. The findings that ampLB and hPFF have different PK digestion patterns and that only the former is able to model key aspects of Lewy Body disease support the seed-specific nature of some types of alpha-synuclein aggregation. We have added this to the discussion regarding the significant role that seed type and seed conditions likely play in polymorph selection.

      (4) In the Method section of "Image processing", the authors describe the helical reconstruction procedure, without mentioning much detail about the 3D reconstruction and refinement process. For the benefit of reproducibility and to facilitate a deeper understanding among readers, the authors should enrich this part to include more comprehensive information, akin to the level of detail found in similar studies (refer to: https://doi.org/10.1038/nature23002).

      As also suggested by reviewer #2, we have now added more comprehensive information on the 3D reconstruction and refinement process.

      (5) The abbreviation of amino acids should be unified. In the Results section "On the structural heterogeneity of Type 1 polymorphs", the amino acids are denoted using three-letter abbreviation. Conversely, in the same section under "On the structural heterogeneity of Type 2 and 3 structures", amino acids are abbreviated using the one-letter format. For clarity and consistency, it is essential that a standardized format for amino acid abbreviations be adopted throughout the manuscript.

      That makes perfect sense and has been corrected.

      Reviewing Editor:

      After discussion among the reviewers, it was decided that point 2 in Reviewer #3's Public Review (about the experiments with different concentrations of seeds) would probably lie outside the scope of a reasonable revision for this work. 

      We agree as stated above and will continue to work on this important point.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary and Strengths:

      The ability of Wolbachia to be transmitted horizontally during parasitoid wasp infections is supported by phylogenetic data here and elsewhere. Experimental analyses have shown evidence of wasp-to-wasp transmission during coinfection (e.g. Huigens et al.), host-to-wasp transmission (e.g. Heath et al.), and mechanical ('dirty needle') transmission from host to host (Ahmed et al.). To my knowledge this manuscript provides the first experimental evidence of wasp-to-host transmission. Given the strong phylogenetic pattern of host-parasitoid Wolbachia sharing, this may be of general importance in explaining the distribution of Wolbachia across arthropods. This is of interest as Wolbachia is extremely common in the natural world and influences many aspects of host biology.

      Weaknesses:

      The first observation of the manuscript is that the Wolbachia strains in hosts are more closely related to those in their parasitoids. This has been reported on multiple occasions before, dating back to the late 1990s. The introduction cites five such papers (the observation is made in other studies too that could be cited) but then dismisses them by stating "However, without quantitative tests, this observation could simply reflect a bias in research focus." As these studies include carefully collected datasets that were analysed appropriately, I felt this claim of novelty was rather strong. It is unclear why downloading every sequence in GenBank avoids any perceived biases, when presumably the authors are reanalysing the data in these papers.

      Thank you for bringing this to our attention, and we will make the necessary amendments in our revised manuscript.

      I do not doubt the observation that host-parasitoid pairs tend to share related Wolbachia, as it is corroborated by other studies, the effect size is large, and the case study of whitefly is clear-cut. It is also novel to do this analysis on such a large dataset. However, the statistical analysis used is incorrect as the observations are pseudo-replicated due to phylogenetic non-independence. When analysing comparative data like this it is essential to correct for the confounding effects of related species tending to be similar due to common ancestry. In this case, it is well-known that this is an issue as it is a repeated observation that related hosts are infected by related Wolbachia. However, the authors treat every pairwise combination of species (nearly a million pairs) as an independent observation. Addressing this issue is made more complex because there are both the host and symbiont trees to consider. The additional analysis in lines 123-124 (including shuffling species pairs) does not explicitly address this issue.

      We concur with your observation regarding the non-independence of the data due to phylogenetic relationships. While common phylogenetic correction methods are indeed not directly applicable to wsp distances between species pairs, we are investigating the potential of phylogenetic mixed models to address this issue. We hope to include a revised analysis using this approach in our revised manuscript.
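      The pseudo-replication concern can be made concrete with a small sketch. It is not the authors' analysis and not the phylogenetic mixed model they propose; it only shows how quickly pairwise comparisons outgrow the number of species, and how a Mantel-style test can shuffle whole species labels instead of individual pairs. The matrix layout, the `linked` list of host-parasitoid index pairs, and all function names are hypothetical. Shuffling species labels respects the dependence among pairs that share a species, but it does not by itself correct for shared ancestry.

```python
import random


def n_pairs(n_species):
    """Number of unordered species pairs generated by n_species taxa."""
    return n_species * (n_species - 1) // 2


def mean_linked_distance(dist, linked, order):
    """Mean distance over linked pairs after relabelling species by `order`."""
    vals = [dist[order[i]][order[j]] for i, j in linked]
    return sum(vals) / len(vals)


def species_permutation_test(dist, linked, n_perm=999, seed=0):
    """One-sided Mantel-style test: are linked pairs closer than expected?

    dist   : symmetric matrix (list of lists) of pairwise distances
    linked : list of (i, j) index pairs flagged as host-parasitoid pairs
    Whole species labels are shuffled, so every pair sharing a species
    moves together instead of being treated as an independent draw.
    """
    rng = random.Random(seed)
    identity = list(range(len(dist)))
    observed = mean_linked_distance(dist, linked, identity)
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)
        if mean_linked_distance(dist, linked, order) <= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (hits + 1) / (n_perm + 1)
```

      For scale, `n_pairs(1000)` already gives 499,500 comparisons from only 1,000 species, so the effective sample size is far smaller than the number of pairs suggests.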

      The sharing of Wolbachia between whitefly and their parasitoids is very striking, although this has been reported before (eg the authors recently published a paper entitled "Diversity and Phylogenetic Analyses Reveal Horizontal Transmission of Endosymbionts Between Whiteflies and Their Parasitoids"). In Lines 154-164 it is suggested that from the tree the direction of transfer between host and parasitoid can be inferred from the data. This is not obvious to me given the poor resolution of the tree due to low sequence divergence. There are established statistical approaches to test the direction of trait changes on a tree that could have been used (a common approach is to use the software BEAST).

      Thank you for your insightful comments regarding the transfer direction of Wolbachia between whiteflies and their parasitoids. We acknowledge the concern about the resolution of the phylogenetic tree and the inference of the direction of Wolbachia transmission based on the available data. We considered the high infection frequency and obligate nature of Wolbachia in En. formosa, which exhibits a 100% infection rate, as a strong indicator that recent transmission of Wolbachia in this clade likely occurred from En. formosa to B. tabaci. We appreciate your recommendation and will ensure that our conclusions are supported by a more statistically sound approach. As you suggested, we will employ the software BEAST to rigorously test the direction of transmission, and we will revise our statements accordingly.

      Reviewer #2 (Public Review):

      The paper by Yan et al. aims to provide evidence for horizontal transmission of the intracellular bacterial symbiont Wolbachia from parasitoid wasps to their whitefly hosts. In my opinion, the paper in its current form has major flaws.

      Weaknesses:

      The dogma in the field is that although horizontal transmission events of Wolbachia occur, in most systems they are so rare that the chances of observing them in the lab are very slim.

      For the idea of bacteria moving from a parasitoid to its host, the authors have rightfully cited the paper by Hughes, et al. (2001), which presents the main arguments against the possibility of documenting such transmissions. Thus, if the authors want to provide data that contradict the large volume of evidence showing the opposite, they should present a very strong case.

      In my opinion, the paper fails to provide such concrete evidence. Moreover, it seems the work presented does not meet the basic scientific standards.

      We are grateful for your critical perspective on our work. Nonetheless, we are confident in the credibility of our findings regarding the horizontal transmission of Wolbachia from En. formosa to B. tabaci. Our study has documented this phenomenon through phylogenetic tree analyses, and we have further substantiated our observations with rigorous experiments in both cages and petri dishes. The horizontal transfer of Wolbachia was confirmed via PCR, with the wsp sequences in B. tabaci showing complete concordance with those in En. formosa. Additionally, we used FISH, vertical transmission experiments, and phenotypic assays to demonstrate that the transferred Wolbachia could be vertically transmitted and induce a significant fitness cost in B. tabaci. All experiments were conducted with strict negative controls and a sufficient number of replicates to ensure reliability, thereby meeting basic scientific standards. The collective evidence we present points to a definitive case of Wolbachia transmission from the parasitoid En. formosa to the whitefly B. tabaci.

      My main reservations are:

      • I think the distribution pattern of bacteria stained by the probes in the FISH pictures presented in Figure 4 looks very much like Portiera, the primary symbiont found in the bacteriome of all whitefly species. In order to make a strong case, the authors need to include Portiera probes along with the Wolbachia ones.

      We are very grateful for your critical evaluation regarding the specificity of FISH in our study. We are confident in the reliability of our FISH results for several reasons.

      1) We implemented rigorous negative controls which exhibited no detectable signal, thereby affirming the specificity of our hybridization. 2) The central region of the whitefly nymphs is a typical oviposition site for En. formosa. Post-parasitism, we observed FISH signals around the introduced parasitoid eggs, distinct from bacteriocyte cells which are rich in endosymbionts including Portiera (FIG 3e-f). This observation supports the high specificity of our FISH method. 3) In the G3 whiteflies, we detected the presence of Wolbachia in bacteriocytes in nymphs and at the posterior end of eggs in adult females (FIG 4). This distribution pattern aligns with previously reported localizations of Wolbachia in B. tabaci (Shi et al., 2016; Skaljac et al., 2013). Furthermore, the distribution of Wolbachia in the whiteflies does indeed exhibit some overlap with that of Portiera (Skaljac et al., 2013; Bing et al., 2014). 4) The probes used in our FISH assays have been widely cited (Heddi et al., 1999) and validated in studies on B. tabaci and other systems (Guo et al., 2018; Hegde et al., 2024; Krafsur et al., 2020; Rasgon et al., 2006; Uribe-Alvarez et al., 2019; Zhao et al., 2013). Taking all these points into consideration, we stand by the reliability of our FISH results.

      References:

      Bing XL, Xia WQ, Gui JD, Yan GH, Wang XW, Liu SS. 2014. Diversity and evolution of the Wolbachia endosymbionts of Bemisia (Hemiptera: Aleyrodidae) whiteflies. Ecol Evol, 4(13): 2714-37.

      Guo, Y, Hoffmann, AA, Xu, XQ, Zhang X, Huang HJ, Ju JF, Gong JT, Hong XY. 2018. Wolbachia-induced apoptosis associated with increased fecundity in Laodelphax striatellus (Hemiptera: Delphacidae). Insect Mol Biol, 27: 796-807.

      Heddi A, Grenier AM, Khatchadourian C, Charles H, Nardon P. 1999. Four intracellular genomes direct weevil biology: Nuclear, mitochondrial, principal endosymbiont, and Wolbachia. Proc Natl Acad Sci USA, 96: 6814-6819.

      Hegde S, Marriott AE, Pionnier N, Steven A, Bulman C, Gunderson E, et al. 2024. Combinations of the azaquinazoline anti-Wolbachia agent, AWZ1066S, with benzimidazole anthelmintics synergise to mediate sub-seven-day sterilising and curative efficacies in experimental models of filariasis. Front Microbiol, 15: 1346068.

      Krafsur AM, Ghosh A, Brelsfoard CL. 2020. Phenotypic response of Wolbachia pipientis in a cell-free medium. Microorganisms, 8: 1060.

      Rasgon JL, Gamston, CE, Ren X. 2006. Survival of Wolbachia pipientis in cell-free medium. Appl Environ Microbiol, 72: 6934-6937.

      Shi P, He Z, Li S, An X, Lv N, Ghanim M, Cuthbertson AGS, Ren SX, Qiu BL. 2016. Wolbachia has two different localization patterns in whitefly Bemisia tabaci AsiaII7 species. PLoS One, 11: e0162558.

      Skaljac M, Zanić K, Hrnčić S, Radonjić S, Perović T, Ghanim M. 2013. Diversity and localization of bacterial symbionts in three whitefly species (Hemiptera: Aleyrodidae) from the east coast of the Adriatic Sea. Bull Entomol Res, 103(1): 48-59.

      Uribe-Alvarez C, Chiquete-Félix N, Morales-García L, Bohórquez-Hernández A, Delgado-Buenrostro N L, Vaca L, et al. 2019. Wolbachia pipientis grows in Saccharomyces cerevisiae evoking early death of the host and deregulation of mitochondrial metabolism. MicrobiologyOpen, 8: e00675.

      Zhao DX, Zhang XF, Chen DS, Zhang YK, Hong XY, 2013. Wolbachia-host interactions: Host mating patterns affect Wolbachia density dynamics. PLoS One, 8: e66373.

      • If I understand the methods correctly, the phylogeny presented in Figure 2a is supposed to be based on a wide search for Wolbachia wsp gene done on the NCBI dataset (p. 348). However, when I checked the origin of some of the sequences used in the tree to show the similarity of Wolbachia between Bemisia tabaci and its parasitoids, I found that most of them were deposited by the authors themselves in the course of the current study (I could not find this mentioned in the text), or originated in a couple of papers that in my opinion should not have been published to begin with.

      We appreciate your meticulous examination of the sources for our sequence data. All the sequences included in our phylogenetic analysis were indeed downloaded from the NCBI database as of July 2023. The sequences used to illustrate the similarity of Wolbachia between B. tabaci and its parasitoids include those from our previously published study (Qi et al., 2019), which were sequenced from field samples. Additionally, some sequences were obtained from other laboratories (Ahmed et al., 2009; Baldo et al., 2006; Van Meer et al., 1999). We acknowledge that in our prior research (Qi et al., 2019), the sequences were directly submitted to NCBI and, regrettably, we did not update the corresponding publication information after the article was published. It is not uncommon for sequences on NCBI never to be followed by a published paper (e.g., FJ710487-FJ710511 and JF426137-JF426149), or not to have their associated publication details updated post-publication (for instance, sequences MH918776-MH918794 from Qi et al., 2019, and KF017873-KF017878 from Fattah-Hosseini et al., 2018). We recognize that this practice can lead to confusion and apologize for the oversight in our work.

      References:

      Ahmed MZ, Shatters RG, Ren, SX, Jin GH, Mandour NS, Qiu BL. 2009. Genetic distinctions among the Mediterranean and Chinese populations of Bemisia tabaci Q biotype and their endosymbiont Wolbachia populations. J Appl Entomol, 133: 733-741.

      Baldo L, Hotopp JCD, Jolley KA, Bordenstein SR, Biber SA, Choudhury RR, et al. 2006. Multilocus sequence typing system for the endosymbiont Wolbachia pipientis. Appl Environ Microbiol, 72: 7098-110.

      Fattah-Hosseini S, Karimi J, Allahyari H. 2018. Molecular characterization of Iranian Encarsia formosa Gahan populations with natural incidence of Wolbachia infection. J Entomol Res Soc, 20: 85–100.

      Qi LD, Sun JT, Hong XY, Li YX. 2019. Diversity and phylogenetic analyses reveal horizontal transmission of endosymbionts between whiteflies and their parasitoids. J Econ Entomol, 112(2): 894-905.

      Van Meer MM, Witteveldt J, Stouthamer R. 1999. Phylogeny of the arthropod endosymbiont Wolbachia based on the wsp gene. Insect Mol Biol, 8: 399-408.

      • The authors fail to discuss or even acknowledge a number of published studies that specifically show no horizontal transmission, such as the one claimed to be detected in the study presented.

      Thank you for bringing this to our attention. We will address and discuss the published studies that report no evidence of horizontal transmission, as you've highlighted, in the revised version of our manuscript.

      Reviewer #3 (Public Review):

      This is a very ordinary research paper. The horizontal transmission of endosymbionts, including Wolbachia, Rickettsia etc., has been reported in detail in the last 10 years, and parasitoid-vectored as well as plant-vectored horizontal transmission is the mainstream of research. For example, Ahmed et al. 2013 PLoS One, 2015 PLoS Pathogens, Chiel et al. 2014 Environmental Entomology, Ahmed et al. 2016 BMC Evolutionary Biology, Qi et al. 2019 JEE, and Liu et al. 2023 Frontiers in Cellular and Infection Microbiology all reported parasitoid-vectored horizontal transmission of endosymbionts, while Caspi-Fluger et al. 2012 Proc Roy Soc B, Chrostek et al. 2017 Frontiers in Microbiology, Li et al. 2017 ISME Journal, Li et al. 2017 FEMS, and Shi et al. 2024 mBio all reported plant-vectored horizontal transmission of endosymbionts. For the effects of endosymbionts on the biology of the host, Ahmed et al. 2015 PLoS Pathogens explained the effects in detail.

      Thank you very much for your insightful comments and for highlighting the relevant literature in the field of horizontal transmission of endosymbionts, including Wolbachia and Rickettsia. After careful consideration of the studies you have mentioned, we believe that our work presents significant novel contributions to the field. 1) Regarding the parasitoid-mediated horizontal transmission of Wolbachia, most of the cited articles, such as Ahmed et al. 2013 in PLoS One and Ahmed et al. 2016 in BMC Evolutionary Biology, propose hypotheses but do not provide definitive evidence. The transmission of Wolbachia within the whitefly cryptic species complex (Ahmed et al. 2013) or between moths and butterflies (Ahmed et al. 2016) could be mediated by parasitoids, plants, or other unknown pathways. 2) Chiel et al. 2014 in Environmental Entomology reported “no evidence for horizontal transmission of Wolbachia between and within trophic levels” in their study system. 3) Several of the studies you mentioned concern Rickettsia rather than Wolbachia, which indirectly reflects the relative scarcity of evidence for Wolbachia horizontal transmission. For example, the evidence for plant-mediated transmission of Wolbachia remains isolated, with Li et al. 2017 in The ISME Journal being one of the few reports supporting this mode of transmission. 4) The effects of endosymbionts on their hosts are not the central focus of our study; the transgenerational effects of Wolbachia on whiteflies are shown primarily to confirm Wolbachia infection of the whiteflies. Furthermore, the effects of Wolbachia on whiteflies that we report are notably different from those reported by Ahmed et al. 2015 in PLoS Pathogens, likely due to different whitefly species and Wolbachia strains. 5) More importantly, our study reveals a mechanism of parasitoid-mediated horizontal transmission of Wolbachia that is distinct from the mechanical transmission suggested by Ahmed et al. 2015 in PLoS Pathogens. Their study implies transmission primarily through host-feeding contamination, without the need for Wolbachia to infect the parasitoid, suggesting host-to-host transmission at the same trophic level. In contrast, our findings demonstrate transmission from parasitoids to hosts through unsuccessful parasitism, which represents cross-trophic level transmission. To our knowledge, this is the first experimental evidence that Wolbachia can be transmitted from parasitoids to hosts. We believe these clarifications and the novel insights provided by our research contribute valuable knowledge to the field.

      References:

      Ahmed MZ, De Barro PJ, Ren SX, Greeff JM, Qiu BL. 2013. Evidence for horizontal transmission of secondary endosymbionts in the Bemisia tabaci cryptic species complex. PLoS One, 8: e53084.

      Ahmed MZ, Li SJ, Xue X, Yin XJ, Ren SX, Jiggins FM, Greeff JM, Qiu BL. 2015. The intracellular bacterium Wolbachia uses parasitoid wasps as phoretic vectors for efficient horizontal transmission. PLoS Pathog, 10: e1004672.

      Ahmed MZ, Breinholt JW, Kawahara AY. 2016. Evidence for common horizontal transmission of Wolbachia among butterflies and moths. BMC Evol Biol, 16: 118. doi.org/10.1186/s12862-016-0660-x.

      Caspi-Fluger A, Inbar M, Mozes-Daube N, Katzir N, Portnoy V, Belausov E, Hunter MS, Zchori-Fein E. 2012. Horizontal transmission of the insect symbiont Rickettsia is plant-mediated. Proc Biol Sci, 279(1734): 1791-6.

      Chiel E, Kelly SE, Harris AM, Gebiola M, Li X, Zchori-Fein E, Hunter MS. 2014. Characteristics, phenotype, and transmission of Wolbachia in the sweet potato whitefly, Bemisia tabaci (Hemiptera: Aleyrodidae), and its parasitoid Eretmocerus sp. nr. emiratus (Hymenoptera: Aphelinidae). Environ Entomol, 43(2): 353-62.

      Chrostek E, Pelz-Stelinski K, Hurst GDD, Hughes GL. 2017. Horizontal transmission of intracellular insect symbionts via plants. Front Microbiol, 8: 2237.

      Li SJ, Ahmed MZ, Lv N, Shi PQ, Wang XM, Huang JL, Qiu BL. 2017. Plant-mediated horizontal transmission of Wolbachia between whiteflies. ISME J, 11: 1019-1028.

      Li YH, Ahmed MZ, Li SJ, Lv N, Shi PQ, Chen XS, Qiu BL. 2017. Plant-mediated horizontal transmission of Rickettsia endosymbiont between different whitefly species. FEMS Microbiol Ecol, 93(12). doi: 10.1093/femsec/fix138.

      Liu Y, He ZQ, Wen Q, Peng J, Zhou YT, Mandour N, McKenzie CL, Ahmed MZ, Qiu BL. 2023. Parasitoid-mediated horizontal transmission of Rickettsia between whiteflies. Front Cell Infect Microbiol, 12: 1077494. DOI: 10.3389/fcimb.2022.1077494

      Qi LD, Sun JT, Hong XY, Li YX. 2019. Diversity and phylogenetic analyses reveal horizontal transmission of endosymbionts between whiteflies and their parasitoids. J Econ Entomol, 112: 894-905.

      Shi PQ, Wang L, Chen XY, Wang K, Wu QJ, Turlings TCJ, Zhang PJ, Qiu BL. 2024. Rickettsia transmission from whitefly to plants benefits herbivore insects but is detrimental to fungal and viral pathogens. mBio, 15(3): e0244823.

      Weaknesses:

      In the current study, the authors downloaded the MLST or wsp genes from a public database and analyzed the data using other methods. I think the authors may not be familiar with the research progress in the field of insect symbiont transmission, and the manuscript at its current stage lacks sufficient novelty.

      We appreciate your critical perspective on our study. However, we respectfully disagree with the viewpoint that our manuscript lacks sufficient novelty.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      “EGFRvIII is mainly associated with the classical subtype, so the mesenchymal subtype might be unexpected here. This could be commented on.” 

      We acknowledge that EGFRvIII is most often associated with the classical subtype of glioblastoma and agree that mesenchymal subtype classification may be unexpected given the use of her4.1:EGFRvIII as a driver in our model. We would like to highlight the fact that our brain tumors do also express certain markers associated with the classical subtype, including neural precursor and neural stem cell markers like sox2, ascl1b, and gli2 (Supplementary Fig 4, 5; Supplementary Table 1-3). However, our transcriptomic data were not found to significantly enrich for classical subtype gene expression, compared to normal brains. This could be due to a significant contribution of normal brain tissue to our analyses (bulk tumor-burdened brains were harvested for RNA sequencing), as well as the significant contribution of mesenchymal subtype signatures and/or inflammatory gene expression in our brain tumor-positive samples. Because signatures associated with inflammation include some of the most highly upregulated genes in our samples, this could potentially dilute and/or lessen alternative subtype and/or signature gene expression. Importantly, it is now widely appreciated that patient tumors simultaneously consist of heterogeneous tumor cells reflecting multiple molecular subtypes (Couturier et al., 2020; Darmanis et al., 2017; Neftel et al., 2019), providing glioblastoma with a high level of phenotypic plasticity. We also demonstrate that the contribution of additional drivers not always present with EGFRvIII in patient glioblastoma enhances primary brain tumors in vivo. This result is consistent with more aggressive glioblastomas seen in patients with EGFRvIII variants and TP53 loss-of-function mutations (Ruano et al., 2009). It will therefore be interesting in the future to consider how single or multiple driver mutations contribute to subtype-specific gene expression in our model, as well as histopathology, relative to patients. We have included some of these discussion points in our revised manuscript.

      “Some more histologic characterization of the tumors would be helpful. Are they invasive, do larger tumors show necrosis and microvascular proliferation? This would help with understanding the full potential of the new model.”

      We have updated our manuscript to include more histopathological characterization and images (Supplementary Fig 2).

      “Current thinking in established glioblastoma is that the M1/M2 designations for macrophages are not relevant, with microglia/macrophage populations showing a mixture of pro- and anti-inflammatory features. Ideally, there would be a much more detailed characterization of the intratumoral microglia/macrophage population here, as single markers can’t be relied upon.”

      We performed additional gene set enrichment analyses (GSEA) using our sequencing datasets and compared p53EPS gene expression to M1/M2 macrophage expression signatures and expression signatures from MCSF-stimulated macrophages at early and late (M2 polarized) time-points. From this analysis, we detected enrichment for markers of both pro- and anti-inflammatory features, although with stronger and significant enrichment for gene expression signatures associated with classical pro-inflammatory M1 macrophages. We have included these GSEA plots and gene set enrichment lists as supplementary materials (Supplementary Fig 6, Supplementary Table 6). We also performed GSEA against a broad curated set of immunologic gene sets (C7: immunologic signature gene sets, Molecular Signatures Database (Liberzon et al., 2011)) and have included the list of signatures and enrichment scores as a supplementary table (Supplementary Table 6). 
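      For readers unfamiliar with how a GSEA enrichment score is obtained, the following is a minimal sketch of the running-sum statistic. It is not the software the authors ran, and it omits the permutation step that produces the normalized enrichment score (NES) and p-value. The function name, argument names, and the gene labels in the usage note are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes : gene identifiers sorted from most up- to most down-regulated
    scores       : ranking metric for each gene, in the same order
    gene_set     : set of gene identifiers to test
    p            : weight exponent (p=0 gives the unweighted statistic)
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    hit_weights = [abs(s) ** p if h else 0.0 for s, h in zip(scores, in_set)]
    total_hit = sum(hit_weights)
    if total_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the list without covering it")

    running, best = 0.0, 0.0
    for w, h in zip(hit_weights, in_set):
        # Step up at genes in the set, step down at genes outside it.
        running += w / total_hit if h else -1.0 / n_miss
        # Keep the largest deviation from zero, with its sign.
        if abs(running) > abs(best):
            best = running
    return best
```

      As a toy usage, with four ranked genes `["a", "b", "c", "d"]`, scores `[4, 3, 2, 1]` and `p=0`, a set made of the two top genes reaches the maximum positive score, while a set made of the two bottom genes reaches the maximum negative score. In a real analysis the score is compared against scores from permuted data before any claim of enrichment is made.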

      “Phagocytosis could have anti-tumor effects through removal of live cancer cells or could be cancer-promoting if apoptotic cells are being rapidly cleared with concomitant activation of an immunosuppressive phenotype in the phagocytes (ie. efferocytosis).” 

      We looked at efferocytosis-associated gene expression in our sequencing dataset (124 “efferocytosis” genes, GeneCards), and while we detected upregulation of certain genes associated with efferocytosis in p53EPS brains, we did not detect significant enrichment for the entire gene set. Furthermore, we did not detect up-regulation of key efferocytosis receptors including Axl and Tyro3 (Supplementary Table 1, 2), compared to normal brains. While efferocytosis may contribute to tumor growth and evolution, this GSEA, combined with our functional data supporting an inhibitory role for phagocytes in p53EPS tumor initiation and engraftment following transplantation (Fig 4, Fig 5, Supplementary Fig 7), suggests that efferocytosis is not a major driver of tumor formation in our model. However, how efferocytosis affects tumor progression in our model and/or relapse following therapy will be an interesting feature to explore in the future using temporal manipulations of phagocytes and/or treatments with chemical inhibitors.

      Author response image 1.

      Gene Set Enrichment Analysis (GSEA) for efferocytosis-associated gene expression (124 “efferocytosis” genes in GeneCards) in tp53EPS tumor brains, compared to normal zebrafish brains.

      Normalized enrichment score (NES) and p-value are indicated. 
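
      For readers unfamiliar with how an enrichment score is obtained, the running-sum idea behind GSEA can be sketched as below. This is a minimal, unweighted illustration only, not the pipeline used for the analysis above: the function name `enrichment_score` and the gene identifiers are hypothetical placeholders, and the normalization and permutation steps that yield the NES and p-value are omitted.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov-like).

    ranked_genes: gene identifiers ordered from most up-regulated to most
                  down-regulated (hypothetical input, for illustration).
    gene_set:     set of identifiers in the signature of interest.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running = 0.0  # running sum while walking down the ranked list
    best = 0.0     # largest deviation from zero seen so far
    for is_hit in hits:
        # Step up at a gene-set member, step down otherwise.
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Placeholder identifiers, not genes from the dataset.
ranked = ["geneA", "geneB", "geneC", "geneD", "geneE", "geneF"]
top_set = {"geneA", "geneB"}      # members concentrated at the top of the list
bottom_set = {"geneE", "geneF"}   # members concentrated at the bottom

print(enrichment_score(ranked, top_set))
print(enrichment_score(ranked, bottom_set))
```

      A positive score indicates that the gene set is concentrated among the most up-regulated genes, and a negative score that it is concentrated among the most down-regulated genes. Standard GSEA additionally weights each step by the ranking metric and compares the score against permuted data.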

      “Do the irf7/8 and chlodronate experiments distinguish between effects on microglia/macrophages and dendritic cells?”

      In addition to microglia/macrophages, the IRF8 transcription factor has been shown to control survival and function of dendritic cells (Sichien et al., 2016). Clodronate treatments are also used to deplete both macrophages and dendritic cells in vivo. Therefore, we cannot distinguish the effects of these manipulations on microglia/macrophages from their effects on dendritic cells in our experiments, and we have updated our manuscript throughout to reflect this.

      Reviewer #2:

      “The authors state that oncogenic MAPK/AKT pathway activation drives glial-derived tumor formation. It would be important to include a wild-type or uninjected control for the pERK and pAKT staining shown in Fig1 I-K to aid in the interpretation of these results. Likewise, quantification of the pERK and pAKT staining would be useful to demonstrate the increase over WT, and would also serve to facilitate comparison with the similar staining in the KPG model (Supp Fig 2D).”

      We have updated Fig 1 and Supplementary Fig 3D (formerly Supplementary Fig 2D) to include histology from tumor-free uninjected control animals, as well as quantifications of p-ERK and p-AKT staining to highlight increased MAPK/AKT signaling pathway activation in our tumor model.

      “The authors use a transplantation assay to further test the tumorigenic potential of dissociated cells from glial-derived tumors. Listing the percentage of transplants that generate fluorescent tumor would be helpful to fully interpret these data. Additionally, it was not clear based on the description in the results section that the transplantation assay was an “experimental surrogate” to model the relapse potential of the tumor cell. This is first mentioned in the discussion. The authors may consider adding a sentence for clarity earlier in the manuscript as it helps the reader better understand the logic of the assay.” 

      We have clarified in the text the percentage of transplants that generated fluorescent tumor (16-25%, n=3 independent screens). This is also represented in Fig 5C,D. We also added text when introducing the transplantation assay, explaining that transplantation is frequently used as an experimental surrogate to assess relapse potential, and that our objective was to assess tumor cell propagation in the context of specific manipulations within the TME.

      “The authors nicely show high levels of immune cell infiltration and associations between microglia/macrophages and tumor cells. However, a quantification of the emergence of macrophages over time in relation to tumor initiation and growth would provide significant support to the observations of tumor suppressive activity of the phagocytes. Along these lines, the inclusion of a statement about when leukocytes emerge during normal development would be informative for those not familiar with the zebrafish model.”

      In zebrafish, microglia colonize the neural retina by 48 hpf, and the optic tectum by 84 hpf (Herbomel et al., 2001), prior to when we typically observe lesions in our p53EPS brains. To validate the emergence of microglia prior to tumor formation in p53EPS, we have now used live confocal imaging through the brains of uninjected control and p53EPS injected zebrafish at 5, 7 and 9 dpf. As expected, microglia were present throughout the cephalic region and in the brain at 5 dpf (120 hpf). At this stage, p53EPS injected zebrafish brains displayed mosaic cellular expression of her4.1:mScarlet; however, cells were sparse and diffuse, and no large intensely fluorescent tumor-like clusters were detected (n=12/12 tumor negative). At 7 dpf, microglia were observed in the brains of control and p53EPS zebrafish; however, at this stage we detected clusters of her4.1:mScarlet+ cells (n=5/9), indicative of tumor formation. Lesions were found to be surrounded and/or infiltrated by mpeg:EGFP+ microglia. Finally, at 9 dpf, her4.1:mScarlet+ expression became highly specific to tumor lesions, and these lesions were associated with mpeg:EGFP+ microglia/macrophages (n=8/8 of tumor-positive zebrafish). These descriptions, along with representative images, have been added to Figure 3.

      “From the data provided in Figure 4G and Supp Fig 7b, the authors suggest that “increased p53EPS tumor initiation following Irf gene knock-down is a consequence of irf7 and irf8 loss-of-function in the TME.” Given the importance of the local microenvironment highlighted in this study, spatial information on the form of in situ hybridization to identify the relevant location of the expression change would be important to support this conclusion.”

      We performed fluorescent in situ hybridization (using HCR RNA-FISH, Molecular Instruments) on whole mount control and irf7 CRISPR-injected p53EPG animals (her4.1:EGFRvIII +her4.1:PI3KCAH1047R + her4.1:GFP, GFP was used in this case because of probe availability).

      Representative confocal projections through tumors, as well as single optical sections are presented and discussed in Figure 4, highlighting the location of irf7 expression change following gene knock-down. We found significant irf7 signal in and surrounding p53EPS tumors at early stages of tumor formation. This expression was reduced and/or lost following irf7 CRISPR gene targeting, consistent with RT-PCR data (Supplementary Fig 7).

      “The authors used neutral red staining that labels lysosomal-rich phagocytes to assess enrichment at the early stages of tumor initiation. The images in Figure 3 panel A should be labeled to denote the uninjected controls to aid in the interpretation of the data. In Supplemental Figure 6, the neutral red staining in the irf8 CRISPR-injected larvae looks to be increased, counter to the quantification. Can the authors comment if the image is perhaps not representative?”

      We have updated Figure 3 and Supplementary Figure 6 to aid in the interpretation of our results. In Fig 3A, we used tumor-negative controls from our injected cohorts. This was done to control for exogenous transgene presence and/or over-expression prior to (or in the absence of) malignant transformation. In Supplementary Fig 6, our images are representative, but we have now used unprocessed images with arrowheads to highlight neutral red-positive foci for clarity. In our original manuscript the images contained software-generated markers, which could have obscured and/or confused the neutral red staining we were trying to highlight.

      Recommendations For the Authors:

      Reviewer #1: 

      “The PI 3-kinase does a lot more than just activating mTOR and Akt – I would suggest modifying that sentence in the introduction.”

      We have adjusted text in the introduction to reflect the broad role for PI3K signaling.

      Reviewer #2:

      “In Supplemental Fig 1, it would be helpful for the authors to provide a co-stain, such as DAPI to label all nuclei, which would allow the reader to assess the morphology of the cells in the context of the surrounding tissue.”

      We have included brightfield images in Supplementary Fig 1 that, together with her4.1:mScarlet fluorescence, should help readers assess tumor location and morphology in the context of surrounding tissue. Tumor cell morphology at high resolution can be visualized in Fig 3, Movie 1 and Movie 2.

      “The authors state that oncogenic MAPK/AKT pathway activation drives glial-derived tumor formation. The authors may consider testing if the addition of an inhibitor of MAPK signaling may prevent or decrease the formation of glial-derived tumors in this context to further support their results.” 

      To further assess the role of MAPK activation, we tested the effect of the MAPK inhibitor AZD6244 (50 µM) following transplantation of dissociated primary p53EPS cells into syngeneic CG1 strain zebrafish embryos, as previously described (Modzelewska et al., 2016). Following 5 days of drug treatment, we did not detect significant differences in tumor engraftment or in tumor size between DMSO control and AZD6244-treated cohorts, suggesting that MAPK inhibition is not sufficient to prevent p53EPS engraftment and growth in our model. In the future, assessments of on-target drug effects, possible resistance mechanisms, and/or testing MAPK inhibitors in combination with other targeted agents including Akt and/or mTOR inhibitors (Edwards et al., 2006; McNeill et al., 2017; Schreck et al., 2020) will enhance our understanding of potential therapeutic strategies.

      Author response image 2.

      Dorsal views of 8 dpf zebrafish larvae engrafted with her4.1:mScarlet+ p53EPS tumor cells following treatment from 3-8 dpf with 0.1% DMSO (control) or 50 µM AZD6244. Tumor cell injections were performed at 2 dpf into syngeneic CG1 strain embryos. The percentage of total animals with persisting engraftment following drug treatments, as well as tumor size (microns squared, quantified using Carl Zeiss ZEN software) are shown for control and AZD6244-treated larvae.

      “Have the authors tested if EGFR and PI3KCA driven by other neural promoters produce similar results, or not? This would help support the specificity of her4.1 neural progenitors and glia as the cell of origin in this model.”

      At this time, we have not tested other neural promoters. However, previous reports describe a zebrafish zic4-driven glioblastoma model with mesenchymal-like gene expression (Mayrhofer et al., 2017), supporting neural progenitors as a cell of origin. In the future it will be interesting to test sox2, nestin, and gfap promoters to further define and support her4.1-expressing neural progenitors and glia as the cell of origin in our model.

      “Other leukocyte populations, such as neutrophils, can also respond to inflammatory cues. Can the authors comment if neutrophils are also observed in the TME?”

      We performed initial assessments of neutrophils in the TME using our expression datasets as well as her4.1:EGFRvIII + her4.1:PI3KCAH1047R co-injection into Tg(mpx:EGFP) strain zebrafish. We observed tumor formation without significant infiltration of mpx:EGFP+ neutrophils. Future investigations will be important to assess differences in the contributions of different myeloid-derived lineages in the TME of p53EPS, as well as how heterogeneity may be altered depending on different oncogenic drivers and/or stage of tumor progression, as seen in human glioblastoma (Friedmann-Morvinski and Hambardzumyan, 2023). We have added text in the discussion section of our manuscript to indicate the possibility of neutrophils and/or other immune cell types contributing to p53EPS tumor biology.

      Author response image 3.

      Control-injected tumor-negative and tumor-positive Tg(mpx:EGFP) zebrafish at 10 dpf. Tg(mpx:EGFP) strain embryos were injected at the one-cell stage with her4.1:EGFRvIII + her4.1:PI3KCAH1047R + her4.1:mScarlet.

      “It is not clear if the transcriptomics data has been deposited in a publicly available database, such as the Gene Expression Omnibus (GEO). Sharing of these data would be a benefit to the field and facilitate use in other studies.”

      We have uploaded all transcriptomic data to GEO under accession GSE246295.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Original blots in Figures 2E and 2H should be shown as well as the quantification of miR-182-5p overexpression in HepG2 cells. miR-182-5p expression in T2D patients was 2.3-fold higher than ND patients. The lack of insights into the degree of miR-182-5p overexpression precluded proper interpretation of the data presented.

      Thank you very much for these comments. We now include the original uncut blots and relevant bands (new supplementary figure 3A) as well as the quantification of miR-182-5p expression in mimic-treated HepG2 cells in the supplement (new supplementary figure 2).

      (2) What are the upstream transcriptional regulators of miR-182-5p?

      To the best of our knowledge the upstream transcriptional regulators of miR-182-5p are currently unknown.

      (3) What's the purpose of the weight cycling cohort? Figure 3A only showed that miR-182-5p expression was highly correlated to body weight, but the cohort can not explain why the human cohort has different miR-182-5p expression. GTT and ITT data are lacking for this cohort and thus cannot demonstrate a causal link between insulin sensitivity and miR-182-5p. The lack of histological evidence cannot show the relationship between NAFLD and miR-182-5p.

      The purpose of the weight cycling cohort was to demonstrate that miR-182-5p is dynamically altered and that it can be reversed to almost control levels by weight loss. Thereby we validate in mice that obesity is associated with miR-182-5p upregulation (HFD group without intervention) and we propose that the adverse effects of increased miR-182-5p in obesity might be reversible by weight loss. We did not perform ITTs and GTTs in this weight cycling cohort because the HFD model in C57BL/6 mice is well established and it can be assumed that glucose and insulin tolerance deteriorated during HFD feeding (doi.org/10.1038/oby.2007.608; doi:10.1007/978-1-61779-430-8_27) and improved after weight loss (doi:10.1038/s41598-023-40514-w). To corroborate this assumption, we provide plasma insulin along with other important metabolic markers of the weight cycling model in supplemental figure 5A.

      (4) Loss-of-function of miR-182-5p and/or gain-of-function of Lrp6 in vivo or in vitro would clarify the importance of the miR-182-5p-Lrp6 axis and provide more direct evidence for its potential as a therapeutic target.

      We absolutely agree with the reviewer that loss of miR-182 and gain of LRP6 function experiments are missing. However, we provide miR-182 gain of function experiments that impressively show increased liver triglycerides after only seven days of miR-182 overexpression. Because these in vivo data are only short-term, we stated our conclusions carefully and point out that we do not have evidence for a direct involvement of miR-182-5p in insulin signaling. We are now planning follow-up studies in which miR-182-5p will be overexpressed and also antagonized for a longer time. However, for the timeframe of this revision process these extensive studies are not feasible and we ask the reviewer for his/her understanding.

      (5) The schematic summary is too complex and includes too many assumptions to faithfully represent the data shown in this study.

      We agree, the schematic summary is very complex. Therefore we simplified the upper part (new figure 5) and only focused on the clearly regulated genes and main pathways.

      Reviewer #2 (Recommendations For The Authors):

      (1) Although lots of microarray analyses were performed in this study, the authors didn't systemically investigate the function of miR-182 in T2DM or NAFLD. The current data provided in this manuscript may only support that miR-182 is involved in the homeostasis of glucose or insulin.

      We thank the reviewer for this comment and agree that the nature of our data is mostly correlative. We tried to overcome this by performing mechanistic in vitro experiments. Because overexpression of miR-182-5p decreases insulin signaling in vitro and induces hyperinsulinemia in vivo, we still strongly believe that miR-182-5p is highly relevant for the homeostasis of glucose and insulin.

      (2) The authors used miRNA mimics to overexpress miR-182 in mice. How to emphasize the target specificity in the liver? Normally, adeno-associated virus 8 (AAV8) is used to specifically target the liver.

      Tail vein injections as used in our experimental set-up are known to deliver compounds directly to the liver via the portal vein. For modulation of microRNAs in the liver it is an established technique to deliver mimics (or inhibitors) via the tail vein (doi:10.1007/978-1-62703-435-7_18; doi:10.1089/10430349950017734). To account for off-target effects, we quantified miR-182-5p and target gene expression in spleen and heart. Although miR-182-5p concentrations in mimic-treated mice were strongly increased in these tissues, expression in the liver was still highest (new supplementary figure 6A).

      (3) The HE and Oil red staining of the mouse liver should be shown in miR-182-5p overexpressing mice compared with the control mice, which could provide a more intuitive view of the fat content in the mouse liver.

      Unfortunately, the livers were flash frozen and not optimally prepared for later histological analyses. Nevertheless, we performed H&E staining in all livers and provide representative H&E staining of two control and two miR-182 mimic-treated mice (new supplementary figure 5D). The increased hepatic lipid content is clearly visible in the H&E staining of miR-182 mimic-treated mice and supports our previous findings of increased hepatic triglycerides (Figure 4H). Due to the freezing process, livers were damaged and Oil red staining was impossible.

      (4) After overexpression of miR-182-5p in mice, the serum insulin levels were increased. Does miR-182-5p affect insulin resistance in mice? The insulin tolerance test (ITT) experiment needs to be performed.

      We thank the reviewer for this comment. Indeed, performing an ITT would have best clarified the effects of miR-182 on insulin tolerance. Because we did not see differences in the GTT after treating mice acutely with the miR-182 mimic, we decided not to perform the ITT in this short-term experiment. The increased fasting serum insulin levels after miR-182-5p mimic treatment (Fig. 4G) suggest that insulin sensitivity rather than insulin secretion is disturbed by miR-182-5p. We are aware that in future experiments mice should be treated for a longer period with miR-182-5p mimics and that an ITT should be performed in these more chronic studies.

      (5) In Figure 2H, the author measured the level of p-Akt/Akt to indicate the effect of miR-182-5p on insulin resistance in HepG2 cells. It is best to provide the western blotting results of p-AKT and t-AKT after HepG2 cells are treated with or without insulin.

      We now provide the full blots for all western blotting experiments as new supplemental figure 3B. The HepG2 cells were stimulated with 20 nM insulin 10 min before harvest as described in section 2.11, and Akt and p-Akt were then quantified. We did not analyze Akt and p-Akt without stimulation because Akt is rarely phosphorylated in the basal, non-insulin-stimulated state.

      (6) This study suggests that miR-182-5p may promote insulin resistance and hyperinsulinemia by downregulating LRP6. Nevertheless, to confirm this conclusion, we suggest you transfect miR-182-5p after downregulating the level of LRP6 with its siRNA for further validation.

      Because miR-182-5p targets LRP6 as we have validated by luciferase-assays, LRP6 levels are already low after miR-182-5p overexpression. Thus, the additional downregulation of LRP6 by other means (such as siRNAs) does not make sense in our opinion.

      (7) The author described that serum miR-182-5p was neither altered in T2D nor correlated with hepatic miR-182-5p expression, so is it suitable as the biomarker of T2D?

      Yes, as the reviewer stated correctly, serum concentrations of miR-182-5p were not related to its liver concentrations or the type 2 diabetic state. We therefore suggest that circulating miR-182-5p levels are not a suitable biomarker for T2D. We clarified this in the discussion.

      (8) What are the changes in fasting blood glucose levels in HFD, HC, and YoYo mouse models? Is there a correlation between miR-182-5p level and fasting blood glucose level in T2D patients and mouse models?

      Unfortunately, we did not measure the fasting blood glucose levels in this mouse model and therefore cannot answer this question. However, we provide the fasting insulin levels of our mouse models and their positive correlations with miR-182-5p (Fig. 3D and Suppl. Fig. 5D). In T2D humans, hepatic miR-182-5p correlates positively with fasting glucose (Fig. 2B).

      (9) The capitalization of the letters in "STrengthening the Reporting of OBservational studies in Epidemiology" should be checked. What does the "Among these is miRNAs miR-182-5p" mean? Please clarify it.

      The “STrengthening the Reporting of OBservational studies in Epidemiology” report form is abbreviated as the “STROBE” list. We thus capitalized the letters that are used to build the abbreviation.

      “Among these is miRNAs miR-182-5p” is a typo for which we apologize. It should mean “Among these conserved miRNAs is miR-182-5p.” We corrected this error.

      Reviewer #3 (Recommendations For The Authors):

      (1) The functional importance of miR-182 on gene expression is not rigorously tested.

      (A) Many of the target genes in Fig. 1C and Fig. 3 are controlled by multiple factors that are known to be increased with obesity (e.g., lipogenic genes are increased by hyperinsulinemia), making it likely that their association with miR-182 is correlative rather than a consequence of miR-182 increases.

      We thank the reviewer for this comment and agree that miR-182 is not the only factor regulating the genes investigated here. Rather, we propose that miR-182 could be an additional upstream regulator that holds the potential to modify entire pathways of insulin signaling and lipogenesis. However, miR-182 should not be viewed as an on/off switch, as it likely plays a modulating role. Although our in vivo data from humans and mice are correlative, we believe that the in vitro data derived in HepG2 cells clearly show a causal role for miR-182-5p in decreasing LRP6 and insulin signaling, indicated by lower AKT phosphorylation after miR-182-5p overexpression.

      (B) 500-fold overexpression of miR-182 does not significantly change gene expression. The authors need to knockdown miR-182 in mice and then feed them a chow versus high-fat diet. If miR-182 is a significant regulator of these genes, the effects of the diet will be blunted.

      We thank the reviewer for the constructive criticism and agree that an optimal experiment would be to antagonize miR-182-5p in mice to rescue glucose and lipid metabolism. The in vivo upregulation of miR-182-5p presented here was a proof-of-concept study to confirm our hypothesis in a reasonable timeframe. We are aware that follow-up studies are needed, and we are now planning studies in which miR-182-5p will be overexpressed and also antagonized for a longer time. However, for the timeframe of this revision process these extensive studies are not feasible and we ask the reviewer for his/her understanding.

      (2) It has previously been shown that miR-182 is in a polycistronic microRNA locus that is activated directly by SREBP-2. Is this also true in humans? If so, this would indicate that miR-182 is a marker of SREBP activity. How does the nuclear active form of SREBP1 and SREBP2 change in the human livers and HFD-fed mice?

      We thank the reviewer for this very interesting question. Suitable experiments to investigate if miR-182-5p is activated by SREBF would be EMSAs or ChIPs. Unfortunately, we have only frozen protein lysate of the human livers left, in which such experiments cannot be performed. We agree that this should be prioritized in the future.

      (3) Similarly, to test the role of LRP6 in mediating the effects of miR-182, the authors should compare the effects of miR-182 overexpression in the presence and absence of LRP6.

      Because miR-182-5p targets LRP6 as we have validated by luciferase-assays, LRP6 levels are already low after miR-182-5p overexpression. Thus, the additional downregulation of LRP6 by other means (such as siRNAs) does not make sense in our opinion.

      (4) The methods are a bit confusing. The authors state that "we applied a logistic regression analysis for the 594 mature miRNAs using the NAFLD activity score (NAS) as a cofactor to exclude any bias by hepatic fat content, lobular inflammation, and fibrosis." However, they later showed that miR-182 levels are correlated with NAS. Please clarify.

      We explicitly excluded NAFLD as a driving factor for the association with T2D by including a surrogate (the NAFLD activity score) as a cofactor. It is well known that NAFLD and T2D are likely associated with each other. Since not all included individuals with T2D have NAFLD and vice versa, a second correlation with NAS also revealed that a high NAS is associated with higher expression of miR-182.

      (5) Does two-fold overexpression of miR-182 (which mimics the effects of HFD) have any effect on chow-fed mice?

      This is a very interesting question that we unfortunately cannot answer right now. We are planning further mouse studies in which we will include chow-fed mice as controls.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer 1:

      Weaknesses:

      While I generally agree with the authors' interpretations, the idea of Saccorhytida as a divergent, simplified offshoot is slightly contradictory with a probably non-vermiform ecdysozoan ancestor. The authors' analyses do not discard the possibility of a vermiform ecdysozoan ancestor (importantly, Supplementary Table 4 does not reconstruct that character),

      Saccorhytids are only known from the early Cambrian and their unique morphology has no equivalent among any extinct or extant ecdysozoan groups. This prompted us to consider them as a possible dead-end evolutionary offshoot. The nature of the last common ancestor of ecdysozoans (i.e. an elongated worm-like or non-vermiform animal with capacities to renew its cuticle by molting) remains hypothetical. At present, palaeontological data do not allow us to resolve this question. The animal in Fig. 4b at the base of the tree is supposed to represent an ancestral soft-bodied form with no cuticle from which ecdysozoans evolved via major innovations (cuticular secretion and ecdysis). Its shape is hypothetical as indicated by a question mark. Our evolutionary model is clearly intended to be tested by further studies and hopefully new fossil discoveries.

      …and outgroup comparison with Spiralia (and even Deuterostomia for Protostomia as a whole) indicates that a more or less anteroposteriorly elongated (i.e., vermiform) body is likely common and ancestral to all major bilaterian groups, including Ecdysozoa. Indeed, Figure 4b depicts the potential ancestor as a "worm". The authors argue that the simplification of Saccorhytida from a vermiform ancestor is unlikely "because it would involve considerable anatomical transformations such as the loss of vermiform organization, introvert, and pharynx in addition to that of the digestive system". However, their data support the introvert as a specialisation of Scalidophora (Figure 4a and Supplementary Table 4), and a pharyngeal structure cannot be ruled out in Saccorhytida. Likewise, loss of an anus is not uncommon in Bilateria. Moreover, this can easily become a semantics discussion (to what extent can an animal be defined as "vermiform"? Where is the limit?).

      We agree that “worm” and “vermiform” are ill-defined terms. They are widely used in various palaeontological and biological papers to describe elongated tubular animals such as ecdysozoans and annelids (see Giribet and Edgecombe 2017; popular textbook written by Nielsen 2012; Schmidt-Rhaesa 2013; Brusca et al. 2023; Giribet and Edgecombe 2020). Very few other animals are termed “worms”. Changes have been made in the text to solve this semantic problem, for example in the abstract where we added (i.e. elongated and tubular) to better define what we mean by “vermiform”.

      Priapulid worms or annelids are examples of extremely elongated, tubular animals. In saccorhytids, the antero-posterior elongation is present (as it is in the vast majority of bilaterians) but extremely reduced, Saccorhytus and Beretella having a sac-like or beret-like shape, respectively. That such forms may have derived from elongated, tubular ancestors (e.g. comparable with present-day priapulid worms) would require major anatomical transformations that have no equivalent among modern animals. We agree that further speculation about the nature of these transformations is unnecessary and should be deleted simply because the nature of these ancestors is purely hypothetical. We also agree that the loss of anus and the extreme simplification of the digestive system is common among extant bilaterians. In Figure 4b, the hypothetical pre-ecdysozoan animal is slightly elongated (along its antero-posterior axis) but in no way comparable with a very elongated and cylindrical ecdysozoan worm (e.g. extant or extinct priapulid).

      Therefore, I suggest to leave the evolutionary scenario more open. Supporting Saccorhytida as a true group at the early steps of Ecdysozoa evolution is important and demonstrates that animal body plans are more plastic than previously appreciated. However, with the current data, it is unlikely that Saccorhytida represents the ancestral state for Ecdysozoa (as the authors admit), and a vermiform nature is not ruled out (and even likely) in this animal group. Suggesting that the ancestral Ecdysozoan might have been small and meiobenthic is perhaps more interesting and supported by the current data (phylogeny and outgroup comparison with Spiralia).

      We agree to leave the evolutionary scenario more open, especially the evolutionary process that gave rise to Saccorhytida. Again, we know nothing about the morphology of the ancestral ecdysozoan (typically the degree of body elongation, whether it had a differentiated introvert or not, whether it had a through gut or not). In Fig. 4, the ancestral ecdysozoan is supposed to have evolved from a soft-bodied epibenthic animal through key innovations such as the secretion of a cuticle and ecdysis. It is a hypothesis that needs to be tested by further studies and fossil discoveries. Speculations concerning the process through which saccorhytids may have arisen have been deleted.

      Reviewer 2:

      Weaknesses:

      The preservations of the specimens, in particular on the putative ventral side, are not good, and the interpretation of the anatomical features needs to be tested with additional specimens in the future. The monophyly of Cycloneuralia (Nematoida + Scalidophora) was not necessarily well-supported by cladistic analyses, and the evolutionary scenario (Figure 4) also needs to be tested in future works.

      Yes, we agree that the animal described in our manuscript remains enigmatic (e.g. the nature of its internal organs, its lifestyle, etc.). Whereas the dorsal side of the animal is well documented (consistent pattern of pointed sclerites), uncertainties remain concerning its ventral anatomy (typically the mouth location and shape). Additional better-preserved specimens will hopefully provide the missing information. Concerning Cycloneuralia, their monophyly is generally better supported by analyses based on morphological characters than in molecular phylogenies.

      Reviewer 3:

      Weaknesses:

      I, as a paleontology non-expert, experienced several difficulties in reading the manuscript. This should be taken into consideration when assuming a wide range of readers including non-experts.

      We have ensured that the text is comprehensible to biologists. The main results are summarized in relatively simple diagrams (e.g. Fig. 4) that can be understood by non-specialized readers. We are aware that technical descriptive terms may appear obscure to non-specialists. We can hardly avoid them in the descriptive parts. However, our figures (e.g. SEM images and 3D-reconstruction) are clear enough to give the reader a clear idea of the morphology of Beretella.

      Recommendations for the authors:

      All three reviewers appreciate the discovery and found the merit of publishing this manuscript. They also raised some concerns about the data presentation. The authors are requested to perform no additional analysis but to go through all the reviewer comments and rebut or intake them in revising the manuscript.

      Reviewer 1:

      - Line 41: comma after "ecdysozans".

      OK, done.

      - Formatting style: add a space before references.

      OK, done.

      - Line 169: B. spinosa in italics

      OK, done.

      - Line 157: could the "relatively large opening" in the flattened ventral side be a mouth (even when altered by the fossilisation process)?

      Most bilaterians have a mouth. There is no opening on the relatively well-preserved dorsal side of Beretella that could be interpreted as a mouth. In contrast, the flattened ventral side often shows a depressed area that could potentially bear a mouth. This ventral area is often pushed in and poorly preserved. The cuticle of this ventral side might have been relatively thinner, perhaps more flexible than that of the dorsal one (with strong sclerites). These differences might explain why the possible oral area is poorly preserved.

      - Line 178: "position of the mouth"

      OK, done.

      - Line 219: "These sclerites, unknown..."

      OK, done.

      - Line 282: update reference formatting

      OK, done.

      - Line 298: remove reference to Supplementary Table 4, as it does not refer to the possible vermiform nature of the last common ecdysozoan ancestor?

      OK, done.

      - Figure 4a: change "paired legs" for "paired appendages"?

      OK, done.

      - Supplementary Table 4: For TGE and Introvert, the state 0 (absent) should be in bold and underlined (as it is the most likely state).

      OK, done.

      Reviewer 2:

      Line 25: "from the early Cambrian" should be changed into "from the lower Cambrian"

      OK, done.

      Line 126: The range of maximum length should be reported in µm (rather than mm) just like those of maximum width and height.

      OK, done.

      Lines 191-192: Please recheck the figure panels of Saccorhytus (Supplementary Figure 4c) and scalidophoran worm (Supplementary Figure 4d). Perhaps, the former should refer to Figure 4d, and the latter to Figure 4c?

      OK, done.

      Lines 239 and 241: "1" and "2" appear to stand for citations (the other journal style), but I am not certain what they are.

      To avoid confusion, we replaced ‘1’ and ‘2’ with ‘i’ and ‘ii’.

      Figures 3d and 4a: "Cycloneuralia" should be included in the phylogenetic trees.

      OK, done.

      Figure 3: The caption for the panel d is redundant. It should be changed into, for example, "Phylogenetic tree obtained from cladistic analyses using maximum likelihood (IQTREE)."

      OK, done.

      Supplementary Figures 6-9: In the captions, more detailed explanations of the results (for example, "50% majority rule consensus of XXX trees" and "strict consensus of all 4 most-parsimonious trees") should be provided.

      OK, done.

      Supplementary Figures 8 and 9: The caption explains that Cycloneuralia is resolved as a paraphyletic group, but it is not certain because Nematoida, Scalidophora, and Panarthropoda are resolved in a polytomy.

      We changed the sentence into:

      “Note that Cycloneuralia does not appear as a monophyletic clade”

      Reviewer 3:

      Line 25 'tiny' - I suggest giving an absolute measure of the size.

      We add ‘maximal length 3 mm’.

      Line 29 'both forms' - This is hard to follow by a non-expert. Can this be replaced with 'fossil species'?

      OK, done.

      Line 32 'dead-end' - Is this word necessary? I suggest to skip this word, as it is obvious that this lineage is extinct.

      OK, done.

      Lines 80, 94, and 172 'Remarks' - I, as a palaeontology non-expert, cannot get this manuscript structure with a repetition of this same section title.

      Our systematic descriptions follow the standard rules in palaeontology.

      Line 119 - I could not get what this 'Member 5' that was not introduced earlier means.

      In Stratigraphy, ‘member’ is a lithostratigraphic subdivision (a Formation is usually subdivided into several Members).

      Lines 104, 105, 417, ... - The name of the organization or database hosting these IDs (CUB.... and ELIXX....) should also be supplied.

      OK, done.

      Lines 341 and 361 - These two Figures (Figures 1 and 2) have the same caption (with an addition to the one for Figure 1). There should be a distinction based on what is presented in each figure.

      We corrected the caption of Figure 2 and wrote the following: ‘Beretella spinosa gen. et sp. nov.’.

      Line 362-367 - There is no guide about what the individual figure panels (e.g., Figure 2g, 2h, and 2i) show in detail. This guide should be supplied. This also applies to Figure 3a-c - are they anterolateral (a), dorsal (b), and posterolateral (c) views? It is better to write clearly in this way.

      OK, done.

      Figure 3d - The color contrast is not sufficient, and this figure does not look reader-friendly. Plus, the division into Cycloneuralia and Panarthropoda is indicated above the tree, but it is not clear what range of lineages these clades include. For example, is Pliciloricidae included in Cycloneuralia? Also, is Collinsium included in Panarthropoda? This figure looks quite unreliable, and it should be easy to fix.

      OK, done.

      Line 277 legend of Figure 3 - Including the parenthesis only with the program name (IQTREE) is not useful at all. Isn't it enough to describe it in Methods?

      OK, done. We remove (IQTREE).

      Line 380 legend of Figure 3 - I could not get where 'thicker bars' are.

      Known fossil record indicated by thicker vertical bars. We added “vertical”.

      Line 453 - Give full names of the methods, maximum parsimony, and maximum-likelihood.

      OK, done.

      Line 489 - State clearly what 'the recent paper' means.

      Replace ‘recent’ by ‘present’.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors addressed how long-range interactions between boundary elements are established and influence their function in enhancer specificity. Briefly, the authors placed two different reporters separated by a boundary element. They inserted this construct ectopically ~140 kb away from an endogenous locus that contains the same boundary element. The authors used expression patterns driven by nearby enhancers as an output to determine which enhancers the reporters interact with. They complemented this analysis with 3D DNA contact mapping. The authors found that the orientation of the boundary element determined which enhancers each reporter interacted with. They proposed that the 3D interaction topology, whether being circular or stem configuration, distinguished whether the interaction was cohesin mediated or through an independent mechanism termed pairing.

      Strengths:

      The transgene expression assays are built upon prior knowledge of the enhancer activities. The 3D DNA contacts confirm that transgene expression correlates with the contacts. Using 4 different orientations covers all combinations of the reporter genes and the boundary placement.

      Weaknesses:

      The interpretation of the data as a refutation of loop extrusion playing a role in TAD formation is not warranted, as the authors did not deplete the loop extruders to show that what they measure is independent.

      (1.1) To begin with, our findings do not exclude the possibility that cohesin loop extrusion has some sort of role in the formation or maintenance of TADs in flies or other aspects of chromosome structure.  On the other hand, it clearly is not determinative in defining the end-points of TADs or in generating the resulting topology (stem-loop or circle-loop).  Our main point, which we feel we have established unequivocally, is that it can’t explain many essential features of TADs or chromosome loops (see below) in Drosophila.  This reviewer agrees with this point in their next paragraph (below).  We also think that the loop extrusion model’s general acceptance as THE driving force behind TAD formation in mammals is unwarranted and not fully consistent with the available data, as explained below.

      As to the reviewer’s specific point regarding depletion of loop extruders, we first note that completely eliminating factors encoding cohesin subunits in fly embryos isn’t readily feasible.  As cohesin is essential starting at the beginning of embryonic development, and is maternally deposited, knockdowns/depletions would likely be incomplete and there would always be some remaining activity.  As long as there is some residual activity—and no disruption in TAD formation is observed—this experimental test would be a failure.  In addition, any defects that are observed might arise not from a failure in TAD formation via loop extrusion but rather because the rapid mitotic cycles would be disrupted.  A far better approach would be to deplete/knockdown cohesin subunits in tissue culture cells, as there is no requirement for the cells to undergo embryonic development.  Moreover, since cell division is relatively slow, the depletion would likely eliminate much if not all of the activity before a checkpoint is reached.

      While a drastic depletion of cohesin is not feasible in our model organism, we would draw the reviewer’s attention to an experiment of this type which has already been done in mammalian tissue culture cells by Goel et al. (Goel et al. 2023). Unlike most Hi-C studies in mammals, the authors used region capture MicroC (RCMC). In contrast to published genome-wide mammalian MicroC experiments (c.f., (Hsieh et al. 2020; Krietenstein et al. 2020)) which require large bin sizes to visualize mammalian “TADs,” the resolution of the experiments in Goel et al. (Goel et al. 2023) is similar to the resolution in our MicroC experiments (200-400 bp). A MicroC contact map from Goel et al. shows the Ppm1g locus on chromosome 5 before and after Rad21 depletion. The contact map visualizes a 250 kb DNA segment, which is only slightly larger than the ~230 kb DNA segment in Fig. 2C in our paper.

      In this experiment, there was a 97% reduction in the amount of Rad21.  However, as can be seen by comparing the contact profiles above and below the diagonal, there is little or no difference in TAD organization after cohesin depletion when individual TADs are visualized with a bin size of 250 bp.  These results would indicate that mammalian TADs do not require cohesin.

      Note also that the weak 45° stripes connecting different TADs (c.f. blue/green arrowheads) are still present after Rad21 depletion. In the most popular version of the loop extrusion model, cohesin loads at a site(s) somewhere in the TAD-to-be, and then extrudes both strands until it bumps into CTCF roadblocks. As illustrated in Figure Sup 2, this mechanism generates a vertical stripe originating at the cohesin loading site and extending until cohesin bumps into the left or right roadblock, at which point the stripe transitions into a 45° stripe that ends when cohesin bumps into the other roadblock. While 45° stripes are visible, there is no hint of a vertical stripe. This suggests that the mechanism for generating stripes, if it is an active mechanism (rather than passive diffusion), may be quite different. The 45° stripes must be generated by a factor(s) that is anchored to one (blue arrowhead) or both (green arrowhead) boundaries. In addition, this factor, whatever it is, is not cohesin. The reason for this is that the 45° stripes are present both before and after Rad21 depletion. Moreover, if one were to imagine that the stripes represent a process involved in TAD formation, this process does not require cohesin (see Goel et al. 2023).

      It is worth noting another observation that is inconsistent with the cohesin loop extrusion/CTCF roadblock model for TAD formation/maintenance.  CTCF is not found at all of the TAD boundaries in this 250 kb DNA region.  This would suggest that there are other DNA binding proteins that have chromosomal architectural functions besides CTCF.  In flies, many of the chromosomal architectural proteins are, like CTCF, polydactyl zinc finger (PZF) proteins (Bonchuk et al. 2021; Bonchuk et al. 2022; Fedotova et al. 2017).  These include Su(Hw), CTCF, Pita, Zipic and CLAMP.  The PZF family in flies is quite large.  There are ~250 different PZF genes, and since only a handful of these have been characterized, it seems likely that additional members of this family will have architectural functions.  Thus far, only one boundary protein, CTCF, has received attention in studies on mammalian chromosome architecture.  As the mammalian genome is much larger and more complicated than the fly genome, it is difficult to believe that CTCF is the sole chromosomal architectural protein in mammals.  In this respect, it is worth noting that there are ~800 members of the PZF family in mammalian genomes (Fedotova et al. 2017).

      Goel et al. (Goel et al. 2023) did observe alterations in the contact profiles after Rad21 depletion when they visualized the Ppm1g region at much lower resolution (bin sizes of 5 kb and 1 kb). The 5 kb bin size visualizes a region of ~1.2 Mb, while the 1 kb bin size visualizes a region that spans ~800 kb.  These large triangular units do not correspond to the individual TADs seen when Goel et al. visualized the Ppm1g locus at 250 bp resolution. 

      Nor do they correspond to TADs in Fig. 2 of our paper. Instead they represent TAD neighborhoods which likely consist of 20-30 or more individual TADs. Consequently, the alterations in contact patterns seen after Rad21 depletion are occurring at the level of TAD neighborhoods. This can be seen by comparing pixel density inside the blue lines before (above the diagonal) and after Rad21 depletion (below the diagonal) (Goel et al. 2023). The more distant contacts between individual TADs within this neighborhood are preferentially reduced by Rad21 depletion (the region below and to the left of the double arrowhead). By contrast, the TADs themselves are unaffected, as are contacts between individual TADs and their immediate neighbors (see purple and light green asterisk). The other interesting feature is the loss of contacts between what appear to be partially overlapping neighborhoods. This loss of neighborhood-to-neighborhood contacts can be seen in the region located between the green and blue lines. The neighborhood that appears to partially overlap the Ppm1g neighborhood is outlined in purple.

      It is worth noting that, with the exception of the high resolution experiments in Goel et al., all of the other studies on cohesin (and CTCF) have examined the effects on contact maps within (and between) large neighborhoods (bin sizes >1 kb). In most cases, these large neighborhoods are likely to be composed of many individual TADs like those seen in Goel et al. and in Fig. 2 of our paper. We also observe larger neighborhoods in the fly genome, though they do not appear to be as large as those in mammals. Our experiments do not address what role cohesin might have in facilitating contacts between more distant TADs located within the same neighborhoods, or between TADs in different neighborhoods, or whether loop extrusion is involved.
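      The effect of bin size on what is visible in a contact map can be illustrated with a toy calculation. The sketch below is not the analysis used in our study or in Goel et al.; the function names (`coarsen`, `toy_map`) and the matrix values are hypothetical and chosen only to show how summing contacts into larger bins can merge several small domains into one apparent unit.

```python
def toy_map(domain_sizes, within=1, between=0):
    """Build a toy square contact matrix with block-diagonal 'domains'.

    Hypothetical helper: bins in the same domain get `within` contacts,
    bins in different domains get `between` contacts.
    """
    labels = [d for d, size in enumerate(domain_sizes) for _ in range(size)]
    n = len(labels)
    return [
        [within if labels[i] == labels[j] else between for j in range(n)]
        for i in range(n)
    ]


def coarsen(matrix, factor):
    """Sum non-overlapping factor x factor blocks (i.e. use a larger bin size)."""
    n = len(matrix)
    if n % factor != 0:
        raise ValueError("matrix size must be a multiple of factor")
    m = n // factor
    return [
        [
            sum(
                matrix[i * factor + a][j * factor + b]
                for a in range(factor)
                for b in range(factor)
            )
            for j in range(m)
        ]
        for i in range(m)
    ]


# Four small domains, each two fine bins wide.
fine = toy_map([2, 2, 2, 2])

# Bins four times larger: each coarse bin now spans two whole domains.
coarse = coarsen(fine, 4)

for row in coarse:
    print(row)
```

      In this toy case the four domains that are separable in the fine map collapse into two coarse bins, so the boundaries between neighboring domains are no longer resolvable at the larger bin size.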

      We would also note that the Drosophila DNA segment in Fig. 2C contains 35 different genes, while the mammalian DNA segment shown in Fig. 1 has only 9.  Thus, in this part of the fly genome, Pol II genes are more densely packed than in the mammalian DNA segment.  Much of the fly genome is also densely packed, and the size of individual TADs will likely be smaller, on average, than in mammals.  Nevertheless, the MicroC profiles are not all that different.  As is also common in flies, each TAD in the Ppm1g region only encompasses one or two genes.  Note also that there are no volcano triangles with plumes as would be predicted for TADs that have a stem-loop topology.

      In fact, as shown in Author response image 1, the high-resolution contact profile for the Ppm1g region shows a strong resemblance to that observed for the fly Abd-B regulatory domains. These regulatory domains are part of a larger neighborhood that encompasses the abd-A and Abd-B genes and their regulatory domains.

      Author response image 1.

      Abd-B regulatory domains

      As the authors show, the single long DNA loop mediated by cohesin loop extrusion connecting the ectopic and endogenous boundary is clearly inconsistent with the results, therefore the main conclusion of the paper that the 3D topology of the boundary elements is a consequence of pairing is strong. However, the loop extrusion and pairing are not mutually exclusive models for the formation of TADs. Loop-extruding cohesin complexes need not make a 140 kb loop, multiple smaller loops could bring together the two boundary elements, which are then held together by pairing proteins that can make circular topologies.

      (1.2) In the pairing model, distant boundaries bump into each other (by random walks or partially constrained walks), and if they are “compatible” they pair with each other, typically in an orientation-dependent manner.  As an alternative, the reviewer argues that cohesin need not make one large 140 kb loop.  Instead it could generate a series of smaller loops (presumably corresponding to the intervening TADs).  These smaller loops would bring homie in the transgene in close proximity to the eve locus so that it could interact with the endogenous homie and nhomie elements in the appropriate orientation, and in this way only one of the reporters would be ultimately activated.

      There are two problems with the idea that cohesin-dependent loop extrusion brings transgene homie into contact with homie/nhomie in the eve locus by generating a series of small loops (TADs).  The first is the very large distances over which specific boundary:boundary pairing interactions can occur.  The second is that boundary:boundary pairing interactions can take place not only in cis, but also in trans.

      We illustrate these points with several examples. 

      Fujioka et al. 2016, Fig. 7 shows an experiment in which attP sites located ~2 Mb apart were used to insert two different transgenes, one containing a lacZ reporter and the other containing the eve anal plate enhancer (AP) (Fujioka et al. 2016). If the lacZ reporter and the AP transgenes also contain homie, the AP enhancer can activate lacZ expression (panel A). On the other hand, if one of the transgenes has lambda DNA instead of homie, no regulatory interactions are observed (panel A). In addition, as is the case in our experiments using the -142 kb platform, orientation matters. In the combination on the top left, the homie boundary is pointing away from both the lacZ reporter and the AP enhancer. Since homie pairs with itself head-to-head, pairing brings the AP enhancer into contact with the lacZ reporter. A different result is obtained for the transgene pair in panel A on the top right. In this combination, homie is pointing away from the lacZ reporter, while it is pointing towards the AP enhancer. As a consequence, the reporter and enhancer are located on opposite sides of the paired homie boundaries, and in this configuration they are unable to interact with each other.

      On the top left of panel B, the homie element in the AP enhancer transgene was replaced by a nhomie boundary oriented so that it is pointing towards the enhancer.  Pairing of homie and nhomie head-to-tail brings the AP enhancer in the nhomie transgene into contact with the lacZ reporter in the homie transgene, and it activates reporter expression.  Finally, like homie, nhomie pairs with itself head-to-head, and when the nhomie boundaries are pointing towards both the AP reporter and the lacZ reporter, reporter expression is turned on.

      Long distance boundary-dependent pairing interactions by the bithorax complex Mcp boundary have also been reported in several papers.  Fig. 6 from Muller et al. (Muller et al. 1999) shows the pattern of regulatory interactions (in this case PRE-dependent “pairing-sensitive silencing”) between transgenes that have a mini-white reporter, the Mcp and scs’ boundaries and a PRE that is located close to Mcp.  In this experiment flies carrying transgenes inserted at the indicated sites on the left and right arms of the 3rd chromosome were mated in pairwise combinations, and their trans-heterozygous progeny examined for pairing-sensitive silencing of the mini-white reporter.

      Two examples of long-distance pairing-sensitive silencing mediated by Mcp/scs’ are shown in Fig. 5b from Muller et al. 1999. The transgene inserts in panel A are w#12.43 and ff#10.5. w#12.43 is inserted close to the telomere of 3R at 99B. ff#10.5 is inserted closer to the middle of 3R at 91A. The estimated distance between them is 11.3 Mb. The transgene inserts in panel B are ff#10.5 and ff#11.102. ff#11.102 is inserted at 84D, and the distance between them is 11 Mb. Normally, the eye color phenotype of the mini-white reporter is additive: homozygous inserts have twice as dark eye color as hemizygous inserts, while in trans-heterozygous flies the eye color would be the sum of the two different transgenes. However, when a PRE is present and the transgene can pair, silencing is observed. In panel A, the trans-heterozygous combination has a lighter eye color than either of the parents. In panel B, the trans-heterozygous combination is darker than one of the parents (ff#10.5) but much lighter than the other (ff#11.102).

      All ten of the transgenes tested were able to engage in long distance (>Mbs) trans regulatory interactions; however, likely because of how the chromosome folds on the Mb scale (e.g., the location of meta-loops: see #2.1 and Author response image 3), not all of the possible pairwise silencing interactions are observed. The silencing interactions shown in Muller et al. are between transgenes inserted on different homologs. Mcp/scs’-dependent silencing interactions can also occur in cis. Moreover, just like the homie and nhomie experiments described above, Muller et al. (Muller et al. 1999) found that Mcp could mediate long-distance activation of mini-white and yellow by their respective enhancers.

      The pairing-sensitive activity of the PRE associated with the Mcp boundary is further enhanced when the mini-white transgene has the scs boundary in addition to Mcp and scs’.  In the experiment shown in Fig. 8 from Muller et al. 1999, the pairing-sensitive silencing interactions of the Mcp/scs’/scs transgene are between transgenes inserted on different chromosomes.  Panel A shows pairing-sensitive silencing between w#15.60, which is on the X chromosome, and w#15.102, which is on the 2nd chromosome.  Panel B shows pairing-sensitive silencing between the 2nd chromosome insert w#15.60 and a transgene, w#15.48, which is inserted on the 3rd chromosome.

      The long-distance trans and cis interactions described here are not unique to homie, nhomie, Mcp, scs’, or scs. Precisely analogous results have been reported by Sigrist and Pirrotta (Sigrist and Pirrotta 1997) for the gypsy boundary when the bxd PRE was included in the mini-white transgene. Also like the Mcp-containing transgenes in Muller et al. (Muller et al. 1999), Sigrist and Pirrotta observed pairing-sensitive silencing between gypsy bxd PRE mini-white transgenes inserted on different chromosomes. Similar long-distance (Mb) interactions have been reported for Fab-7 (Bantignies et al. 2003; Li et al. 2011). In addition, there are examples of “naturally occurring” long-distance regulatory and/or physical interactions. One would be the regulatory/physical interactions between the p53 enhancer upstream of reaper and Xrp1 which was described by Link et al. (Link et al. 2013). Another would be the nearly 60 meta-loops identified by Mohana et al. (Mohana et al. 2023).

      Like homie at -142 kb, the regulatory interactions (pairing-sensitive silencing and enhancer activation of reporters) reported in Muller et al. (Muller et al. 1999) involve direct physical interactions between the transgenes. Vazquez et al. (Vazquez et al. 2006) used the lacI/lacO system to visualize contacts between distant scs/Mcp/scs’-containing transgenes in imaginal discs. As indicated in Vazquez et al. 2006, Table 3 lines #4-7, when both transgenes have Mcp and were inserted on the same chromosome, they colocalized in trans-heterozygotes (single dot) in 94% to 97% of the disc nuclei in the four pairwise combinations they tested. When the transgenes both lacked Mcp (Vazquez et al. 2006, Table 3 #1), co-localization was observed in 4% of the nuclei. When scs/Mcp/scs’-containing transgenes on the 2nd and 3rd chromosome were combined (Vazquez et al. 2006, Table 3 #8), colocalization was observed in 96% of the nuclei. They also showed that four different scs/Mcp/scs’ transgenes (two at the same insertion site but on different homologs, and two at different sites on different homologs) co-localized in 94% of the eye imaginal disc nuclei (Vazquez et al. 2006, Table 3 #9). These pairing interactions were also found to be stable over several hours. Similar co-localization experiments together with 3C were reported by Li et al. (Li et al. 2011).

      The de novo establishment of trans interactions between compatible boundary elements has been studied by Lim et al. (Lim et al. 2018).  These authors visualized transvection (enhancer activation of a MS2 loop reporter in trans) mediated by the gypsy insulator, homie and Fab-8  in NC14 embryos.  When both transgenes shared the same boundary element, transvection/physical pairing was observed in a small subset of embryos.  The interactions took place after a delay and increased in frequency as the embryo progressed into NC14.  As expected, transvection was specific: it was not observed when the transgenes had different boundaries.  For homie it was also orientation-dependent.  It was observed when homie was orientated in the same direction in both transgenes, but not when homie was orientated in opposite directions in the two transgenes.

      While one could imagine that loop extrusion-dependent compaction of the chromatin located between eve and the transgene at -142 kb into a series of small loops (the intervening TADs) might be able to bring homie in the transgene close to homie/nhomie in the eve locus, there is no cohesin-based loop extrusion scenario that would bring transgenes inserted at sites 6 Mb, 11 Mb, on different sides of the centromere, or at opposite ends of the 3rd chromosome together so that the distant boundaries recognize their partners and physically pair with each other. Nor is there a plausible cohesin-based loop extrusion mechanism that could account for the fact that most of the documented long-distance interactions involve transgenes inserted on different homologs. This is not to mention the fact that long-distance interactions are also observed between boundary-containing transgenes inserted on different chromosomes.

      In fact, given these results, one would logically come to precisely the opposite conclusion.  If boundary elements inserted Mbs apart, on different homologs and on different chromosomes can find each other and physically pair, it would be reasonable to think that the same mechanism (likely random collisions) is entirely sufficient when they are only 142 kb apart.

      Yet another reason to doubt the involvement or need for cohesin-dependent loop extrusion in bringing the transgene homie in contact with the eve locus comes from the studies of Goel et al. (Goel et al. 2023).  They show that cohesin has no role in the formation of TADs in mammalian tissue culture cells.  So if TADs in mammals aren’t dependent on cohesin, there would not be a good reason to think at this point that the loops (TADs) that are located between eve and the transgene are generated by, or even strongly dependent on, cohesin-dependent loop extrusion.

      It is also important to note that even if loop-extrusion were to contribute to chromatin compaction in this context and make the looping interactions that lead to orientation-specific pairing more efficient, the role of loop extrusion in this model is not determinative of the outcome, it is merely a general compaction mechanism.  This is a far cry from the popular concept of loop extrusion as being THE driving force determining chromosome topology at the TAD level.

      Reviewer #2 (Public Review):

      In Bing et al., the authors analyze micro-C data from NC14 fly embryos, focusing on the eve locus, to assess different models of chromatin looping. They conclude that fly TADs are less consistent with conventional cohesin-based loop extrusion models and instead rely more heavily on boundary-boundary pairings in an orientation-dependent manner.

      Overall, I found the manuscript to be interesting and thought-provoking. However, this paper reads much more like a perspective than a research article. Considering eLife is aimed at the general audience, I strongly suggest the authors spend some time editing their introduction to the most salient points as well as organizing their results section in a more conventional way with conclusion-based titles. It was very difficult to follow the authors' logic throughout the manuscript as written. It was also not clear as written which experiments were performed as part of this study and which were reanalyzed but published elsewhere. This should be made clearer throughout.

      It has been shown several times that Drosophila Hi-C maps do not contain all of the features (frequent corner peaks, stripes, etc.) observed when compared to mammalian cells. Considering these features are thought to be products of extrusion events, it is not an entirely new concept that Drosophila domains form via mechanisms other than extrusion.

      (2.1) While there are differences between the Hi-C contact profiles in flies and mammals, these differences likely reflect in large part the bin sizes used to visualize contact profiles.  With the exception of Goel et al. (Goel et al. 2023), most of the mammalian Hi-C studies have been low resolution restriction enzyme-based experiments, and required bin sizes of >1 kb to visualize what are labeled as “TADs.”  In fact, as shown by experiments in Goel et al., these are not actually TADs, but rather a conglomeration of multiple TADs into a series of TAD neighborhoods.  The same is true for the MicroC experiments of Krietenstein et al. and Hsieh et al. on human and mouse tissue culture cells (Hsieh et al. 2020; Krietenstein et al. 2020).  This is shown in Author response image 2.  In this image, we have compared the MicroC profiles generated from human and mouse tissue culture cells with fly MicroC profiles at different levels of resolution.

      For panels A-D, the genomic DNA segments shown are approximately 2.8 Mb, 760 kb, 340 kb, and 190 kb.  For panels E-H, the genomic DNA segments shown are approximately 4.7 Mb, 870 kb, 340 kb and 225 kb.  For panels I-L, the genomic DNA segments shown are approximately 3 Mb, 550 kb, 290 kb and 175 kb.

      As reported for restriction enzyme-based Hi-C experiments, a series of stripes and dots are evident in mammalian MicroC profiles.  In the data from Krietenstein et al., two large TAD “neighborhoods” are evident with a bin size of 5 kb, and these are bracketed by 45° stripes (A: black arrows).  At 1 kb (panel B), the 45° stripe bordering the neighborhood on the left no longer defines the edge of the neighborhood (blue arrow: panel B), and both stripes become discontinuous (fuzzy dots).  At 500 bp (panel C) and 200 bp (panel D) bin sizes, the stripes largely disappear (black arrows) even though they were the most prominent feature in the TAD landscape with large bin sizes.  At 200 bp, the actual TADs (as opposed to the forest) are visible, but weakly populated.  There are no stripes, and only one of the TADs has an obvious “dot” (green asterisk: panel C).

      Author response image 2.

      Mammalian MicroC profiles at different bin sizes.

      Large TAD neighborhoods bordered by stripes are also evident in the Hsieh et al. data set in Author response image 2 panels E and F (black arrows in E and F and green arrow in F).  At 400 bp resolution (panel G), the narrow stripe in panel F (black arrows) becomes much broader, indicating that it is likely generated by interactions across one or two small TADs that can be discerned at 200 bp resolution.  The same is true for the broad stripe indicated by the green arrows in panels F, G and H.  This stripe arises from contacts between the TADs indicated by the red bar in panels G and H and the TADs to the other side of the volcano triangle with a plume (blue arrow in panel H).  As in flies, we would expect that this volcano triangle topped by a plume corresponds to a stem-loop.  However, the resolution is poor at 200 bp, and the profiles of the neighboring TADs are not very distinct.

      For the fly data set, stripes can be discerned when analyzed at 800 bp resolution (see arrows in Author response image 3);  however, these stripes are flanked by regions of lower contact, and represent TAD-TAD interactions.  At 400 bp, smaller neighborhoods can be discerned, and these neighborhoods exhibit a complex pattern of interaction with adjacent neighborhoods.  With bin sizes of 200 bp, individual TADs are observed, as are TAD-TAD interactions like those seen near eve.  Some of the TADs have dots at their apex, while others do not—much like what is seen in the mammalian MicroC studies.

      Author response image 3.

      Drosophila MicroC profiles at different bin sizes.

      Stripes: As illustrated in Author response image 2 A-D and E-H, the continuous stripes seen in low resolution mammalian studies (>1 kb bins) would appear to arise from binning artefacts.  At high resolution where single TADs are visible, the stripes seem to be generated by TAD-TAD interactions, and not by some type of “extrusion” mechanism.  This is most clearly seen for the volcano with plume TAD in Author response image 2 G and H.  While stripes in Author response image 2 disappear at high resolution, this is not always true.  There are stripes that appear to be “real” in Goel et al. 2023 for the TADs in the Ppm1g region, and in Author response image 1 for the Abd-B regulatory domain TADs.  Since the stripes in the Ppm1g region are unaffected by Rad21 depletion, some other mechanism must be involved (c.f. (Shidlovskii et al. 2021)).
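      The binning argument above can be illustrated with a toy calculation. The sketch below is not the analysis pipeline used for the MicroC data; the matrix, the bin sizes (nominally 200 bp and 800 bp) and the helper name `coarsen` are all hypothetical. It only shows how two separate blocks of contact, which are distinct at fine resolution, merge into an uninterrupted stripe once bins are pooled.

```python
import numpy as np

def coarsen(contacts: np.ndarray, factor: int) -> np.ndarray:
    """Pool a square contact matrix into bins `factor` times larger by summing."""
    n = contacts.shape[0] - contacts.shape[0] % factor  # drop any ragged edge
    m = n // factor
    return contacts[:n, :n].reshape(m, factor, m, factor).sum(axis=(1, 3))

# Hypothetical fine-resolution map: 24 bins (think of each as 200 bp).
fine = np.zeros((24, 24))
fine[8:10, 12:16] = 1   # contacts between one element and a first small TAD
fine[8:10, 18:22] = 1   # contacts with a second small TAD; bins 16-17 are a gap
fine = fine + fine.T    # contact maps are symmetric

# Pool four fine bins into one coarse bin (think 800 bp).
coarse = coarsen(fine, 4)

print("fine row 8, bins 12-21:  ", fine[8, 12:22])
print("coarse row 2, bins 3-5:  ", coarse[2, 3:6])
```

      In the fine map the row has an empty gap between the two blocks, whereas every coarse bin along the same row is populated, so the two TAD-TAD interactions read as a single continuous stripe.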

      Dots: The high resolution images of mammalian MicroC experiments in Author response image 2D and H show that, like Drosophila (Author response image 3L), mammalian TADs don’t always have a “dot” at the apex of the triangle.  This is not surprising.  In the MicroC procedure, fixed chromatin is digested to mononucleosomes with MNase.  Since most TAD boundaries in flies, and presumably also in mammals, are relatively large (150-400 bp) nuclease hypersensitive regions, extensive MNase digestion will typically reduce the boundary element sequences to oligonucleotides.

      In flies, the only known sequences (at least to date) that end up giving dots (like those seen in Author response image 1) are bound by a large (>1,000 kDa) GAF-containing multiprotein complex called LBC.  In the Abd-B region of BX-C, LBC binds to two ~180 bp sequences in Fab-7 (dHS1 and HS3: Kyrchanova et al. 2018; Wolle et al. 2015), and to the centromere proximal (CP) side of Fab-8.  The LBC elements in Fab-7 (dHS1) and Fab-8 (CP) have both blocking and boundary bypass activity (Kyrchanova et al. 2023; Kyrchanova et al. 2019a; Kyrchanova et al. 2019b; Postika et al. 2018).  Elsewhere, LBC binds to the bx and bxd PREs in the Ubx regulatory domains, to two PREs upstream of engrailed, to the hsp70 promoter, the histone H3-H4 promoters, and the eve promoter (unpublished data).  Based on ChIP signatures, it likely binds to most PREs/tethering elements in the fly genome (Batut et al. 2022; Li et al. 2023).  Indirect end-labeling experiments (Galloni et al. 1993; Samal et al. 1981; Udvardy and Schedl 1984) indicate that LBC protects an ~150-180 bp DNA segment from MNase digestion, which would explain why LBC-bound sequences are able to generate dots in MicroC experiments.  Also unlike typical boundary elements, the pairing interactions of the LBC elements we’ve tested appear to be orientation-independent (unpublished data).

      The difference in MNase sensitivity between typical TAD boundaries and LBC-bound elements is illustrated in the MicroC of the Leukocyte-antigen-related-like (Lar) meta-loop in Author response image 4 panels A and B.  Direct physical pairing of two TAD boundaries (blue and purple) brings two TADs encompassing the 125 kb Lar gene into contact with two TADs in a gene poor region 620 kb away.  This interaction generates two regions of greatly enhanced contact: the two boxes on either side of the paired boundaries (panel A).  Note that like transgene homie pairing with the eve boundaries, the boundary pairing interaction that forms the Lar meta-loop is orientation-dependent.  In this case the TAD boundary in the Lar locus pairs with the TAD boundary in the gene poor region head-to-head (arrow tip to arrow tip), generating a circle-loop.  This circle-loop configuration brings the TAD upstream of the blue boundary into contact with the TAD upstream of the purple boundary.  Likewise, the TAD downstream of the blue boundary is brought into contact with the TAD downstream of the purple boundary.

      In the MicroC procedure, the sequences that correspond to the paired boundaries are not recovered (red arrow in Author response image 4 panel B).  This is why there are vertical and horizontal blank stripes (red arrowheads) emanating from the missing point of contact.  Using a different HiC procedure (dHS-C) that allows us to recover sequences from typical boundary elements (Author response image 4 panels C and D), there is a strong “dot” at the point of contact which corresponds to the pairing of the blue and purple boundaries.

      There is a second dot (green arrow) within the box that represents physical contacts between sequences in the TADs downstream of the blue and purple boundaries.  This dot is resistant to MNase digestion and is visible both in the MicroC and dHS-C profiles.  Based on the ChIP signature of the corresponding elements in the two TADs downstream of the blue and purple boundaries, this dot represents paired LBC elements.

      Author response image 4.

      Lar meta-loop. Panels A & B: MicroC. Panels C & D: dHS-C.

      That being said, the authors' analyses do not distinguish between the formation and the maintenance of domains. It is not clear to this reviewer why a single mechanism should explain the formation of the complex structures observed in static Hi-C heatmaps from a population of cells at a single developmental time point. For example, how can the authors rule out that extrusion initially provides the necessary proximity and possibly the cis preference of contacts required for boundary-boundary pairing whereas the latter may more reflect the structures observed at maintenance?

      (2.2) The MicroC profiles shown in Fig. 2 of our paper were generated from nuclear cycle (NC) 14 embryos.  NC14 is the last nuclear cycle before cellularization (Foe 1989).  After the nuclei exit mitosis, S-phase begins, and because satellite sequences are late replicating in this nuclear cycle, S phase lasts 50 min instead of only 4-6 min during earlier cycles (Shermoen et al. 2010).  So unlike MicroC studies in mammals, our analysis of chromatin architecture in NC14 embryos likely offers the best opportunity to detect any intermediates that are generated during TAD formation.  In particular, we should be able to observe evidence of cohesin linking the sequences from the two extruding strands together (the stripes) as it generates TADs de novo.  However, there are no vertical stripes in the eve TAD as would be expected if cohesin entered at a few specific sites somewhere within the TAD and extruded loops in opposite directions synchronously, nor are there stripes at 45° as would be expected if it started at nhomie or homie (see Figure Supplemental 1).  We also do not detect cohesin-generated stripes in any of the TADs in between eve and the attP site at -142 kb. Note that in some models, cohesin is thought to be continuously extruding loops. After hitting the CTCF roadblocks, cohesin either falls off after a short period and starts again or it breaks through one or more TAD boundaries generating the LDC domains. In this dynamic model, stripes of crosslinked DNA generated by the passing cohesin complex should be observed throughout the cell cycle.  They are not.

      As for formation versus maintenance, and the possible involvement of cohesin loop extrusion in the former, but not the latter:  This question was indirectly addressed in point #1.2 above.  In this point we described multiple examples of specific boundary:boundary pairing interactions that take place over Mbs, in cis and in trans and even between different chromosomes.  These long-distance interactions don’t preexist;  instead they must be established de novo and then maintained.  This process was actually visualized in the studies of Lim et al. (Lim et al. 2018) on the establishment of trans boundary pairing interactions in NC14 embryos.  There is no conceivable mechanism by which cohesin-based loop extrusion could establish the long or short distance trans interactions that have been documented in many studies on fly boundary elements.  Also as noted above, it seems unlikely that it is necessary for long-range interactions in cis.

      A more plausible scenario is that cohesin entrapment helps to stabilize these long-distance interactions after they are formed.  If this were true, then one could argue that cohesin might also function to maintain TADs after boundaries have physically paired with their neighbors in cis.  However, the Rad21 depletion experiments of Goel et al. (Goel et al. 2023) would rule out an essential role for cohesin in maintaining TADs after boundary:boundary pairing.  In short, while we cannot formally rule out that loop extrusion might help bring sequences closer together to increase their chance of pairing, neither the specificity of that pairing, nor its orientation can be explained by loop extrusion.  Furthermore, since pairing in trans cannot be facilitated by loop extrusion, invoking it as potentially important for boundary-boundary pairing in cis can only be described as a potential mechanism in search of a function, without clear evidence in its favor.

      On the other hand, the apparent loss of contacts between TADs within large multi-TAD neighborhoods (Goel et al. 2023) would suggest that there is some sort of decompaction of neighborhoods after Rad21 depletion.  It is possible that this might stress interactions that span multiple TADs as is the case for homie at -142, or for the other examples described in #1.2 above.  This kind of involvement of cohesin might or might not be associated with a loop extrusion mechanism.

      Future work aimed at analyzing micro-C data in cohesin-depleted cells might shed additional light on this.

      (2.3) This experiment has been done by Goel et al. (Goel et al. 2023) in mammalian tissue culture cells.  They found that TADs, as well as local TAD neighborhoods, are not disrupted/altered by Rad21 depletion (see Goel et al. 2023 and our response to point #1.1 of reviewer #1).

      Additional mechanisms at play include compartment-level interactions driven by chromatin states. Indeed, in mammalian cells, these interactions often manifest as a "plume" on Hi-C maps similar to what the authors attribute to boundary interactions in this manuscript. How do the chromatin states in the neighboring domains of the eve locus impact the model if at all?

      (2.4) Chromatin states have been implicated in driving compartment level interactions. 

      Compartments as initially described were large, often Mb sized, chromosomal segments that “share” similar chromatin marks/states, and are thought to merge via co-polymer segregation.  They were visualized using large multi-kb bin sizes.  In the studies reported here, we use bin sizes of 200 bp to examine a DNA segment of less than 200 kb which is subdivided into a dozen or so small TADs.  Several of the TADs contain more than one transcription unit, and they are expressed in quite different patterns, and thus might be expected to have different “chromatin states” at different points in development and in different cells in the organism. However, as can be seen by comparing the MicroC patterns in our paper that are shown in Fig. 2 with Fig. 7, Figure Supplemental 5 and Figure Supplemental 6, the TAD organization in NC14 and 12-16 hr embryos is for the most part quite similar.  There is no indication that these small TADs are participating in liquid phase compartmentalization that depends upon shared chromatin/transcriptional states in NC14 and then again in 12-16 hr embryos. 

      In NC14 embryos, eve is expressed in 7 stripes, while it is potentially active throughout much of the embryo.  In fact, the initial pattern in early cycles is quite broad and is then refined during NC14.  In 12-16 hr embryos, the eve gene is silenced by the PcG system in all but a few cells in the embryo.  However, here again the basic structure of the TAD, including the volcano plume, looks quite similar at these different developmental stages.  

      As for the suggestion that the plume topping the eve volcano triangle is generated because the TADs flanking the eve TAD share chromatin states and coalesce via some sort of phase separation:

      This model has been tested directly in Ke et al. (Ke et al. 2024).  In Ke et al., we deleted the nhomie boundary and replaced it with either nhomie in the reverse orientation or homie in the forward orientation.  According to the compartment model, changing the orientation of the boundaries so that the topology of the eve TAD changes from a stem-loop to a circle-loop should have absolutely no effect on the plume topping the eve volcano triangle.  The TADs flanking the eve TAD would still be expected to share the same chromatin states and would still be able to coalesce via phase transition.  However, this is not what is observed.  The plume disappears and is replaced by “clouds” on both sides of the eve TAD. The clouds arise because the eve TAD bumps into the neighboring TADs when the topology is a circle-loop.  

      We would also note that “compartment-level” interactions would not explain the findings presented in Muller et al. 1999, in Table 1 or in Author response image 4.  It is clear that the long-distance (Mb) interactions observed for Mcp, gypsy, Fab-7, homie, nhomie and the blue and purple boundaries in Author response image 4 arise by the physical pairing of TAD boundary elements.  This fact is demonstrated directly by the MicroC experiments in Fig. 7 and Fig Supplemental 4 and 5, and by the MicroC and dHS-C experiments in Author response image 4.  There is no evidence for any type of “compartment/phase separation” driving these specific boundary pairing interactions.

      In fact, given the involvement of TAD boundaries in meta-loop formation, one might begin to wonder whether some of the “compartment level interactions” are generated by the specific pairing of TAD boundary elements rather than by “shared chromatin” states.  For example, the head-to-head pairing of the blue and purple boundaries generates a Lar meta-loop that has a circle-loop topology.  As a consequence, sequences upstream of the blue and purple boundary come into contact, generating the small dark rectangular box on the upper left side of the contact map.  Sequences downstream of the blue and purple boundary also come into contact, and this generates the larger rectangular box in the lower right side of the contact map.  A new figure, Fig. 9, shows that the interaction pattern flips (lower left and top right) when the meta-loop has a stem-loop topology.  If these meta-loops are visualized using larger bin sizes, the classic “compartment” patchwork pattern of interactions emerges.  Would the precise patchwork pattern of “compartmental” interactions involving the four distant TADs that are linked in the two meta-loops shown in Fig. 9 persist as is if we deleted one of the TAD boundaries that forms the meta-loop?  Would the precise patchwork pattern persist if we inverted one of the meta-loop boundaries so that we converted the topology of the loop from a circle-loop to a stem-loop or vice versa?  We haven’t used MicroC to compare the compartment organization after deleting or inverting a meta-loop TAD boundary; however, a comparison of the MicroC pattern in WT in Fig. 1C with that for the homie transgenes in Fig. 7 and Figs. Supplemental 5, 6 and 7 indicates a) that novel patterns of TAD:TAD interactions are generated by this homie dependent mini-meta-loop and b) that the patterns of TAD:TAD interactions depend upon loop topology.  Were these novel TAD:TAD interactions generated instead by compartment level interactions/shared chromatin states, they should be evident in WT as well (Fig. 1).  They are not.

      How does intrachromosomal homolog pairing impact the models proposed in this manuscript (Abed et al. 2019; Erceg et al., 2019)? Several papers recently have shown that somatic homolog pairing is not uniform and shows significant variation across the genome with evidence for both tight pairing regions and loose pairing regions. Might loose pairing interactions have the capacity to alter the cis configuration of the eve locus?

      (2.5) At this point it is not entirely clear how homolog pairing impacts the cis configuration/MicroC contact maps.  We expect that homolog pairing is incomplete in the NC14 embryos we analyzed;  however, since replication of eve and the local neighborhood is likely complete, sister chromosomes should be paired.  So we are likely visualizing the 3D organization of paired TADs.

      In summary, the transgenic experiments are extensive and elegant and fully support the authors' models. However, in my opinion, they do not completely rule out additional models at play, including extrusion-based mechanisms. Indeed, my major issue is the limited conceptual advance in this manuscript. The authors essentially repeat much of their previous work and analyses.

      (2.6) In our view, the current paper makes a number of significant contributions that go well beyond those described in our 2016 publication.  These are summarized below.

      A) While our 2016 paper used transgenes inserted in the -142 kb attP site to study pairing interactions of homie and nhomie, we neither considered nor discussed how our findings might bear on the loop extrusion model.  However, since the loop extrusion model is currently accepted as established fact by many labs working on chromosome structure, it is critically important to devise experimental approaches which test the predictions of this particular model.  One approach would be to deplete cohesin components; however, as discussed in #1.1, our experimental system is not ideal for this type of approach.  On the other hand, there are other ways to test the extrusion model.  Given the mechanism proposed for TAD formation (extruding a loop until cohesin bumps into CTCF/boundary roadblocks), it follows that only two types of loop topologies are possible: stem-loop and unanchored loop.  The loop extrusion model, as currently conceived, can’t account for the two cases in this study in which the reporter on the wrong side of the homie boundary from the eve locus is activated by the eve enhancers.  In contrast, our findings are completely consistent with orientation-specific boundary:boundary pairing.

      B) In the loop extrusion model, cohesin embraces both of the extruded chromatin fibers, transiently bringing them into close proximity.  As far as we know, there have been no (high resolution) experiments that have actually detected these extruding cohesin complexes during TAD formation.  In order to have a chance of observing the expected signatures of extruding cohesin complexes, one would need a system in which TADs are being formed.  As described in the text, this is why we used MicroC to analyze TADs in NC14 embryos.  We do not detect the signature stripes that would be predicted (see Figure Supplemental 2) by the current version of the loop extrusion model.

      C) Reporter expression in the different -142 kb transgenes provides only an indirect test of the loop extrusion and boundary:boundary pairing models for TAD formation.  The reporter expression results need to be confirmed by directly analyzing the pattern of physical interactions in each instance.  While we were able to detect contacts between the transgenes and eve in our 2016 paper, the 3C experiments provided no information beyond that.  By contrast, the MicroC experiments in the current paper give high resolution maps of the physical contacts between the transgene and the eve TAD.  The physical contacts track completely with reporter activity.  Moreover, just as is the case for reporter activity, the observed physical interactions are inconsistent with the loop extrusion model.

      D) Genetic studies in Muller et al. (Muller et al. 1999) and imaging in Vazquez et al. (Vazquez et al. 2006) suggested that more than two boundaries can participate in pairing interactions.  Consistent with these earlier observations, viewpoint analysis indicates the transgene homie interacts with both eve boundaries.  While this could be explained by transgene homie alternating between nhomie and homie in the eve locus, this would require the remodeling of the eve TAD each time the pairing interaction switched between the three boundary elements.  Moreover, two out of the three possible pairing combinations would disrupt the eve TAD, generating an unanchored loop (c.f., the lambda DNA TAD in Ke et al. (Ke et al. 2024)).  However, the MicroC profile of the eve TAD is unaffected by transgenes carrying the homie boundary.  This would suggest that like Mcp, the pairing interactions of homie and nhomie might not be exclusively pairwise.  In this context it is interesting to compare the contact profiles of the Lar meta-loop shown in Author response image 4 with the different -142 kb homie inserts.  Unlike the homie element at -142 kb, there is clearly only a single point of contact between the blue and purple boundaries.

      E) Chen et al. (Chen et al. 2018) used live imaging to link physical interactions between a homie containing transgene inserted at -142 kb and the eve locus to reporter activation by the eve enhancers.  They found that the reporter was activated by the eve enhancers only when it was in “close proximity” to the eve gene.  “Close proximity” in this case was 331 nm.  This distance is equivalent to ~1.1 kb of linear duplex B form DNA, or ~30 nucleosome core particles lined up in a row.  It would not be possible to ligate two DNAs wrapped around nucleosome core particles that are located 330 nm apart in a fixed matrix.  Since our MicroC experiments were done on embryos in which the gene is silent in the vast majority of cells, it is possible that the homie transgene only comes into close enough proximity for transgene nucleosome: eve nucleosome ligation events when the eve gene is off.  Alternatively, and clearly more likely, distance measurements using imaging procedures that require dozens of fluorescent probes may artificially inflate the distance between sequences that are actually close enough for enzymatic ligation.

      F) The findings reported in Goel et al. (Goel et al. 2023) indicate that mammalian TADs don’t require cohesin activity; however, the authors do not provide an alternative mechanism for TAD formation/stability.  Here we have suggested a plausible mechanism.
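      As a rough check on the length scales quoted in point E, the conversion can be sketched as follows. The rise per base pair and the nucleosome core particle diameter are assumed textbook approximations, not values taken from the manuscript, so the output should be read as an order-of-magnitude estimate only.

```python
# Assumed textbook approximations (not taken from the manuscript):
RISE_NM_PER_BP = 0.34         # axial rise per base pair of B-form DNA
NUCLEOSOME_DIAMETER_NM = 11   # approximate diameter of a nucleosome core particle

distance_nm = 331  # "close proximity" distance cited from Chen et al. 2018

# Length of straight B-form duplex that would span the distance.
dna_bp = distance_nm / RISE_NM_PER_BP

# Number of core particles needed to bridge the distance if lined up in a row.
nucleosomes = distance_nm / NUCLEOSOME_DIAMETER_NM

print(f"{dna_bp:.0f} bp of linear B-form DNA")
print(f"{nucleosomes:.0f} nucleosome core particles in a row")
```

      With these assumed constants the distance corresponds to roughly 1 kb of duplex DNA and about 30 core particles, in line with the estimates given in point E.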

      The authors make no attempt to dissect the mechanism of this process by modifying extrusion components directly.

      (2.7) See point #1.1

      Some discussion of Rollins et al. on the discovery of Nipped-B and its role in enhancer-promoter communication should also be made to reconcile their conclusions in the proposed absence of extrusion events.

      (2.8) The reason why reducing Nipped-B activity enhances the phenotypic effects of gypsy-induced mutations is not known at this point; however, the findings reported in Rollins et al. (Rollins et al. 1999) would appear to argue against an extrusion mechanism for TAD formation.

      Given what we know about enhancer blocking and TADs, there are two plausible mechanisms for how the Su(Hw) element in the gypsy transposon blocks enhancer-promoter interactions in the gypsy-induced mutants studied by Rollins et al.  First, the Su(Hw) element could generate two new TADs through pairing interactions with boundaries in the immediate neighborhood.  This would place the enhancers in one TAD and the target gene in another TAD.  Alternatively, the studies of Sigrist and Pirrotta (Sigrist and Pirrotta 1997) as well as several publications from Victor Corces’ lab raise the possibility that the Su(Hw) element in gypsy-induced mutations is pairing with gypsy transposons inserted elsewhere in the genome.  This would also isolate enhancers from their target genes.  In either case, the loss of Nipped-B activity increases the mutagenic effects of the Su(Hw) element, presumably by strengthening its boundary function.  If this is due to a failure to load cohesin on to chromatin, this would suggest that cohesin normally functions to weaken the boundary activity of the Su(Hw) element, i.e., disrupting the ability of Su(Hw) elements to interact with either other boundaries in the neighborhood or with themselves.  Were this a general activity of cohesin (to weaken boundary activity), one would imagine that cohesin normally functions to disrupt TADs rather than generate/stabilize TADs.

      An alternative model is that Nipped-B (and thus cohesin) functions to stabilize enhancer-promoter interactions within TADs.  In this case, loss of Nipped-B would result in a destabilization of the weak enhancer:promoter interactions that can still be formed when gypsy is located between the enhancer and promoter.  In this model the loss of these weak interactions in Nipped-B mutants would appear to increase the “blocking” activity of the gypsy element.  However, this alternative model would also provide no support for the notion that Nipped-B and cohesin function to promote TAD formation.

      Reviewer #3 (Public Review):

      Bing et al. attempt to address fundamental mechanisms of TAD formation in Drosophila by analyzing gene expression and 3D conformation within the vicinity of the eve TAD after insertion of a transgene harboring a Homie insulator sequence 142 kb away in different orientations. These transgenes along with spatial gene expression analysis were previously published in Fujioka et al. 2016, and the underlying interpretations regarding resulting DNA configuration in this genomic region were also previously published. This manuscript repeats the expression analysis using smFISH probes in order to achieve more quantitative analysis, but the main results are the same as previously published. The only new data are the Micro-C and an additional modeling/analysis of what they refer to as the 'Z3' orientation of the transgenes. The rest of the manuscript merely synthesizes further interpretation with the goal of addressing whether loop extrusion may be occurring or if boundary:boundary pairing without loop extrusion is responsible for TAD formation. The authors conclude that their results are more consistent with boundary:boundary pairing and not loop extrusion; however, most of this imaging data seems to support both loop extrusion and the boundary:boundary models. This manuscript lacks support, especially new data, for its conclusions.

      (3.1) The new results/contributions of our paper are described in #2.6 above. 

      Although there are (two) homie transgene configurations that give expression patterns that would be consistent with the loop extrusion model, that is not quite the same as strong evidence supporting loop extrusion.  On the contrary, key aspects of the expression data are entirely inconsistent with loop extrusion, and they thus rule out the possibility that loop extrusion is sufficient to explain the results.  Moreover, the conclusions drawn from the expression patterns of the four transgenes are backed up by the MicroC contact profiles, profiles that are also not consistent with the loop extrusion model.  Further, as documented above, loop extrusion is not only unable to explain the findings reported in this manuscript, but also the results from a large collection of published studies on fly boundaries.  Since all of these boundaries function in TAD formation, there is little reason to think that loop extrusion makes a significant contribution at the TAD level in flies.   Given the results reported by Goel et al. (Goel et al. 2023), one might also have doubts about the role of loop extrusion in the formation/maintenance of mammalian TADs.

      To further document these points, we’ve included a new figure (Fig. 9) that shows two meta-loops.  Like the loops seen for homie-containing transgenes inserted at -142 kb, meta-loops are formed by the pairing of distant fly boundaries.  As only two boundaries are involved, the resulting loop topologies are simpler than those generated when transgene homie pairs with nhomie and homie in the eve locus.  The meta-loop in panel B is a stem-loop.  While a loop with this topology could be formed by loop extrusion, cohesin would have to break through dozens of intervening TAD boundaries and then somehow know to come to a halt at the blue boundary on the left and the purple boundary on the right.  However, none of the mechanistic studies on either cohesin or the mammalian CTCF roadblocks have uncovered activities of either the cohesin complex or the CTCF roadblocks that could explain how cohesin would be able to extrude hundreds of kb and ignore dozens of intervening roadblocks, and then stop only when it encounters the two boundaries that form the beat-IV meta-loop.  The meta-loop in panel A is even more problematic in that it is a circle-loop, a topology that can’t be generated by cohesin extruding a loop until it comes into contact with CTCF roadblocks on the extruded strands.

      Furthermore, there are many parts of the manuscript that are difficult to follow. There are some minor errors in the labelling of the figures that if fixed would help elevate understanding. Lastly, there are several major points that if elaborated on, would potentially be helpful for the clarity of the manuscript.

      Major Points:

      (1) The authors suggest and attempt to visualize in the supplemental figures, that loop extrusion mechanisms would appear during crosslinking and show as vertical stripes in the micro-C data. In order to see stripes, a majority of the nuclei would need to undergo loop extrusion at the same rate, starting from exactly the same spots, and the loops would also have to be released and restarted at the same rate. If these patterns truly result from loop extrusion, the authors should provide experimental evidence from another organism undergoing loop extrusion.

      (3.2) We don’t know of any reports that actually document cohesin extrusion events that are forming TADs (TADs as defined in our paper, in the RCMC experiments of Goel et al. (Goel et al. 2023), in response #1.1, or in the high-resolution images from the MicroC data of Krietenstein et al. (Krietenstein et al. 2020) and Hsieh et al. (Hsieh et al. 2020)). However, an extruding cohesin complex would be expected to generate stripes because it transiently brings together the two chromatin strands, as illustrated by the broken zipper in Figure Supplemental 2 of our paper. While stripes generated by cohesin forming a TAD have not, to our knowledge, ever been observed, Fig. 4 in Goel et al. (Goel et al. 2023) shows 45° stripes outlining TADs and connecting neighboring TADs. These stripes are visible with or without Rad21.

      In some versions of the loop extrusion model, cohesin extrudes a loop until it comes to a halt at both boundaries, where it then remains, holding the loop together. In this model, the extrusion event would occur only once per cell cycle. This is the reason we selected NC14 embryos, as this point in development should provide by far the best opportunity to visualize cohesin-dependent TAD formation. However, the expected stripes generated by the cohesin embrace of both strands of the extruding loop were not evident. Other, newer versions of the loop extrusion model are much more dynamic: cohesin extrudes the loop, coming to a halt at the two boundaries, but either doesn’t remain stably bound or breaks through one or both boundaries. In the former case, the TAD needs to be reestablished by another extrusion event, while in the latter case LDC domains are generated. In this dynamic model, we should also be able to observe vertical and 45° stripes (or stripes leaning to one side or another of the loading site if the extrusion rates aren’t equal on both fibers) in NC14 embryos corresponding to the formation of TADs and LDC domains. However, we don’t.

      (2) On lines 311-314, the authors discuss that stem-loops generated by cohesin extrusion would possibly be expected to have more next-next-door neighbor contacts than next-door neighbor contacts and cite their models in Figure 1. Based on the boundary:boundary pairing models in the same figure, would the stem-loops created by head-to-tail pairing also have the same phenotype, making enrichment of next-next-door neighbor contacts possible in both situations? The concepts in the text are not clear, and the diagrams are not well-labeled relative to the two models.

      (3.3) Yes, we expect that stem-loops formed by cohesin extrusion or head-to-tail pairing would behave in a similar manner.  They could be stem-loops separated by unanchored loops as shown in Fig. 1B and E.  Alternatively, adjacent loops could be anchored to each other (by cohesin/CTCF road blocks or by pairing interactions) as indicated in Fig. 1C and F.  In stem-loops generated either by cohesin extrusion or by head-to-tail pairing, next-next door neighbors should interact with each other, generating a plume above the volcano triangle.  In the case of circle-loops, the volcano triangle should be flanked by clouds that are generated when the TAD bumps into both next-door neighbors.  In the accompanying paper, we test this idea by deleting the nhomie boundary and then a) inserting nhomie back in the reverse orientation, or b) by inserting homie in the forward orientation.  The MicroC patterns fit with the predictions that were made in this paper.

      (3) The authors appear to cite Chen et al., 2018 as a reference for the location of these transgenes being 700 nm away in a majority of the nuclei. However, the exact transgenes in this manuscript do not appear to have been measured for distance. The authors could do this experiment and include expression measurements.

      (3.4) The transgenes used in Chen et al. are modified versions of a transgene used in Fujioka et al. (2016) inserted into the same attP site. When we visualize reporter transcription in NC14 embryos driven by the eve enhancers using smFISH, HCR-FISH or DIG, only a subset of the nuclei at this stage are active. The number of active nuclei we detect is similar to that observed in the live imaging experiments of Chen et al. The reason we cited Chen et al. (Chen et al. 2018) was that they found that proximity was a critical factor in determining whether the reporter was activated or not in a given nucleus. The actual distance they measured wasn’t important. Moreover, as we discussed in response #2.6 above, there are good reasons to think that the “precise” distances measured in live imaging experiments like those used in Chen et al. are incorrect. However, their statements are certainly correct if one considers that a distance of ~700 nm or so is “more distant” relative to a distance of ~300 nm or so, which is “closer.”

      (4) The authors discuss the possible importance of CTCF orientation in forming the roadblock to cohesin extrusion and discuss that Homie orientation in the transgene may impact Homie function as an effective roadblock. However, the Homie region inserted in the transgene does not contain the CTCF motif. Can the authors elaborate on why they feel the orientation of Homie is important in its ability to function as a roadblock if the CTCF motif is not present? Trans-acting factors responsible for Homie function have not been identified and this point is not discussed in the manuscript.

      We discussed the “importance” of CTCF orientation in forming roadblocks because one popular version of the cohesin loop extrusion/CTCF roadblock model postulates that CTCF must be oriented so that the N-terminus of the protein is facing towards the oncoming cohesin complex, otherwise it won’t be able to halt extrusion on that strand.  When homie in the transgene is pointing towards the eve locus, the reporter on the other side (farther from eve) is activated by the eve enhancers.  One possible way to explain this finding (if one believes the loop extrusion model) is that when homie is inverted, it can’t stop the oncoming cohesin complex, and it runs past the homie boundary until it comes to a stop at a properly oriented boundary farther away.  In this case, the newly formed loop would extend from the boundary that stopped cohesin to the homie boundary in the eve locus, and would include not only the distal reporter, but also the proximal reporter.  If both reporters are in the same loop with the eve enhancers (which they would have to be given the mechanism of TAD formation by loop extrusion), both reporters should be activated.  They are not.

      For the boundary pairing model, the reporter that will be activated will depend upon the orientation of the pairing interaction, which can be either head-to-head or head-to-tail (or both: see discussion of LBC elements in #2.1). For an easy visualization of how the orientation of pairing interactions is connected to the patterns of interactions between sequences neighboring the boundary, please look at Fig. 9. This figure shows two different meta-loops. In panel A, head-to-head pairing of the blue and purple boundaries brings together, on the one hand, sequences upstream of the blue and purple boundaries, and on the other hand, sequences downstream of the blue and purple boundaries. In the circle-loop configuration, the resulting rectangular boxes of enhanced contact are located in the upper left and lower right of the contact map. In panel B, the head-to-tail pairing of the blue and purple boundaries changes how sequences upstream and downstream of the blue and purple boundaries interact with each other. Sequences upstream of the blue boundary interact with sequences downstream of the purple boundary, and this gives the rectangular box of enhanced interactions on the top right. Sequences downstream of the blue boundary interact with sequences upstream of the purple boundary, and this gives the rectangular box of enhanced contact on the lower left.
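      The relationship between pairing orientation and contact-map pattern described above can be restated as a small lookup table. This is only an illustrative sketch: the function name and labels are hypothetical, and the assignment of each head-to-head contact to a specific quadrant is inferred from the head-to-tail case rather than stated explicitly in the text.

```python
# Illustrative lookup: pairing orientation -> loop topology and the
# quadrants of the contact map expected to show enhanced contacts.
# Names are hypothetical; quadrant assignment for head-to-head is inferred
# by keeping the same axes as the head-to-tail case described in the text.

PAIRING_TABLE = {
    "head-to-head": {
        "topology": "circle-loop",
        "contacts": {
            ("upstream of blue", "upstream of purple"): "upper left",
            ("downstream of blue", "downstream of purple"): "lower right",
        },
    },
    "head-to-tail": {
        "topology": "stem-loop",
        "contacts": {
            ("upstream of blue", "downstream of purple"): "upper right",
            ("downstream of blue", "upstream of purple"): "lower left",
        },
    },
}


def pairing_contacts(orientation: str) -> dict:
    """Return the expected topology and enriched quadrants for an orientation."""
    if orientation not in PAIRING_TABLE:
        raise ValueError(f"unknown pairing orientation: {orientation!r}")
    return PAIRING_TABLE[orientation]


if __name__ == "__main__":
    for name in PAIRING_TABLE:
        info = pairing_contacts(name)
        print(name, "->", info["topology"], sorted(info["contacts"].values()))
```

      The two orientations give complementary pairs of quadrants, which is why the position of the boxes of enhanced contact can be used to read off the pairing orientation.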

      CTCF: Our analysis of the homie boundary suggests that CTCF contributes little to its activity.  It has an Su(Hw) recognition sequence and a CP190 “associated” sequence.  Mutations in both compromise boundary activity (blocking and -142 kb pairing).  Gel shift experiments and ChIP data indicate there are half a dozen or more additional proteins that associate with the 300 bp homie fragment used in our experiments.

      Orientation of CTCF or other protein binding sites: The available evidence suggests that orientation of the individual binding sites is not important (Kyrchanova et al. 2016; Lim et al. 2018). Instead, it is likely that the order of binding sites affects function.

      (5) The imaging results seem to be consistent with both boundary:boundary interaction and loop extrusion stem looping.

      It is not clear whether the reviewer is referring to the different patterns of reporter expression— which clearly don’t fit with the loop extrusion model in the key cases that distinguish the two models—or the live imaging experiments in Chen et al. (Chen et al. 2018).

      (6) The authors suggest that the eveMa TAD could only be formed by extrusion after the breakthrough of Nhomie and several other roadblocks. Additionally, the overall long-range interactions with Nhomie appear to be less than the interactions with endogenous Homie (Figures 7, 8, and supplemental 5). Is it possible that in some cases boundary:boundary pairing is occurring between only the transgenic Homie and endogenous Homie and not including Nhomie?

      Yes, it is possible. On the other hand, the data that are currently available support the idea that transgene homie usually interacts with endogenous homie and nhomie at the same time. This is discussed in #2.6D above. The viewpoints indicate that crosslinking occurs more frequently to homie than to nhomie. This could indicate that when there are only pairwise interactions, these tend to be between homie and homie. Alternatively, this could also be explained by a difference in relative crosslinking efficiency.

      (7) In Figure 4E, the GFP hebe expression shown in the LhomieG Z5 transgenic embryo does not appear in the same locations as the LlambdaG Z5 control. Is this actually hebe expression or just a background signal?

      The late-stage embryos shown in E are oriented differently. For GlambdaL, the embryo is oriented so that hebe-like reporter expression on the ventral midline is readily evident. However, this orientation is not suitable for visualizing eve enhancer-dependent expression of the reporters in muscle progenitor cells. For this reason, the 12-16 hr GeimohL embryo in E is turned so that the ventral midline isn’t readily visible in most of the embryo. As is the case in NC14 embryos, the eve enhancers drive lacZ but not gfp expression in the muscle progenitor cells.

      (8) Figure 6- The LhomieG Z3 (LeimohG) late-stage embryo appears to be showing the ventral orientation of the embryo rather than the lateral side of the embryo as was shown in the previous figure. Is this for a reason? Additionally, there are no statistics shown for the Z3 transgenic images.

      Were these images analyzed in the same way as the Z5 line images?

      The LeimohG embryo was turned so that the hebe enhancer-dependent expression of lacZ is visible.  While the eve enhancer-dependent expression of lacZ in the muscle progenitor cells isn’t visible with this orientation, eve enhancer-dependent expression in the anal plate is.

      (9) Do the Micro-C data align with the developmental time points used in the smFISH probe assays?

      The MicroC data aligns with the smFISH images of older embryos: 12-14 hour embryos or stages 14-16.  

      Recommendations for the authors:   

      Reviewer #1 (Recommendations For The Authors):

      This was a difficult paper to review. It took me several hours to understand the terminology and back and forth between different figures to put it together. It might be useful to put the loop models next to the MicroC results and have a cartoon way of incorporating which enhancers are turning on which reporters.

      I also found the supercoiled TAD models in Figure 1 not useful. These plectoneme-type of structures likely do not exist, based on the single-cell chromosome tracing studies, and the HiC structures not showing perpendicular to diagonal interactions between the arms of the plectonemes.

      We wanted to represent the TAD as a coiled 30 nm fiber, as TADs are not likely to resemble the large loops shown in Fig. 1 A, D, and G.

      There are no stripes emerging from homies, which is consistent with the pairing model, but there seem to be stripes from the eve promoter. I think these structures may be a result of both the underlying loop extruders + pairing elements.

      There are internal structures in the eve TAD that link the upstream region of the eve promoter to the eve PRE and sequences in nhomie. All three of these sequences are bound by LBC. Each of the regulatory domains in BX-C also has LBC elements and, as shown in Author response image 1, you can see stripes connecting some of these LBC elements to each other. Since the stripes that Goel et al. (Goel et al. 2023) observed in their RCMC analysis of Ppm1g didn’t require cohesin, it isn’t clear how these stripes are generated (active: e.g., a chromatin remodeler; or passive: e.g., the LBC complex has non-specific DNA binding activity that can be readily crosslinked as the chromatin fiber slides past).

      The authors say there are no TADs that have "volcano plumes" but the leftmost TAD TA appears to have one. What are the criteria for calling the plumes? I am also not clear why there is a stripe off the eve volcano. It looks like homie is making a "stripe" loop extrusion type of interaction with the next TAD up. Is this maybe cohesin sliding off the left boundary?

      The reviewer is correct, the left-most TAD TA appears to have a plume.  We mentioned TA seems to have a plume in the original text, but it was inadvertently edited out.

      Two different types of TAD↔TAD interactions are observed. In the case of eve, the TADs to either side of eve interact more frequently with each other than they do with eve. This generates a “plume” above the eve volcano triangle. The TADs that comprise the Abd-B regulatory domains (see Author response image 1) are surrounded by clouds of diminishing intensity. Clouds at the first level represent interactions with both next-door neighbors; clouds at the second level represent interactions with both next-next-door neighbors; clouds at the third level represent interactions with next-next-next-door neighbors. The Abd-B TADs are close to the same size, so that interactions with neighbors are relatively simple. However, this is not always the case. When there are smaller TADs near larger TADs, the pattern of interaction can be quite complicated. An example is indicated by the red bar in Author response image 2.

      The authors state "In the loop-extrusion model, a cohesin complex initiating loop extrusion in the eve TAD must break through the nhomie roadblock at the upstream end of the eve TAD. It must then make its way past the boundaries that separate eve from the attP site in the hebe gene, and come to a halt at the homie boundary associated with the lacZ reporter." Having multiple loops formed by cohesin would also bring in the 142kb apart reporter and homie. Does cohesin make 140 kb long loops in flies?

      A mechanism in which cohesin brings the reporter close to the eve TAD by generating many smaller loops (which would be the intervening TADs) was discussed in #1.2.

      Figure 5 title mistakes the transgene used?

      Fixed.

      In figure 6, the orientation of the embryos does not look the same for the late-stage panels. So it was difficult to tell if the eve enhancer was turning the reporter on.

      Here we were focusing mainly on the AP enhancer activation of the reporter, as this is most easily visualized.  It should be clear from the images that the appropriate reporter is activated by the AP enhancer for each of the transgene inserts.

      It is not clear to me why the GFP makes upstream interactions (from the 4C viewpoint) in GhomileLZ5 but not in LhomieGZ5? Corresponding interactions for Fig Supp 5 & 6 are not the same. That is, LacZ in the same place and with the same homie orientation does not show a similar upstream enrichment as the GFP reporter does.

      We are uncertain as to whether we understand this question/comment. In GhomieLZ5 (now GhomieL), the lacZ reporter is on the eve side of the homie boundary while gfp is on the hebe enhancer side of the homie boundary. Since homie is pointing away from gfp, pairing interactions with homie and nhomie in the eve locus bring the eve enhancers in close proximity with the gfp reporter. This is what is seen in Fig. 7 panel D, lower trace. In LhomieGZ5 (now GeimohL), the lacZ reporter is again on the eve side of the homie boundary while gfp is on the hebe enhancer side of the homie boundary. However, in this case homie is inverted so that it points away from lacZ (towards gfp). In this orientation, pairing brings the lacZ reporter into contact with the eve enhancers. This is what is seen in the upper trace in Fig. 7 panel D.

      The orientation of the transgene is switched in Fig. Supp 5 and 6. For these “Z3” transgenes (now called LeimohG and LhomieG), the gfp reporter is on the eve side of homie while the lacZ reporter is on the hebe enhancer side of homie. The interactions between the reporters and eve are determined by the orientation of homie in the transgene. When homie is pointing away from gfp (as in LeimohG), gfp is activated and that is reflected in the trace in Supp Fig. 5. When homie is pointing away from lacZ, lacZ is activated and this is reflected (though not as cleanly as in other cases) in the trace in Supp Fig. 6.

      I did not see a data availability statement. Is the data publicly available? The authors also should consider providing the sequences of the insertions, or provide the edited genomes, in case other researchers would like to analyze the data.

      Data have been deposited.

      Reviewer #3 (Recommendations For The Authors):

      Minor Points:

      (1) There is an inconsistency in the way that some of the citations are formatted. Some citations have 'et al' italicized while others do not. It seems to be the same ones throughout the manuscript. Some examples: Chetverina et al 2017, Chetverina et al 2014, Cavalheiro et al 2021, Kyrchanova et al 2008a, Muravyova et al 2001.

      Fixed

      (2) Pita is listed twice in line 48.

      Fixed

      (3) Line 49, mod(mdg4)67.2 is written just as mod(mdg4). The isoform should be indicated.

      This refers to all Mod isoforms.

      (4) Homie and Nhomie are italicized throughout the manuscript and do not need to be.

      This is the convention used previously.  

      (5) The supplemental figure captions 1 and 2 in the main document are ordered differently than in the supplemental figures file. This caused it to look like the figures are being incorrectly cited in lines 212-214 and 231-232.

      Fixed

      (6) Is the correct figure being cited in line 388-389? The line cites Figure 6E when mentioning LlambdaG Z5; however, LlambdaG Z5 is not shown in Figure 6.

      Fixed

      (7) Section heading 'LhomieG Z5 and GhomieL Z5' could be renamed for clarity. GhomieL Z5 results are not mentioned until the next section, named 'GhomieL Z5'.

      Fixed

      (8) Can the authors provide better labeling for control hebe expression? This would help to determine what is hebe expression and what is background noise in some of the embryos in Figures 4-6.

      Author response image 5 shows expression of the lacZ reporter in GeimohL and GlambdaL. For the GlambdaL transgene, the hebe enhancers drive lacZ expression in 12-16 hr embryos. Note that lacZ expression is restricted to a small set of quite distinctive cells along the ventral midline. lacZ is also expressed on the ventral side of the GeimohL embryo (top panel). However, the locations of these cells are quite different from those of the lacZ-positive cells in the GlambdaL transgene embryo. These cells are displaced from the midline, and are arranged as pairs of cells in each hemisegment, locations that correspond to eve-expressing cells in the ventral nerve cord. The eve enhancers also drive lacZ expression elsewhere in the GeimohL embryo, including the anal plate and dorsal muscle progenitor cells (seen most clearly in the lower left panel).

      Author response image 5.

      lacZ expression in GeimohL and GlambdaL embryos

      (9) The Figure 5 title is labeled with the wrong transgene.

      Fixed

      (10) Heat map scales are missing for Figures 7, supplemental 5, and supplemental 6.

      Fixed

      (11) Did the authors check if there was a significant difference in the expression of GFP and lacZ from lambda control lines to the Homie transgenic lines?

      Yes.  Statistical analysis added in Table Supplemental #1

      (12) The Figure 7 title references that these are Z3 orientations, however, it is Z5 orientations being shown.

      Fixed

      (13) The virtual 4C data should include an axis along the bottom of the graphs for better clarity. An axis is missing in all 4C figures.

      References:

      Bantignies F, Grimaud C, Lavrov S, Gabut M, Cavalli G. 2003. Inheritance of polycomb-dependent chromosomal interactions in drosophila. Genes Dev. 17(19):2406-2420.

      Batut PJ, Bing XY, Sisco Z, Raimundo J, Levo M, Levine MS. 2022. Genome organization controls transcriptional dynamics during development. Science. 375(6580):566-570.

      Bonchuk A, Boyko K, Fedotova A, Nikolaeva A, Lushchekina S, Khrustaleva A, Popov V, Georgiev P. 2021. Structural basis of diversity and homodimerization specificity of zinc-finger-associated domains in drosophila. Nucleic Acids Res. 49(4):2375-2389.

      Bonchuk AN, Boyko KM, Nikolaeva AY, Burtseva AD, Popov VO, Georgiev PG. 2022. Structural insights into highly similar spatial organization of zinc-finger associated domains with a very low sequence similarity. Structure. 30(7):1004-1015.e1004.

      Chen H, Levo M, Barinov L, Fujioka M, Jaynes JB, Gregor T. 2018. Dynamic interplay between enhancer–promoter topology and gene activity. Nat Genet. 50(9):1296.

      Fedotova AA, Bonchuk AN, Mogila VA, Georgiev PG. 2017. C2h2 zinc finger proteins: The largest but poorly explored family of higher eukaryotic transcription factors. Acta Naturae. 9(2):47-58.

      Foe VE. 1989. Mitotic domains reveal early commitment of cells in drosophila embryos. Development. 107(1):1-22.

      Fujioka M, Mistry H, Schedl P, Jaynes JB. 2016. Determinants of chromosome architecture: Insulator pairing in cis and in trans. PLoS Genet. 12(2):e1005889.

      Galloni M, Gyurkovics H, Schedl P, Karch F. 1993. The bluetail transposon: Evidence for independent cis-regulatory domains and domain boundaries in the bithorax complex. The EMBO Journal. 12(3):1087-1097.

      Goel VY, Huseyin MK, Hansen AS. 2023. Region capture micro-c reveals coalescence of enhancers and promoters into nested microcompartments. Nat Genet. 55(6):1048-1056.

      Hsieh TS, Cattoglio C, Slobodyanyuk E, Hansen AS, Rando OJ, Tjian R, Darzacq X. 2020. Resolving the 3d landscape of transcription-linked mammalian chromatin folding. Mol Cell. 78(3):539-553.e538.

      Ke W, Fujioka M, Schedl P, Jaynes JB. 2024. Chromosome structure ii: Stem-loops and circle-loops. eLife.

      Krietenstein N, Abraham S, Venev SV, Abdennur N, Gibcus J, Hsieh TS, Parsi KM, Yang L, Maehr R, Mirny LA et al. 2020. Ultrastructural details of mammalian chromosome architecture. Mol Cell. 78(3):554-565.e557.

      Kyrchanova O, Ibragimov A, Postika N, Georgiev P, Schedl P. 2023. Boundary bypass activity in the abdominal-b region of the drosophila bithorax complex is position dependent and regulated. Open Biol. 13(8):230035.

      Kyrchanova O, Kurbidaeva A, Sabirov M, Postika N, Wolle D, Aoki T, Maksimenko O, Mogila V, Schedl P, Georgiev P. 2018. The bithorax complex iab-7 polycomb response element has a novel role in the functioning of the fab-7 chromatin boundary. PLoS Genet. 14(8):e1007442.

      Kyrchanova O, Mogila V, Wolle D, Deshpande G, Parshikov A, Cleard F, Karch F, Schedl P, Georgiev P. 2016. Functional dissection of the blocking and bypass activities of the fab-8 boundary in the drosophila bithorax complex. PLoS Genet. 12(7):e1006188.

      Kyrchanova O, Sabirov M, Mogila V, Kurbidaeva A, Postika N, Maksimenko O, Schedl P, Georgiev P. 2019a. Complete reconstitution of bypass and blocking functions in a minimal artificial fab7 insulator from drosophila bithorax complex. Proceedings of the National Academy of Sciences.201907190.

      Kyrchanova O, Wolle D, Sabirov M, Kurbidaeva A, Aoki T, Maksimenko O, Kyrchanova M, Georgiev P, Schedl P. 2019b. Distinct elements confer the blocking and bypass functions of the bithorax fab-8 boundary. Genetics.genetics. 302694.302019.

      Li H-B, Muller M, Bahechar IA, Kyrchanova O, Ohno K, Georgiev P, Pirrotta V. 2011. Insulators, not polycomb response elements, are required for long-range interactions between polycomb targets in drosophila melanogaster. Mol Cell Biol. 31(4):616-625.

      Li X, Tang X, Bing X, Catalano C, Li T, Dolsten G, Wu C, Levine M. 2023. Gaga-associated factor fosters loop formation in the drosophila genome. Mol Cell. 83(9):1519-1526.e1514.

      Lim B, Heist T, Levine M, Fukaya T. 2018. Visualization of transvection in living drosophila embryos. Mol Cell. 70(2):287-296.e286.

      Link N, Kurtz P, O'Neal M, Garcia-Hughes G, Abrams JM. 2013. A p53 enhancer region regulates target genes through chromatin conformations in cis and in trans. Genes Dev. 27(22):2433-2438.

      Mohana G, Dorier J, Li X, Mouginot M, Smith RC, Malek H, Leleu M, Rodriguez D, Khadka J, Rosa P et al. 2023. Chromosome-level organization of the regulatory genome in the drosophila nervous system. Cell. 186(18):3826-3844.e3826.

      Muller M, Hagstrom K, Gyurkovics H, Pirrotta V, Schedl P. 1999. The mcp element from the drosophila melanogaster bithorax complex mediates long-distance regulatory interactions. Genetics. 153(3):1333-1356.

      Postika N, Metzler M, Affolter M, Müller M, Schedl P, Georgiev P, Kyrchanova O. 2018. Boundaries mediate long-distance interactions between enhancers and promoters in the drosophila bithorax complex. PLoS Genet. 14(12):e1007702.

      Rollins RA, Morcillo P, Dorsett D. 1999. Nipped-b, a drosophila homologue of chromosomal adherins, participates in activation by remote enhancers in the cut and ultrabithorax genes. Genetics. 152(2):577-593.

      Samal B, Worcel A, Louis C, Schedl P. 1981. Chromatin structure of the histone genes of d. Melanogaster. Cell. 23(2):401-409.

      Shermoen AW, McCleland ML, O'Farrell PH. 2010. Developmental control of late replication and s phase length. Curr Biol. 20(23):2067-2077.

      Shidlovskii YV, Bylino OV, Shaposhnikov AV, Kachaev ZM, Lebedeva LA, Kolesnik VV, Amendola D, De Simone G, Formicola N, Schedl P et al. 2021. Subunits of the pbap chromatin remodeler are capable of mediating enhancer-driven transcription in drosophila. Int J Mol Sci. 22(6).

      Sigrist CJ, Pirrotta V. 1997. Chromatin insulator elements block the silencing of a target gene by the drosophila polycomb response element (pre) but allow trans interactions between pres on different chromosomes. Genetics. 147(1):209-221.

      Udvardy A, Schedl P. 1984. Chromatin organization of the 87a7 heat shock locus of drosophila melanogaster. J Mol Biol. 172(4):385-403.

      Vazquez J, Muller M, Pirrotta V, Sedat JW. 2006. The mcp element mediates stable long-range chromosome-chromosome interactions in drosophila. Molecular Biology of the Cell. 17(5):2158-2165.

      Wolle D, Cleard F, Aoki T, Deshpande G, Schedl P, Karch F. 2015. Functional requirements for fab-7 boundary activity in the bithorax complex. Mol Cell Biol. 35(21):3739-3752.

    1. Author Response:

      We thank the reviewers for careful reading, acknowledging the strength of our manuscript, and pointing out its weakness, which we will address in the revised version as described below.

      (1) We will supplement our analysis with finer statistical testing and analysis, such as cross-validation and a more detailed analysis of the relation between the inferred model and the intrinsic timescales of the system. For the effect of the drug TIMP-1 on the animals, we will first explore the possibility of assessing the results using a multifactor ANOVA test, with the caveat that the distribution of interactions is not Gaussian. We will further test the effect of different group sizes on the significance of our results by considering subgroups of animals in the drug group, and compare the statistics between the (subsampled) drug group and the control group.

      (2) Our manuscript is similar to that of Shemesh et al. in that we both analyze socially interacting mice by constructing maximum entropy models (MEM) of the co-localization patterns of mice. The difference is in the setup and the number of mice (4 mice in Shemesh et al., 10-15 in our work), as we outlined in the manuscript. To further supplement our current argument in the Discussion section about how our results differ, we will learn a MEM with up to triplet interactions for our Eco-HAB mice data, and compare it to our current MEM with up to pairwise interactions using test-set validation or the Bayesian information criterion (BIC).
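      As a minimal sketch of what a BIC comparison between a pairwise and a triplet model involves, the snippet below applies the standard formula BIC = k ln(n) - 2 ln(L). Everything specific in it is an assumption made for illustration: the log-likelihoods and sample count are made-up placeholders, and the parameter count assumes one coupling per subset of binary variables, which may not match the parameterization actually used for the Eco-HAB data.

```python
import math


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian information criterion, k*ln(n) - 2*ln(L); lower is preferred."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def n_params_binary_mem(n_units: int, max_order: int) -> int:
    """Number of couplings in a maximum entropy model over binary units
    with interactions up to max_order (assumed parameterization)."""
    return sum(math.comb(n_units, k) for k in range(1, max_order + 1))


# Placeholder inputs, NOT values from the study.
n_mice = 10
n_samples = 5000
loglik_pairwise = -12000.0
loglik_triplet = -11950.0

k_pairwise = n_params_binary_mem(n_mice, 2)  # fields + pairwise couplings
k_triplet = n_params_binary_mem(n_mice, 3)   # adds triplet couplings

bic_pairwise = bic(loglik_pairwise, k_pairwise, n_samples)
bic_triplet = bic(loglik_triplet, k_triplet, n_samples)

# The model with the lower BIC is preferred under this criterion.
preferred = "pairwise" if bic_pairwise < bic_triplet else "triplet"
print(k_pairwise, k_triplet, preferred)
```

      With these placeholder numbers the triplet model's modest gain in likelihood does not offset its additional parameters, which illustrates how BIC penalizes the higher-order model; test-set validation addresses the same question without relying on a parameter count.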

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) The manuscript by Lu et al aims to study the effects of tubulin post-translational modification in C. elegans touch receptor neurons. Authors use gene editing to engineer various predicted PTM mutations in a-tubulin MEC-12 and b-tubulin MEC-7. Authors generate and analyze an impressive battery of mutants in predicted phosphorylation site and acetylation site of b-tubulin MEC-7, K40 acetylation site in a-tubulin MEC-12, enzymatic site of the a-tubulin acetyltransferase MEC-17, and PTM sites in the MEC-12 and MEC-7 C-tails (glutamylation, detyrosination, delta-tubulin). This represents a lot of work, and will appeal to a readership interested in C. elegans touch receptor neurons. The major concern/criticism of this manuscript is whether the introduced mutation(s) directly affects a specific PTM or whether the mutation affects gene expression, protein expression/stability/localization, etc. As such, this work does convincingly demonstrate, as stated in the title, that "Editing of endogenous tubulins reveals varying effects of tubulin posttranslational modifications on axonal growth and regeneration." 

      We thank the reviewer for the constructive comments. With regard to the major concern or criticism, we would like to point out that we have previously characterized ~100 missense mutations in mec-7 and mec-12 (Zheng et al., 2017, PMID: 28835377; Lee et al., 2021, PMID: 33378215). So, we are familiar with the phenotypes associated with mutations that affect gene expression or protein stability, which mostly result in a null phenotype. When analyzing the PTM site mutants, we compared their phenotypes with the previously categorized phenotypes of null alleles, neomorphic mutations that increase microtubule stability, and antimorphic mutations that prevent polymerization or disrupt microtubule stability. For example, in the case of mec-7 S172 mutations, we found that S172P mutants had the same phenotype as the mec-7 knockout (mild neurite growth defects), suggesting that S172P likely affects protein folding or stability, resulting in the loss of MEC-7. In contrast, S172A and S172E mutations showed phenotypes similar to neomorphic alleles (the emergence of ectopic ALM posterior neurite) and antimorphic alleles (the severe shortening of all neurites in the TRNs), respectively. These phenotypic differences suggested to us that the effects of S172A and S172E mutations cannot be simply attributed to the loss of protein expression and stability. Similar logic was applied to the studies of other PTM-inactivating or -mimicking mutations.

      (2) For example, the authors manipulate the C-terminal tail of MEC-12 and MEC-7, to test the idea that polyglutamylation may be an important PTM. These mutants displayed subtle phenotypes. The authors show that branch point GT335 and polyglutamylation polyE recognizing antibodies stain cultured embryonic touch receptor neurons (TRNs), but did not examine staining in C. elegans TRNs in situ. To my knowledge, these antibodies have not been shown to stain the TRNs in any published papers, raising the question of how these "glutamylation" mutations are affecting mec-12 and -7. The rationale for using cultured embryonic TRNs and the relevance of the data and its interpretation are not clear.

      The GT335 and polyE antibodies were used by previous studies (O’Hagan et al., 2011, PMID: 21982591; and O’Hagan et al., 2017, PMID: 29129530) to detect the polyglutamylation signals in the sensory cilia of C. elegans. We initially tried to stain the whole animals using these antibodies but could not get clear and distinct signals in the TRNs. We reason that the tubulin polyglutamylation signals in the TRNs may be weak, and the in situ staining method, which requires the antibodies to penetrate multiple layers of tissues (e.g., cuticles and epidermis) to reach the TRN axons, may not be sensitive enough to detect the signal. In fact, the TRN axons are located deeper in the worm body compared to the sensory cilia that are mostly exposed to the environment. Another reason could be that the tissues (mostly epidermis) surrounding the TRN axons also have polyglutamylation staining, which makes it difficult to recognize TRN axons. This is a situation different from the anti-K40 acetylation staining, which only occurs in the TRNs because MEC-12 is the only a-tubulin isotype that carries K40. Due to these technical difficulties, we decided to use the in vitro cultured TRNs for the staining experiment, which allows both easy access of the antibodies (thus higher sensitivity) and the dissociation of the TRNs from other tissues. The fact that we were able to observe reduced staining in the ttll mutants and the tubulin mutants that lost the glutamate residues suggests that these antibodies indeed detected glutamylation signals in the cells.

      (3) The final paragraph of the discussion is factually incorrect. The C. elegans homologs of the CCP carboxypeptidases are called CCPP-1 and CCPP-6. There are several publications on their functions in C. elegans.

      We thank the reviewer for pointing out the mistake in the text. We intended to say that “there is no C. elegans homolog of the known tubulin carboxypeptidases that catalyze detyrosination”, which is true given that homologs of the detyrosinase vasohibins (VASH1/VASH2) cannot be found in C. elegans. We are aware of the publications on CCPP-1 and CCPP-6; CCPP-1 is known to regulate tubulin deglutamylation in the cilia of C. elegans (O’Hagan et al., 2011 and 2017), while CCPP-6 may function in the PLM to regulate axonal regeneration (Ghosh-Roy et al., 2012). In the revised manuscript, we have corrected the error.

      Reviewer #2 (Public Review):

      Summary:

      The tubulin subunits that make up microtubules can be posttranslationally modified and these PTMs are proposed to regulate microtubule dynamics and the proteins that can interact with microtubules in many contexts. However, most studies investigating the roles of tubulin PTMs have been conducted in vitro either with purified components or in cultured cells. Lu et al. use CRISPR/Cas9 genome editing to mutate tubulin genes in C. elegans, testing the role of specific tubulin residues on neuronal development. This study is a real tour de force, tackling multiple proposed tubulin modifications and following the resulting phenotypes with respect to neurite outgrowth in vivo. There is a ton of data that experts in the field will likely reference for years to come as this is one of the most comprehensive in vivo analyses of tubulin PTMs.

      This paper will be very important to the field; however, it would be strengthened if: 1) the authors demonstrated that the mutations they introduced had the intended consequences on microtubule PTMs, 2) the authors explored how the various tubulin mutations directly affect microtubules, and 3) the findings were made generally more accessible to non-C. elegans neurobiologists.

      (1) The authors introduce several mutations to perturb tubulin PTMs. However, it is unclear to what extent the engineered mutations affect tubulin in the intended way, i.e., are the authors sure that the PTMs they want to perturb are actually present in C. elegans? Many of the antibodies used did not appear to be specific and antibody staining was not always impacted in the mutant cases as expected. For example, is there any evidence that S172 is phosphorylated in C. elegans, e.g. from available phosphoproteomic data? Given the significant amount of staining left in the S172A mutant, the antibody seems non-specific in this context and therefore not a reliable readout of whether MTs are actually phosphorylated at this residue. As another example, there is no evidence presented that K252 is acetylated in C. elegans. At the very least, the authors should consider demonstrating the conservation of these residues and the surrounding residues with other organisms where studies have demonstrated PTMs exist.

      We thank the reviewer for the comments. To our knowledge, there are very few phosphoproteomic datasets available for C. elegans. We searched a previously published dataset (Zielinska et al., 2009; PMID: 19530675) and did not find the S172 phosphorylation signal in MEC-7. This is not surprising, given that only six touch receptor neurons express MEC-7 and the abundance of MEC-7 in the whole animal lysate may be below the detection limit. However, this phosphorylation site S172 is highly conserved across species and tubulin isotypes (Figure 1-figure supplement 1 in the revised manuscript), suggesting that this site is likely phosphorylated in MEC-7.

      In the case of K252, the potential acetylation site and the flanking sequences are extremely conserved across species and isotypes. In fact, the 20 amino acids from 241-260 a.a. are identical among the tubulin genes of C. elegans, fruit flies, Xenopus, and humans (Figure 4-figure supplement 1B). Thus, although K252 acetylation was found in HeLa cells, this site can possibly be acetylated in C. elegans as well.

      In the case of K40, we observed sequence divergence at the PTM site and adjacent sequences among the tubulin isotypes in C. elegans. MEC-12 is the only C. elegans a-tubulin isotype that has the K40 residue, and the 40-50 a.a. region of MEC-12 appears to be more conserved than other isotypes when compared to Drosophila, frog, and human a-tubulins (Figure 4-figure supplement 1A).

      (2) Given that the authors have the mutants in hand, it would be incredibly valuable to assess the impact of these mutations on microtubules directly in all cases. MT phenotypes are inferred from neurite outgrowth phenotypes in several cases, the authors should look directly at microtubules and/or microtubule dynamics via EBP-2 when possible OR show evidence that the only way to derive the neurite phenotypes shown is through the inferred microtubule phenotypes. For example, the effect of the acetylation or detyrosination mutants on MTs was not assessed. 

      We thank the reviewer for the suggestions. In this study, we created >20 tubulin mutants. Due to limited time and resources, we were not able to examine microtubule dynamics in every mutant strain using EBP-2 kymographs. We assessed the effects of the tubulin mutations mostly based on the changes in neurite growth patterns. From our previous experience of analyzing ~100 mec-7 and mec-12 missense mutations (Zheng et al., 2017, MBoC; Lee et al., 2021, MBoC), we found that the changes in microtubule dynamics are correlated with the changes in neuronal morphologies. For example, the growth of ectopic ALM-PN is correlated with fewer EBP-2 comets and potentially reduced microtubule dynamics; this correlation holds true for several mec-7 neomorphic missense alleles we examined before (Lee et al., 2021, MBoC) and the PTM site mutants [e.g., mec-7(S172A) and mec-12(4Es-A)] analyzed in this study. Similarly, the shortening of TRN neurites is correlated with more EBP-2 comets and increased microtubule dynamics. For the mutants that don’t show neurite growth defects, our previous experience is that they are not likely to show altered microtubule dynamics in EBP-2 tracking experiments. So, we did not analyze the acetylation mutants (which had no defects in neurite growth) and the detyrosination mutants (which had a weak ALM-PN phenotype). Nevertheless, we agree with the reviewer that we could not rule out the possibility that there may be some slight changes to microtubule dynamics in these mutants.

      Using tannic acid staining and electron microscopy (EM), we previously examined the microtubule structure in several tubulin missense mutants (Zheng et al., 2017, MBoC) and found that the loss-of-function and antimorphic mutations significantly reduced the number of microtubules and altered microtubule organization by reducing protofilament numbers. These structural changes are consistent with highly unstable microtubules and defects in neurite growth. On the other hand, neomorphic mutants had only a slight decrease in microtubule abundance, maintained the 15-protofilament structure, and had more tightly packed microtubule bundles that filled up most of the space in the TRN neurite (Zheng et al., 2017, MBoC). These structural features are consistent with increased microtubule stability and ectopic neurite growth. Although we did not directly examine the microtubule abundance and structure using EM in this study, we would expect similar changes that are correlated with the neurite growth phenotypes in the PTM mutants. We agree with the reviewer that it will be informative to conduct a more comprehensive analysis of these mutants using EM and other structural biology methods.

      (3) There is a ton of data here that will be important for experts working in this field to dig into, however, for the more general cell biologist, some of the data are quite inaccessible. More cartoons and better labeling will be helpful as will consistent comparisons to control worms in each experiment.

      Response: We thank the reviewer for the comment. In the revised manuscript, we added some cartoons to Figure 2G to show the location of the synaptic vesicles. The neurite growth phenotype should be quite straightforward. Nevertheless, we added one more Figure (Figure 8) to summarize all the results in the study with cartoons that depicted the changes to neuronal morphologies.

      (4) In addition, I am left unconvinced of the negative data demonstrating that MBK does not phosphorylate tubulin. First, the data described in lines 207-211 does not appear to be presented anywhere. Second, RNAi is notoriously finicky in neurons, thus necessitating tissue-specific degradation using either the ZF/ZIF-1 or AID/TIR1 systems which both work extremely well in C. elegans. Third, there appears to be increasing S172 phosphorylation in Figure 3 Supplement 2 with added MBK-2, but there is no anti-tubulin blot to show equal loading, so this experiment is hard to interpret.

      We added the results of mbk-1, mbk-2, and hpk-1 mutants and cell-specific knockdown of MBK-2 into Figure 3-figure supplement 1D. Considering the reviewer’s suggestion, we attempted to use a ZIF-1 system to remove the MBK-2 proteins specifically in the TRNs using a previously published method (PMID: 28619826). We fused endogenous MBK-2 with GFP by gene editing and then expressed an anti-GFP nanobody fused with ZIF-1 in the TRNs to induce the degradation of MBK-2::GFP. To our surprise, unlike the mbk-2p::GFP transcriptional reporter, the MBK-2::GFP did not show detectable expression in the TRNs, although expression can be seen in early embryos, which is consistent with the “embryonic lethal” phenotype of the mbk-2(-) mutants (Figure 3-figure supplement 2A-B in the revised manuscript). We reason that endogenous MBK-2 is either not expressed in the TRNs or expressed at a very low level. We then crossed mbk-2::GFP with ItSi953 [mec-18p::vhhGFP4::Zif-1] to trigger the degradation of any potential MBK-2 proteins and did not observe the ectopic growth of ALM-PN (Figure 3-figure supplement 2C). These results suggest that MBK-2 is not likely to regulate tubulin phosphorylation in the TRNs, which is consistent with the results of other genetic mutants and the RNAi experiments.

      For Figure 3 Supplement 2 (Figure 3-figure supplement 3 in the revised manuscript), because we added the same amount of purified MEC-12/MEC-7 to all reactions and had established equal loading in Figure 3E, we did not do the anti-tubulin staining in this experiment. Since a higher concentration (1742 nM) of MBK-2 did not produce a stronger signal than the condition with 1268 nM, we don’t think the 1268 nM band represents true phosphorylation. Moreover, the signal is not significantly stronger than the control without MBK-2 and is much lower than the signal generated by CDK1 in Figure 3E. Based on these results, we concluded that MBK-2 is not likely to phosphorylate MEC-7.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      General:

      A summary table would help the reader digest the vast amount of phenotypic data.

      Cartoons to help a non-C. elegans reader understand the figures. 

      We added Figure 8 to summarize and illustrate the effects of the various mutants analyzed in this study.

      Specific:

      The authors engineered mutations into the predicted phosphorylation site of b-tubulin mec-7. These CRISPR-engineered mutations phenocopied previously identified loss-of-function, gain-of-function, and neomorphic mec-7 alleles identified in genetic screens by the Chalfie lab. Next, the authors sought to identify the responsible kinase, taking a candidate gene approach. The most likely family - minibrain - had no effect when knocked down/out. The authors showed that cdk-1 mutants displayed ectopic ALM-PN outgrowth. Whether cdk-1 specifically acts in the TRNs was not demonstrated, calling into question whether CDK-1 phosphorylates S172 in vivo. In their introduction (lines 45-59), the authors built a case for engineering PTM mutations directly into tubulins, because the PTM enzymes may have multiple substrates. This logic applies to the cdk-1 experiment and its interpretation.

      The reviewer is right. Since CDK1 and minibrain kinase are the only known kinases that catalyze S172 phosphorylation, our results suggest that CDK-1 is more likely to catalyze S172 phosphorylation in the TRNs compared to MBK-1/2. Genetic studies found that cdk-1(-); mec-7(S172A) double mutants did not show a stronger phenotype than the two single mutants, suggesting that they function in the same pathway. Nevertheless, we could not rule out the possibility that other kinases may also control S172 phosphorylation, and the effect of CDK-1 is indirect. We mentioned this possibility in the revised manuscript.

      For a-tubulin MEC-12, acetyl-mimicking K40Q and unmodifiable K40R mutants failed to stain with the anti-acetyl-a-tubulin (K40) antibody and displayed subtle TRN phenotypes. The enzymatically dead MEC-17 had phenotypes similar to those described by Topalidou (2012), confirming the Chalfie lab finding that MEC-17 has functions in addition and independent of its acetyltransferase activity. The authors moved onto a predicted acetylation site in MEC-7 and observed TRN developmental defects, and acknowledged that this may be due to tubulin instability and not a PTM. This is a concern for all mutants, as there is no way to measure whether the protein is expressed, stable, or localized properly. 

      We acknowledge that this is a caveat of mutational studies. An amino acid substitution at the PTM site may have multiple effects, including the change of the PTM state and potential alteration of protein conformation. Without direct evidence for enzymatic modification of the PTM site in the neurons, we could not rule out the possibility that the phenotype we observed is not related to PTM and instead is the result of abnormal protein conformation and function caused by the mutation.

      Nevertheless, as stated in our above response to the first point in the public review, we can phenotypically differentiate loss-of-function and gain-of-function mutants. If the mutation reduces expression or general protein stability, it is more likely to cause a loss-of-function phenotype. For most PTM site mutants, this is not the case. We observed mostly gain-of-function phenotypes, suggesting that the missense mutations did not simply inactivate the tubulin protein and instead affected the functional properties of the protein.

      From here, the authors manipulate the C-terminal tail of MEC-12 and MEC-7, testing the idea that polyglutamylation may be an important PTM. These mutants displayed subtle phenotypes. The authors show that branch point GT335 and polyglutamylation polyE recognizing antibodies stain cultured embryonic TRNs, but did not examine staining in TRNs. To my knowledge, these antibodies have not been shown to stain the TRNs in any published papers (see next point). The rationale for using cultured embryonic TRNs is not clear.

      See our response to the second point in the public review.

      Lines 548-553 There are several publications on CCPP-1 and CCPP-6 functions in TRNs and ciliated sensory neurons. See

      PMID: 20519502

      PMID: 21982591

      PMID: 21943602

      PMID: 23000142

      PMID: 29129530

      PMID: 33064774

      PMID: 36285326

      PMID: 37287505 

      We thank the reviewer for pointing out these references, some of which were cited in the revised manuscript. We made a mistake in the Discussion by saying that there are no C. elegans homologs of tubulin carboxypeptidases while we intended to state that there is no homolog of tubulin detyrosinase in C. elegans. We are aware of the studies of CCPP-1 and CCPP-6 and have corrected the mistake in the revised manuscript (also see our response to the third point in the public review).

      Reviewer #2 (Recommendations For The Authors):

      Figures: 

      As stated in the public review, more cartoons and better labeling will be helpful as will consistent comparisons to control worms in each experiment. A good example of this issue is demonstrated in Figure 2 and Figure 4: 

      (1) Figure 2: Please label images with what is being probed in each panel. 

      We added labels to the panels.

      (2) Figure 2G is very hard to interpret - cartoon diagramming what is being observed would be helpful. 

      We added cartoons to help illustrate the images.

      (3) Line 182-185: is this referring to your data or to Wu et al? It is not clear in this paragraph when the authors are describing published work versus their own data presented here. 

      It is from our data. We have made it clear in the revised manuscript.

      (4) Figure 2 - 2K is not well described. What experiment is being done here? What is dlk-1 and why did you look at this mutant? 

      Figure 2K showed that both wild-type animals and S172A mutants could reconnect the severed axons after laser axotomy. Previous studies have found that dlk-1(-) mutants were not able to regenerate axons due to altered microtubule dynamics (PMID: 19737525; PMID: 23000142). We used dlk-1(-) mutants as a negative control, because DLK-1 promotes microtubule growth following axotomy, and the DLK-1 pathway is essential for regeneration (PMID: 23000142). We want to highlight the phenotypic difference between dlk-1(-) mutants and the S172E mutants. Although both mutants showed similar regrowth length, dlk-1(-) mutants showed unbranched regrowth probably due to the lack of microtubule polymerization, whereas the S172E mutants showed a mesh-like regrowth pattern likely due to highly dynamic and unstable microtubules. We explained the different phenotypes in the revised manuscript.

      (5) Figure 4C: this phenotype is hard to interpret. Where is the wt control? Where is the quantification? 

      In the Figure legend, we have referred the readers to Figure 1G for the wild-type image. Quantification is provided in the text (~20% of the animals showed the branching defects).

      (6) There are no WT comparison images in Figure 4I, making the quantification difficult to interpret 

      In the Figure legend, we have referred the readers to Figure 1A for the wild-type control. Moreover, we included a new Figure 8 to summarize the phenotypes of all mutants.

      Experimental:

      (1) Is it clear that MEC-7/MEC-12 are the only a- and b-tubulins present in the TRNs? The presence of other tubulins not mutated would complicate the interpretation of the results.

      According to the mRNA levels, the expression of MEC-7 and MEC-12 is >100 fold higher than that of other tubulin isotypes. For example, single-cell transcriptomic data (Taylor et al., 2021) showed that mec-7 mRNA is at 135,940 TPM in ALM neurons, whereas two other tubulin isotypes, tbb-1 and tbb-2, have expression values of 54 and 554 TPM, respectively, in the ALM. So, even if there are some other tubulin isotypes, they are much less abundant than mec-7 and mec-12 and are not likely to interfere with the effects of the mec-7 and mec-12 mutants.
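
      As a quick check of the fold difference, the TPM values quoted above can be compared directly. The snippet uses only the three numbers given in the response; the variable names are illustrative.

```python
# TPM values in ALM neurons as quoted in the response (Taylor et al., 2021).
mec7_tpm = 135_940
other_isotypes_tpm = {"tbb-1": 54, "tbb-2": 554}

# Fold difference of mec-7 over each of the other beta-tubulin isotypes.
fold = {gene: mec7_tpm / tpm for gene, tpm in other_isotypes_tpm.items()}

for gene, ratio in fold.items():
    print(f"mec-7 / {gene}: {ratio:.0f}-fold")

# Both ratios exceed the 100-fold threshold stated in the text.
all_above_100 = all(ratio > 100 for ratio in fold.values())
```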

      (2) The in vitro kinase assays should be quantified. 

      We have added the quantification.

      (3) The idea that Cdk1 phosphorylates tubulin in interphase is surprising and I am left wondering how the authors propose that Cdk1 is activated in interphase. Is cyclin B (or another cyclin) present in interphase in this cell type? Expression but not activation of Cdk1 is not discussed. 

      CDK1 can work with cyclin A and cyclin B. C. elegans has one cyclin A gene (cya-1) and four cyclin B genes (cyb-1, cyb-2.1, cyb-2.2, and cyb-3). According to single-cell transcriptomic data of L4 animals, cya-1 and cyb-1 showed weak expression in many postmitotic neurons (including the ALM neurons), while cyb-2.1, cyb-2.2, and cyb-3 had no expression in neurons. So, it is possible that cya-1/cyclin A and cyb-1/cyclin B have low levels of expression in the TRNs. A previous study also found the expression of cell cycle regulators (including cyclins) in postmitotic neurons in mouse brain (Akagawa et al., 2021; PMID: 34746147).

      (4) What is the significance of neurite swelling and looping in Figure 4H? The underlying cause of this phenotype is not described. 

      The neurite swelling and looping phenotype of mec-17(-) mutants were described by Topalidou et al., (2012; PMID: 22658602) and were caused by the bending of the microtubules. It appears that the loss of the a-tubulin acetyltransferase altered the organization of microtubules in the TRNs. These defects were partially rescued by the enzymatically dead MEC-17, suggesting that MEC-17 may play a non-enzymatic (and likely structural) role in regulating microtubule organization. We added more explanation in the revised manuscript.

      (5) It is quite surprising that polyglutamylation is not affected in the quintuple ttll mutant. Since the authors made the sextuple ttll mutant, could they demonstrate whether polyglutamylation is further reduced in this mutant via GT335 staining? 

      We did not directly compare the quintuple and sextuple ttll mutants because they were crossed with TRN markers with different colors for technical reasons. The quintuple mutants CGZ1475 carried uIs115 [mec-17p::TagRFP] IV, whereas the sextuple mutants CGZ1474 carried zdIs5 [mec-4p::GFP] I. As a result, we needed to use different secondary antibodies for the antibody staining, which makes the results not directly comparable.

      The polyglutamylation signal in the cell body was strongly affected by the ttll mutations. In fact, in the ttll-4(-); ttll-5(-); ttll-12(-) triple mutants, the signal is significantly reduced in the cell body of the TRNs, as well as the cell body of other cells. What’s surprising is that the signal in the axons persisted in the ttll triple and quintuple mutants. As the reviewers suggested, we also stained the sextuple mutants and found a similar pattern to that of the triple and quintuple mutants (new Figure 6-figure supplement 1C in the revised manuscript), although the results are not quantitatively comparable due to the use of secondary antibodies with different fluorophores.

      Writing:

      (1) The beginning of the results section is quite jarring. The information in lines 96-104 should be in the Introduction. 

      Due to the nature of this paper, each section deals with a particular PTM. We think it is helpful to discuss some background information before describing our results on each PTM rather than giving all in the introduction. Nevertheless, we modified the beginning of the results to make it more coherent and more connected with the preceding paragraphs.

      (2) Line 122-126: conclusions are not supported by the data: it is suggested from previous experiments, but authors do not look at MTs directly. 

      We have rephrased the statement to acknowledge that we reached this conclusion based on phenotypic similarity with mutants we previously examined.

      (3) I am confused by the usage of both mec-12(4EtoA) and mec-12(4Es-A). Are these the same mutations? If so, there needs to be consistency. If not, each case needs to be defined. 

      They are the same. We have corrected the mistake and are now using mec-12(4Es-A) to refer to the mutants.

      Line 105: phosphor --> phospho 

      Line 187: were --> was 

      Line 298: is --> are

      The above typos are corrected.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Recommendations For The Authors):

      I still find it really impressive that the Purkinje cell stimulation so closely mimics the pathogenic phenotypes - in my opinion, the strongest part of the paper. I would like just a little clarification on some of my previous questions.

      Major points:

      (1) Can the authors clarify where the new units came from? Are these units that were recorded before the initial submission and excluded, but are now included? If so, why were they excluded before? Or are these units that were recorded since the original submission?

      The number of units increased in Figure 1 for three reasons: 1) We have now plotted the classifier results in Figure 1 instead of the validation results, which have been moved to Figure 1 Supplement 3. 2) In response to reviewer comments, we no longer include units that had >60 s of recording in both our model creation and validation. We had previously used 30 s for creating the model and a different 30 s for validating the model, if an additional 30 s were available. 3) We changed our model creation and validation strategy based on previous reviewer comments. The new units in Figures 2-4 were taken from our pool of previously collected but unanalyzed data (we collect neural data on a rolling basis and thus these data were not initially available). We were fortunate to have these data to analyze in order to address the concerns about the number of cells included in the manuscript. The number of units increased in Figure 5 because new units were recorded in response to reviewer comments.

      (2) Why did some of the neuron counts go down? For example, in Pdx1Cre;Vglut2fl/fl mice, the fraction of units with the control signature went from 11/21 to 7/23. Is this because the classifier changed between the original submission and the revision?

      Yes, the proportion of cells matching each classification changed due to the different parameters and thresholds used in the updated classifier model.

      Minor points:

      In the Discussion: "We find some overlap and shared spike features between the different disease phenotypes and show that healthy cerebellar neurons can adapt multiple disease-associated spike train signatures." I think "adapt" should be "adopt"

      In the Discussion: "compare" is misspelled as "compared"

      Thank you for bringing these typos to our attention. We will upload a new version of the text with the typos corrected.


      The following is the authors’ response to the original reviews.

      We would like to thank the Reviewers for providing excellent and constructive suggestions that have enabled us to strengthen our overall presentation of our data. We have addressed each of the comments by altering the text, providing additional data, and revising the figures, as requested.

      Below are our explanations for how we have altered the manuscript in this revised version.

      Recommendations for the authors:

      I think you will have seen from the comments that there was great enthusiasm for the importance of this study. There were also shared concerns about how the classifier may be inadequate in its current format, as well as specific suggestions to consider to improve. I hope that you will consider a revision to really amplify the impact of the importance of this study.

      Reviewer #1 (Recommendations For The Authors):

      Distinct motor phenotypes are reflected in different neuronal firing patterns at different loci in motor circuits. However, it is difficult to determine if these altered firing patterns: 1) reflect the underlying neuropathology or phenotype, 2) whether these changes are intrinsic to the local cell population or caused by larger network changes, and 3) whether abnormal firing patterns cause or reflect abnormal movement patterns. This manuscript attempts to address these questions by recording neural firing patterns in deep cerebellar nucleus neurons in several models of cerebellar dysfunction with distinct phenotypes. They develop a classifier based on parameters of single unit spike trains that seems to do an inconsistent job of predicting phenotype (though it does fairly well for tremor). The major limitation of the recording/classifier experiments is the low number of single units recorded in each model, greatly limiting statistical power. However, the authors go on to show that specific patterns of Purkinje cell stimulation cause consistent changes in interposed nucleus activity that map remarkably well onto behavioral phenotypes. Overall, I did not find the recording/classifier results to be very convincing, while the stimulation results strongly indicate that interposed nucleus firing patterns are sufficient to drive distinct behavioral phenotypes.

      We thank the reviewer for their comments. We describe below how we have addressed the major concerns.

      Major concerns:

      (1) I don't think it's legitimate to use two 30-second samples from the same recording to train and validate the classifier. I would expect recordings from the same mouse, let alone the same unit, to be highly correlated with each other and therefore overestimate the accuracy of the classifier. How many of the recordings in the training and validation sets were the same unit recorded at two different times?

      We previously published a paper wherein we measured the correlation (or variability) between units recorded from the same mouse versus units recorded from different mice (see: Van der Heijden et al., 2022 – iScience, PMID: 36388953). In this paper we did not find that nuclei neuron recordings from the same mouse were more correlated or similar to each other than recordings from different mice. 

      Prompted by this reviewer comment, however, we did observe strong correlations between the two 30-second samples from the same recording units. We therefore decided to no longer validate our classifier based on training and validation sets that had overlapping units. Instead, we generated 12 training sets and 12 non-overlapping validation sets based on our entire database. We then trained 12 classifier models and ranked these based on their classification ability on the validation sets (Figure 1 – supplemental Figure 3). We found that the top two performing classifier models were the same, and used this model for the remainder of the paper.
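
One way to guarantee that no recorded unit contributes to both the training and the validation set is to split at the level of units rather than 30-second samples. The sketch below is illustrative only and is not the authors' procedure: the function name `split_by_unit`, the validation fraction, and the toy layout of 8 units with two samples each are assumptions made for the example.

```python
import random

def split_by_unit(unit_ids, validation_fraction=0.25, seed=0):
    """Return (train_idx, val_idx) such that no unit appears in both sets."""
    rng = random.Random(seed)
    units = sorted(set(unit_ids))
    rng.shuffle(units)
    n_val = max(1, round(len(units) * validation_fraction))
    val_units = set(units[:n_val])
    # Assign every sample of a unit to the same side of the split.
    train_idx = [i for i, u in enumerate(unit_ids) if u not in val_units]
    val_idx = [i for i, u in enumerate(unit_ids) if u in val_units]
    return train_idx, val_idx

# Hypothetical layout: 8 units, two 30-second samples per unit.
unit_ids = [u for u in range(8) for _ in range(2)]

# Twelve different unit-wise splits, one per candidate classifier model.
splits = [split_by_unit(unit_ids, seed=s) for s in range(12)]
```

Each of the twelve splits could then be used to train one model and score it on its own validation samples, under the assumption that ranking is done on held-out units only.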

      (2) The n's are not convincing for the spike signature analyses in different phenotypic models. For example, the claim is that Pdx1Cre;Vglut2fl/fl mice have more "control" neurons than ouabain infusion mice (more severe phenotype). However, the numbers are 11/21 and 7/20, respectively. The next claim is that 9/21 dystonic neurons are less than 11/20 dystonic neurons. A z-test for proportions gives a p-value of 0.26 for the first comparison and a p-value of 0.44 for the second. I do not think any conclusions can be drawn based on these data.
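
The two p-values quoted in this comment can be reproduced with a standard two-sided, pooled two-proportion z-test. The sketch below uses only the counts given in the comment; the function name is an assumption, and a pooled standard error with a two-sided alternative is assumed because the comment does not specify the variant.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for equality of two proportions (pooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Counts taken from the reviewer's comment.
z_control, p_control = two_proportion_z_test(11, 21, 7, 20)    # "control" signature
z_dystonia, p_dystonia = two_proportion_z_test(9, 21, 11, 20)  # "dystonia" signature
print(round(p_control, 2), round(p_dystonia, 2))  # 0.26 0.44, as quoted
```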

      We included more cells in our analyses and found that the z-test for the proportion of cells with the “control” and “dystonia” signature is indeed statistically significant.

      (3) Since the spiking pattern does not appear to predict an ataxic phenotype and the n's are too small to draw a conclusion for the dystonic mice, I think the title is very misleading - it does not appear to be true that "Neural spiking patterns predict behavioral phenotypes...", at least in these models.

      We have changed the title to: “Cerebellar nuclei cells produce distinct pathogenic spike signatures in mouse models of ataxia, dystonia, and tremor.” We feel that this new title captures the idea that we find differences between spike signatures associated with ataxia, dystonia, and tremor and that these signatures induce pathological movements.

      (4) I don't think it can be concluded from the optogenetic experiments that the spike train signatures do not depend on "developmental changes, ...the effect of transgene expression, ... or drug effects outside the cerebellum." The optogenetic experiments demonstrate that modulating Purkinje cell activity is sufficient to cause changes in DCN firing patterns and phenotypes (i.e., proof-of-principle). However, they do not prove that this is why DCN firing is abnormal in each model individually.

      Thank you for highlighting this section of the text. We agree that the optogenetic experiments cannot explain why the DCN is firing abnormally in each model. We have edited this section of the text to prevent this conclusion from being drawn by the readers.

      Minor points:

      (1) It would be nice to see neural recordings in the interposed nucleus during Purkinje terminal stimulation to verify that the firing patterns observed during direct Purkinje neuron illumination are reproduced with terminal activation. This should be the case, but I'm not 100% certain it is.

      We have edited the text to clarify that representative traces and analysis of interposed nucleus neurons in response to Purkinje terminal stimulation are the data in Figure 5.

      (2) How does the classifier validation (Fig. 1E) compare to chance? If I understand correctly, 24/30 neurons recorded in control mice are predicted to have come from control mice (for example). This seems fairly high, but it is hard to know how impressive this is. One approach would be to repeat the analysis many (1000s) of times with each recording randomly assigned to one of the four groups and see what the distribution of "correct" predictions is for each category, which can be compared against the actual outcome.
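
A chance baseline of the kind suggested here can be approximated with a label-permutation test. The sketch below is one possible variant rather than the reviewer's exact proposal: it shuffles the true group labels (keeping group sizes fixed) and counts how often the shuffled labels agree with the classifier output at least as well as the real labels do. The toy labels, the function name, and the number of permutations are all assumptions for illustration and do not come from the study.

```python
import random

def permutation_p_value(true_labels, predicted_labels, n_permutations=2000, seed=0):
    """Return (observed correct count, permutation p-value) for overall accuracy."""
    rng = random.Random(seed)
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels))
    shuffled = list(true_labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between recording and group
        correct = sum(t == p for t, p in zip(shuffled, predicted_labels))
        if correct >= observed:
            at_least_as_good += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)

# Toy data (hypothetical): 6 recordings per group, 4 of 24 misclassified.
groups = ["control", "ataxia", "dystonia", "tremor"]
true_labels = [g for g in groups for _ in range(6)]
predicted_labels = list(true_labels)
predicted_labels[0] = "ataxia"
predicted_labels[6] = "dystonia"
predicted_labels[12] = "tremor"
predicted_labels[18] = "control"

observed, p_value = permutation_p_value(true_labels, predicted_labels)
```

The same loop could tally correct predictions per category instead of overall, which would be closer to the per-group comparison described in the comment.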

      We have now also included the proportion of spike signatures in the entire population of neurons and show that the spike signatures are enriched in each of the four groups (control, ataxia, dystonia, tremor) relative to the presence of these signatures in the population (Figure 1E). 

      (3) I don't think this is absolutely necessary, but do the authors have ideas about how their identified firing patterns might lead to each of these phenotypes? Are there testable hypotheses for how different phenotypes caused by their stimulation paradigms arise at a network level?

      We have added some ideas about how these spike signatures might lead to their associated phenotypes to the discussion.

      Reviewer #2 (Recommendations For The Authors):

      (1) As mentioned earlier, my main concern pertains to the overall architecture and training of the classifier. Based on my reading of the methods and the documentation for the classifier model, I believe that the classifier boundaries may be biased by the unequal distribution of neurons across cerebellar disease groups (e.g., n=29 neurons in control versus n=19 in ataxics). As the classifier is trained to minimize the classification error across the entire sample, the actual thresholds on the parameters of interest may be influenced by the overrepresentation of neurons from control mice. To address this issue, one possible solution would be to reweight each group so that the overall weight across classes is equal. However, I suggest a better strategy might be to revise the classifier architecture altogether (as detailed below).
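
The reweighting option mentioned in this comment amounts to giving each sample a weight inversely proportional to the size of its class, so that every class contributes the same total weight to the training loss. The sketch below is a minimal illustration using only the two group sizes quoted in the comment (29 control, 19 ataxic); the function name is an assumption, and the other groups are omitted because their sizes are not given here.

```python
def balanced_class_weights(counts):
    """Per-sample weight for each class so that all classes have equal total weight."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

# Group sizes quoted in the reviewer's comment.
counts = {"control": 29, "ataxia": 19}
weights = balanced_class_weights(counts)

# Summed weight per class is now identical: n_c * w_c = total / n_classes.
summed = {label: counts[label] * weights[label] for label in counts}
```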

      We have retrained the classifier model based on equal numbers of ataxic, dystonic, and tremor cells (n=20) but we intentionally included more control cells (n=25). We included more control cells because we assume this is the baseline status for all cerebellar neurons and wanted to avoid assigning disease signatures to healthy neurons too easily. 

      (2) As the authors make abundantly clear, one mouse model of disease could potentially exhibit multiple phenotypes (e.g., a mouse with both ataxia and tremor). To address this complexity, it might be more valuable to predict the probability of a certain CN recording producing specific behavioral phenotypes. In this revised approach, the output of the classifier wouldn't be a single classification (e.g., "this is an ataxic mouse") but rather the probability of a certain neural recording corresponding to ataxia-like symptoms (e.g., "the classifier suggests that this mouse has a 76% likelihood of exhibiting ataxic symptoms given this CN recording"). This modification wouldn't require additional data collection, and the exemplar disease models could still be used to train such a revised network/classifier, with each mouse model corresponding to 0% probability of observing all other behavioral phenotypes except for the specific output corresponding to the disease state (e.g., L7CreVgat-fl/fl would be 0% for all categories except ataxia, which would be trained to produce a score of 100%). This approach could enhance the validation results across other mouse models by allowing flexibility in a particular spike train parameter to produce a diverse set of phenotypes.

      This is a great comment. Unfortunately, our current dataset is too constrained to fully address this comment for the following reasons:

      - We have a limited number of neurons on which we can train our classifier. Further dividing up the groups of neurons or complicating the model limited the power of our analyses and resulted in overfitting of the model on too few neurons.

      - The recording durations (30 seconds) used to train our model are likely too short to find multiple disease signatures within a single recording. We feel that the complex phenotypes are likely resulting from cells within one mouse exhibiting a mix of disease signatures (as in the Car8wdl/wdl mice).

      We think this question would be great for a follow-up study that uses a large number of recordings from single mice to fully predict the mouse phenotype based on the population spike signatures. 

      To limit confusion about our classifier model, we have also altered the language of our manuscript and refer to the cells exhibiting a spike signature instead of predicting a phenotype. 

      However, the paper falls short in terms of the classifier model itself. The current implementation of this classifier appears to be rather weak. For instance, the cross-validated performance on the same disease line mouse model for tremor is only 56%. While I understand that the classifier aims to simplify a high-dimensional dataset into a more manageable decision tree, its rather poor performance undermines the authors' main objectives. In a similar vein, although focusing on three primary features of spiking statistics identified by the decision tree model (CV, CV2, and median ISI) is useful for understanding the primary differences between the firing statistics of different mouse models, it results in an overly simplistic view of this complex data. The classifier and its reliance on the reduced feature set are the weakest points of the paper and could benefit from further analysis and a different classification architecture. Nevertheless, it is commendable that the authors have collected high-quality data to validate their classifier. Particularly impressive is their inclusion of data from multiple mouse models of ataxia, dystonia, and tremor, enabling a true test of the classifier's generalizability.
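
For readers unfamiliar with the three features named in this comment, the sketch below computes them from a list of spike times using their commonly used definitions: CV as the standard deviation of the inter-spike intervals (ISIs) divided by their mean, CV2 as the mean of 2|ISI(i+1) - ISI(i)| / (ISI(i+1) + ISI(i)) over adjacent ISI pairs, and the median ISI. Whether the manuscript uses exactly these definitions (for example, sample versus population standard deviation) is an assumption, and the function name is hypothetical.

```python
import statistics

def spike_train_features(spike_times):
    """Compute CV, CV2, and median ISI from sorted spike times (in seconds)."""
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    if len(isis) < 2:
        raise ValueError("need at least three spikes")
    # CV: global irregularity of the inter-spike intervals.
    cv = statistics.stdev(isis) / statistics.mean(isis)
    # CV2: local irregularity, comparing each ISI with the next one.
    cv2 = statistics.mean(
        [2 * abs(b - a) / (a + b) for a, b in zip(isis, isis[1:])]
    )
    return {"cv": cv, "cv2": cv2, "median_isi": statistics.median(isis)}

# A perfectly regular 100 Hz train has CV and CV2 near zero.
regular = spike_train_features([0.0, 0.01, 0.02, 0.03, 0.04])
# An irregular toy train with ISIs of 1 s and 2 s.
irregular = spike_train_features([0.0, 1.0, 3.0])
```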

      We intentionally simplified our parameter space from a high-dimensional dataset into a more manageable decision tree. We did this for the following reasons:

      - The parameters, even though all measuring different features, are highly correlated (see Figure 1 – supplemental Figure 2). Further, we were training our dataset on a limited number of recordings. We found that including all parameters (for example using a linear model) caused overfitting of the data and poor model performance.

      - Describing the spike signatures using a lower number of parameters allowed us to design optogenetic parameters that would mimic this parameter space. This would be infinitely more complex with a bigger parameter space. 

      We agree with the reviewer that the inclusion of multiple mouse models, in addition to the optogenetics experiments, supports the classifier’s generalizability.

      Minor Comments:

      (1) The blown-up CN voltage traces in Figures 5C and Supplementary Figure 2B appear more like bar plots than voltage traces on my machine.

      Thank you for bringing this to our attention. We have improved the rendering of the traces.

      (2) The logic in lines 224-228 is somewhat confusing. The spike train signatures are undoubtedly affected by all the factors mentioned by the authors. What, I believe, the authors intend to convey is that because changes in CN firing rates can be driven by multiple factors, it is the CN firing properties themselves that likely drive disease-specific phenotypes.

      We agree that our discussion of the CN firing needs clarification. We have made the appropriate edits in the text.

      Reviewer #3 (Recommendations For The Authors):

      It's quite astounding that this can be done from single spike trains from what are almost certainly mixed populations of neurons. Could you add something to the discussion about this? Some questions that could be addressed would be would multiple simultaneous recordings additionally help classify these diseases, or would non-simultaneous recordings from the same animal be useful? Also more discussion about which cells you are likely recording from would be useful.

      Thank you for this suggestion. We have added discussion about multiple recordings, simultaneous vs non-simultaneous recordings, and our thoughts on the cell population recorded in this work.

      Data in figure 2 is difficult to understand - it appears that the majority of dysregulated cells in 2 ataxic models are classified as dystonia cells, not ataxic cells. This appears surprising as it seems to be at odds with earlier data from Fig 1. In my opinion, it is not discussed adequately in the Results or Discussion section.

      We have added further discussion of the ataxia models represented in Figures 1 and 2.

      Minor comment:

      The colours of the subdivisions of the bars in 2C and 3C, and the rest of the paper appear to be related to the groups in the middle (under "predicted"), but the colours are much paler in the figure than in the legend, although the colours in the bars and the legends match in the first figure (1E). Does this signify something?

      These figures were remade with the same colors across the board.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study by Prieto et al. faces the increasingly serious problem of bacterial resistance to antimicrobial agents. This work has an important element of novelty proposing a new approach to control antibiotic resistance spread by plasmids. Instead of targeting the resistance determinant, plasmid-borne proteins are used as antigens to be bound by specific nanobodies (Nbs). Once bound, plasmid transfer was inhibited and Salmonella infection blocked. This in-depth study is quite detailed and complex, with many experiments (9 figures with multiple panels), rigorously carried out. Results fully support the authors' conclusions. Specifically, the authors investigated the role of two large molecular weight proteins (RSP and RSP2) encoded by the IncHI1 derivative-plasmid R27 of Salmonella. These proteins have bacterial Ig-like (Big) domains and are expressed on the cell surface, creating the opportunity for them to serve as immunostimulatory antigens. Using a mouse infection model, the authors showed that RSP proteins can properly function as antigens, in Salmonella strains harboring the IncHI1 plasmid. The authors clearly showed increased levels of specific IgG and IgA antibodies against these RSP proteins in different tissues of immunized animals. In addition, non-immunized mice exhibited Salmonella colonization in the spleen and much more severe disease than immunized ones.

      However, the strength of this work is the selection and production of nanobodies (Nbs) that specifically interact with the extracellular domain of RSP proteins. The procedure to obtain Nbs is lengthy and complicated and includes the immunization of dromedaries with purified RSP and the construction of a VHH (H-chain antibody variable region) library in E. coli. As RSP is expressed on the surface of E. coli, specific Nbs were able to agglutinate Salmonella strains harboring the p27 plasmid encoding the RSP proteins.

      The authors demonstrated that Nbs-RSP reduced the conjugation frequency of p27 thus limiting the diffusion of the amp resistance harbored by the plasmid. This represents an innovative and promising strategy to fight antibiotic resistance, as it is not blocked by the mechanism that determines, in the specific case, the amp resistance of p27 but it targets an antigen associated with IncHI-derivative plasmids. Thus, RSP vaccination could be effective not only against Salmonella but also against other enteric bacteria. A possible criticism could be that Nbs against RSP proteins reduce the severity of the disease but do not completely prevent the infection by Salmonella.

      It is true that vaccination of mice with purified RSP protein did not provide complete protection against infection with a Salmonella strain harboring an IncHI plasmid. As this finding is based on an animal model, further investigation is required to evaluate its clinical efficacy. In any case, even partial protection provided by nanobodies or by a vaccine could potentially improve survival rates among critically ill patients infected with a pathogenic bacterium harboring an IncHI plasmid. An additional beneficial aspect of our approach is that it will reduce dissemination of IncHI plasmids among pathogenic bacteria, which would reduce the presence of antibiotic resistance plasmids in the environment and in the bacteria infecting patients.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript aims to tackle antimicrobial resistance through the development of vaccines. Specifically, the authors test the potential of the RSP protein as a vaccine candidate. The RSP protein contains bacterial Ig-like domains that are typically carried in IncHI1 plasmids like R27. The extracellular location of the RSP protein and its role in the conjugation process makes it a good candidate for a vaccine. The authors then use Salmonella carrying an IncHI plasmid to test the efficacy of the RSP protein as a vaccine antigen in providing protection against infection of antibiotic-resistant bacteria carrying the IncHI plasmid. The authors found no differences in total IgG or IgA levels, nor in pro-inflammatory cytokines between immunized and non-immunized mice. They however found differences in specific IgG and IgA, attenuated disease symptoms, and restricted systemic infection.

      The manuscript also evaluates the potential use of nanobodies specifically targeting the RSP protein by expressing it in E. coli and evaluating their interference in the conjugation of IncHI plasmids. The authors found that E. coli strains expressing RSP-specific nanobodies bind to Salmonella cells carrying the R27 plasmid thereby reducing the conjugation efficacy of Salmonella.

      Strengths:

      The main strength of this manuscript is that it targets the mechanism of transmission of resistance genes carried by any bacterial species, thus making it broad.

      The experimental setup is sound and with proper replication.

      Weaknesses:

      The two main experiments, evaluating the potential of the RSP protein and the effects of nanobodies on conjugation, seem as parts of two different and unrelated strategies.

      In preparing our manuscript, we were aware that we included two different strategies to combat antimicrobial resistance. However, we deemed it valuable to include both in the paper. The development of new vaccines and the inhibition of the transfer of antibiotic resistance determinants are currently considered relevant approaches to combat antimicrobial resistance. Our intention in the article is to integrate these two strategies.

      The survival rates shown in Figure 1A and Figure 3A for Salmonella pHCM1 and non-immunized mice challenged with Salmonella, respectively, are substantially different. In the same figures, the challenge of immunized mice and Salmonella pHCM1 and mice challenged with Salmonella pHCM1 with and without ampicillin are virtually the same. While this is not the only measure of the effect of immunization, the inconsistencies in the resulting survival curves should be addressed by the authors more thoroughly as they can confound the effects found in other parameters, including total and specific IgG and IgA, and pro-inflammatory cytokines.

      Overall the results are inconsistent and provide only partial evidence of the effectiveness of the RSP protein as a vaccine target.

      To address the concerns regarding the disparities in survival rates depicted in Figures 1A and 3A, it is important to refer to several factors that contribute to these variations. Firstly, it should be noted that the data depicted in these figures stem from distinct experimental sets conducted at different times employing different batches of mice. Despite the use of the same strain and supplier, individual animals and their batches can exhibit variability in susceptibility to infection due to inherent biological differences.

      Unlike in vitro cell culture experiments, which can achieve high replicability due to the homogeneity of cell lines, in vivo animal studies often exhibit greater variability. This variability is influenced not only by genetic variations within animal populations, even if originating from the same supplier, but also by environmental factors within the animal facility. These factors include temperature variations, the concentration of non-pathogenic microorganisms in the facility, which can modify the immune responses, or the density of animals in the environment, consequently affecting human traffic and generating potential disturbances.

      When designing experiments with animals, it is desirable for the results to be consistent across different animal batches. If one bacterial strain exhibits higher mortality rates than another across multiple experimental series, this pattern should be reproducible despite the inherent variability in in vivo studies. It is more important to demonstrate consistency in trends than to focus on absolute figures when validating experimental results. 

      It is also important to clarify that when we refer to survival rates, it does not necessarily mean that the animals were found deceased. The animal procedures were approved by the Ethics Committee of Animal Experimentation of the Universitat de Barcelona, which include an animal monitoring protocol. Our protocol requires close daily monitoring of several health and behavioral parameters, each evaluated according to specific criteria. When an animal reaches a predetermined score threshold indicating severe distress or suffering, euthanasia is administered to alleviate further suffering. At this point, biological samples are collected for subsequent analysis.

      The conjugative experiments use very long conjugation times, making it harder to assess if the resulting transconjugants are the direct result of conjugation or just the growth of transconjugants obtained at earlier points in time. While this could be assessed from the obtained results, it is not a direct or precise measure.

      In the conjugation experiments we utilized a reduced number of donor cells expressing the RSP protein and of recipient cells, as well as long conjugation times, to reflect more accurately a situation that may occur naturally in the environment. Short conjugation times are efficient in controlled laboratory conditions using high densities of donor and recipient cells, but these conditions are not commonly found in the environment. For the interference of the conjugative transfer of the IncHI plasmid we used an E. coli strain displaying the nanobody binding RSP to simulate a process that could be also scaled-up in a natural environment (i.e., a probiotic strain in a livestock farm) and that could be cost effective. See discussion section, lines 326-328.

      While the potential outcomes of these experiments could be applied to any bacterial species carrying this type of plasmids, it is unclear why the authors use Salmonella strains to evaluate it. The introduction does a great job of explaining the importance of these plasmids but falls short in introducing their relevance in Salmonella.

      The prevalence of IncHI plasmids in Salmonella was indicated in the introduction section, lines 65-67. Nevertheless, we understand the reviewer’s criticisms and have modified both these sentences in the introduction section and also added comments in the results section (lines 118-128).

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I understand working with mice can be challenging in terms of repeating experiments to further support the study's claims. For this reason, I think the authors need to discuss more thoroughly the following things:

      Can the authors comment on why the presence of Ampicillin leads to a lower upregulation of proinflammatory cytokines in the spleen despite harboring resistance against ampicillin?

      At the intestinal level, physiological inflammatory responses play a crucial role in enabling the host to identify foreign and commensal bacterial antigens and initiate a highly regulated and "controlled" immune response (Fiocchi, 2008. Inflamm Bowel Dis. 2008, 14 Suppl 2:S77-8). The administration of antibiotics such as ampicillin reduces the load of intestinal resident microbiota, thereby lowering the extent of intestinal immune activation. This decline in immune activation extends to systemic levels, potentially accounting for the reduced expression of proinflammatory cytokines observed in the spleen.

      There are inconsistent results in the survival rates in Figures 1A and 3A, please discuss how this could alter the observed differences in total and specific IgG and IgA, and pro-inflammatory cytokines.

      To address the reviewer's concerns regarding the discrepancies in survival rates shown in Figures 1A and 3A, and how these differences might influence the observed variations in total and specific IgG and IgA, as well as pro-inflammatory cytokines, it is important to clarify the terminology used in our study. In our context, "survival" does not solely refer to mortality per se, but encompasses the endpoints defined by our animal welfare protocols, which are rigorously supervised by the Animal Experimentation Ethics Committee of the University of Barcelona. Our protocol mandates close daily monitoring of several health and behavioral parameters, each scored according to specific criteria. When an animal reaches a predefined score threshold indicating severe distress or suffering, euthanasia is conducted to prevent further distress, at which point we collect biological samples for analysis.

      In contrast to in vitro cell culture experiments, which often achieve high replicability thanks to the homogeneity of cell lines, in vivo animal studies frequently display greater variability. This variability stems not only from genetic differences within animal populations, even if originating from the same supplier, but also from environmental factors within the animal facility. These factors encompass variations in temperature, the presence of non-pathogenic microorganisms in the facility (capable of altering immune responses) and the density of animals, which can impact human traffic and potentially lead to disturbances. 

      The experiments depicted in Figs. 1A and 3A were separated in time, and hence may be influenced by environmental factors within the animal facility. Nevertheless, in the comparative analysis performed between immunized and non-immunized animals, experiments were performed simultaneously and hence under similar environmental conditions in the animal facility. For several parameters (i.e., immunoglobulins and proinflammatory cytokines) statistically significant differences were observed. 

      Regarding the conjugation assays, it is not entirely clear to me why the conjugation times are so long. It would be beneficial to have more data about the conjugation efficacy between the donor and recipient without any E. coli expressing the nanobodies at different time intervals. This would help to differentiate between transconjugants and transconjugants obtained from early conjugation events.

      This comment is partially answered in a previous response, regarding the numbers of donor and recipient cells and duration of conjugation. We note here that in fig. 9, the requested experiment with donor and recipient cells without E. coli interferent cells is already present, corresponding to the label “none”. To avoid confusion, we have modified the legend in fig. 9.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      Although the study by Xiaolin Yu et al is largely limited to in vitro data, the results of this study convincingly improve our current understanding of leukocyte migration.

      (1) The conclusions of the paper are mostly supported by the data and in the revised manuscript clarification is provided concerning the exact CCL5 forms (without or with a fluorescent label or His-tag) and amounts/concentrations that were used in the individual experiments. This is important since it is known that modification of CCL5 at the N-terminus affects the interactions of CCL5 with the GPCRs CCR1, CCR3 and CCR5 and random labeling using monosuccinimidyl esters (as done by the authors with Cy-3) is targeting lysines. The revised manuscript more clearly indicates for each individual experiment which form is used. However, a discussion on the potential effects of the modifications on CCL5 in the results and discussion sections is still missing.

      Many thanks for the reviewer's suggestion. We fully agree it is important to clarify the potential issue of Cy3 labeling, and believe this is best addressed in the Materials and Methods section (lines 312-314).

      (2) In general, authors used high concentrations of CCL5 in their experiments. In their reply to the comments they indicate that at lower CCL5 concentrations no LLPS is detected. This is important information since it may indicate the need for chemokine oligomerization for LLPS. This info should be added to the manuscript and comparison with for instance the obligate monomer CCL7 and another chemokine such as CXCL4 that easily forms oligomers may clarify whether LLPS is controlled by oligomerization.

      We appreciate the reviewer's help and have accordingly inserted a brief discussion as suggested (lines 240-246).

      (3) Statistical analyses have been improved in the revised manuscript.

      Thanks to the reviewer for his/her comment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study uses a novel experimental design to elegantly demonstrate how we exploit stimulus structure to overcome working memory capacity limits. While the behavioural evidence is convincing, the neural evidence is incomplete, as it only provides partial support for the proposed information compression mechanism. This study will be of interest to cognitive neuroscientists studying structure learning and memory.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Huang and Luo investigated whether regularities between stimulus features can be exploited to facilitate the encoding of each set of stimuli in visual working memory, improving performance. They recorded both behavioural and neural (EEG) data from human participants during a sequential delayed response task involving three items with two properties: location and colour. In the key condition ('aligned trajectory'), the distance between locations of successively presented stimuli was identical to their 'distance' in colour space, permitting a compression strategy of encoding only the location and colour of the first stimulus and the relative distance of the second and third stimulus (as opposed to remembering 3 locations and 3 colours, this would only require remembering 1 location, 1 colour, and 2 distances). Participants recalled the location and colour of each item after a delay.
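
The compression idea summarised above can be made concrete with a small sketch. It assumes, purely for illustration, that both location and colour are angles in degrees on a 0-360 circle; the function names and the example values are hypothetical and are not taken from the study. In the aligned case, six remembered values (3 locations, 3 colours) reduce to four (1 location, 1 colour, 2 shared distances).

```python
def compress(locations, colours):
    """Return (loc1, col1, d12, d23) if both trajectories share the same steps, else None."""
    loc_steps = [(b - a) % 360 for a, b in zip(locations, locations[1:])]
    col_steps = [(b - a) % 360 for a, b in zip(colours, colours[1:])]
    if loc_steps != col_steps:
        return None  # misaligned trajectory: no shared structure to exploit
    return (locations[0], colours[0], *loc_steps)

def decompress(code):
    """Rebuild all three locations and colours from the four stored values."""
    loc1, col1, d12, d23 = code
    locations = [loc1, (loc1 + d12) % 360, (loc1 + d12 + d23) % 360]
    colours = [col1, (col1 + d12) % 360, (col1 + d12 + d23) % 360]
    return locations, colours

# Aligned example: both features step by 60 and then 110 degrees.
aligned_code = compress([30, 90, 200], [300, 0, 110])
# Misaligned example: the colour steps differ from the location steps.
misaligned_code = compress([30, 90, 200], [300, 10, 110])
```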

      Consistent with the compression account, participants' location and colour recall errors were correlated and were overall lower compared to a non-compressible condition ('misaligned trajectory'). Multivariate analysis of the neural data permitted decoding of the locations and colours during encoding. Crucially, the relative distance could also be decoded - a necessary ingredient for the compression strategy.

      Strengths:

      The main strength of this study is a novel experimental design that elegantly demonstrates how we exploit stimulus structure to overcome working memory capacity limits. The behavioural results are robust and support the main hypothesis of compressed encoding across a number of analyses. The simple and well-controlled design is suited to neuroimaging studies and paves the way for investigating the neural basis of how environmental structure is detected and represented in memory. Prior studies on this topic have primarily studied behaviour only (e.g., Brady & Tenenbaum, 2013).

      Thanks for the positive comments and excellent summary.

      Weaknesses:

      The main weakness of the study is that the EEG results do not make a clear case for compression or demonstrate its neural basis. If the main aim of this strategy is to improve memory maintenance, it seems that it should be employed during the encoding phase. From then on, the neural representation in memory should be in the compressed format. The only positive evidence for this occurs in the late encoding phase (the re-activation of decoding of the distance between items 1 and 2, Fig. 5A), but the link to behaviour seems fairly weak (p=0.068).

      Thanks for raising this important concern. The reviewer is correct that in principle subjects should employ the compression strategy during the encoding phase when sequence stimuli are presented, yet our results show that the 1-2 trajectory could only be decoded during the late encoding phase.

      Meanwhile, subjects could not get enough information to form the compression strategy for the location and color sequences until the appearance of the 3rd item. Specifically, based on the 1st and 2nd items, they only learn whether the 1st-2nd trajectories are congruent between location and color features. However, they could not predict whether this would also apply to the incoming 2nd-3rd trajectory. This is exactly what we found in the neural decoding results. The 1st-2nd trajectory could be decoded after the 2nd item presentation, and the 2nd-3rd trajectory appears after the 3rd item onset. Most critically, the 1st-2nd trajectory is reactivated after the 3rd item but only for the aligned condition, implicating formation of the full-sequence compression strategy wherein the previously formed 1st-2nd trajectory is reactivated to be connected to the 2nd-3rd trajectory.

      Regarding the difference between higher- and lower-correlation groups, previously we used the time window based on the overall 2nd-3rd neural reactivations, which might not be sensitive to reactivation strength. We have now re-selected the time window based on the higher-correlation group (bootstrap test, p = 0.037, two-sided).

      Results have been updated (Figure 5; Results, Page 16). Interpretations about the formation of compression strategy during encoding phase have been added to Results (Page 15-16) and Discussion (Page 18).

      Stronger evidence would be showing decoding of the compressed code during memory maintenance or recall, but this is not presented. On the contrary, during location recall (after the majority of memory maintenance is already over), colour decoding re-emerges, but in the un-compressed item-by-item code (Fig. 4B). The authors suggest that compression is consolidated at this point, but its utility at this late stage is not obvious.

      Thank you for this important question. We apologize for previously omitting neural evidence for the compressive account.

      The reason we did not perform neural decoding during maintenance is that previous EEG/MEG studies including our own failed to reveal robust and sustained time-resolved memory decoding during this period. This is posited to arise from “activity-silent” WM states, wherein memories are not necessarily retained in sustained firing but silently stored within connection weights of WM networks (Stokes, Trends Cogn. Sci., 2015; Rose, Curr Dir Psychol Sci, 2020). Our previous work showed that by transiently perturbing the 'activity-silent' WM using a retrocue or neutral impulse, memories could be reactivated and robustly decoded from neural activities (Huang et al., eLife, 2021). However, due to the lack of transient events during retention in the current design, we do not expect robust decoding results during maintenance. As shown below (AB), this is indeed what we have observed, i.e., no robust neural decoding of trajectories during retention.

      We further used alpha-band (8-11 Hz) neural activities, which have been shown to carry WM information (de Vries et al., Trends Cogn. Sci, 2020; Foster et al., Curr. Biol, 2016; Fukuda et al., J. Neurophysiol, 2016; Sutterer et al., PLOS Biol., 2019) to perform decoding analysis of compression trajectories during maintenance. As shown below, the alpha-band decoding results are indeed stronger than raw activities. Importantly, as shown below (CD), the aligned condition indeed showed significant and long-lasting decoding of compression trajectories (1st-2nd, 2nd-3rd) during retention, while the misaligned condition only showed decoding at the beginning (GH), which might be due to the non-specific offset response of the 3rd item. The results, although not as clear as those during encoding and recalling periods, support the reviewer’s hypothesis that the compressive strategy, if exploited, would be demonstrated during both encoding and maintenance periods. New results and related discussion have been added (Page 16, Supplementary Figure 4).

      With regards to the observed item-by-item color replay during location recall, the reviewer was concerned that this was not consistent with the compressive account, given the lack of trajectory decoding.

      First, item sequences stored in compressive formats need to be converted to sequences during serial recall. In other words, even though color and location sequences are retained in a compressive format (i.e., common 1st-2nd, 2nd-3rd trajectories) throughout the encoding and retention phases, they should be transferred to two sequences as outputs. This is exactly why we performed decoding analysis on individual color and location items rather than trajectories.

      Second and most importantly, we observed serial replay of color sequences when recalling locations. In our view, these results constitute strong evidence for common structure, since the spontaneous color replay during location recall for the aligned condition highlights the close binding between color and location sequences stored in WM. In fact, item-by-item serial replay has been well acknowledged as a critical neural index of cognitive maps, not only for spatial navigation but also for higher-order tasks (e.g., Liu et al., Cell, 2019; Liu et al., Science, 2021). Therefore, spontaneous color sequence replay during location sequence recall supports their shared underlying cognitive map.

      Finally, spontaneous serial replay is also correlated with the reactivation of compressive trajectories during encoding (Supplementary Figure 3). This further indicates that serial replay during recalling is associated with memory reorganization formed during encoding.

      Taken together, we posit that memories need to be converted to sequences as outputs, which leads to serial reactivations during recalling. Importantly, the observed spontaneous replay of color sequences for the aligned condition provides strong evidence supporting the associations between color and location sequences in WM.

      We have now added relevant interpretations and discussions (Page 11&13).

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors wanted to test if using a shared relational structure by a sequence of colors in locations can be leveraged to reorganize and compress information.

      Strength:

      They applied machine learning to EEG data to decode the neural mechanism of reinstatement of visual stimuli at recall. They were able to show that when the location of colors is congruent with the semantically expected location (for example, green is closer to blue-green than purple) the related color information is reinstated at the probed location. This reinstatement was not present when the location and color were not semantically congruent (meaning that x displacement in color ring location did not displace colors in the color space to the same extent) and semantic knowledge of color relationship could not be used for reducing the working memory load or to benefit encoding and retrieval in short term memory.

      Weakness:

      The experiment and results did not address any reorganization of information or neural mechanism of working memory (that would be during the gap between encoding and retrieval).

      We apologize for not presenting clear neural evidence for memory reorganization, particularly neural decoding during WM maintenance and retrieval, in the previous version. Below, we explain why the findings provide converging neural evidence for WM reorganization based on a shared cognitive map.

      First, during the encoding phase when location and color sequences are serially presented, our results reveal reactivation of the 1st-2nd trajectories upon the onset of the 3rd item when location and color sequences are aligned with each other. The reactivation of 1st-2nd trajectory right after the emergence of 2nd-3rd trajectory for aligned but not for misaligned sequences strongly supports WM reorganization, since only stimulus sequences that could be compressed based on shared trajectories (aligned condition) show the co-occurrence of 1st-2nd and 2nd-3rd trajectories. Moreover, the relevance of 1st-2nd reactivation to behavioral measurements of color-location reorganization (i.e., behavioral trajectory correlation, Figure 5D) further indicates its link to WM reorganization.

      Second, the reason we originally did not perform neural decoding during maintenance is that previous EEG/MEG studies including our own failed to reveal robust and sustained time-resolved memory decoding during this period. This is posited to arise from “activity-silent” WM states, wherein memories are not necessarily retained in sustained firing but silently stored within connection weights of WM networks (Stokes, Trends Cogn. Sci., 2015; Wolff et al., Nat. Neurosci, 2017; Rose et al., Curr Dir Psychol Sci, 2020). Our previous work showed that by transiently perturbing the 'activity-silent' WM using a retrocue or neutral impulse, memories could be reactivated and robustly decoded from neural activities (Huang et al., eLife, 2021). However, due to the lack of transient events during retention in the current design, we do not expect robust decoding results during maintenance. As shown in Supplementary Figure 4(AB), this is indeed what we have observed, i.e., no robust neural decoding of trajectories during retention.

      We then used alpha-band (8-11 Hz) neural activities, which have been found to carry WM information (de Vries et al., Trends Cogn. Sci, 2020; Foster et al., Curr. Biol, 2016; Fukuda et al., J. Neurophysiol, 2016; Sutterer et al., PLOS Biol., 2019) to perform decoding analysis of compression trajectories during maintenance. As shown below, the alpha-band decoding results are indeed stronger than raw activities. Importantly, as shown in Supplementary Figure 4(CD), the aligned condition indeed showed significant and long-lasting decoding of compression trajectories (1st-2nd, 2nd-3rd) during retention, while the misaligned condition only showed decoding at the beginning (GH), which might be due to the non-specific offset response of the 3rd item. The results, although not as clear as those during encoding and recalling periods, thus also support WM reorganization.

      Finally, during the recalling period, we observed automatic serial replay of color sequences when recalling locations. In our view, these results constitute strong evidence for common structure, since the spontaneous color replay during location recall for the aligned condition highlights the close binding between color and location sequences stored in WM. In fact, item-by-item serial replay has been well acknowledged as a critical neural index of cognitive maps, not only for spatial navigation but also for higher-order tasks (e.g., Liu et al., Cell, 2019; Liu et al., Science, 2021). Therefore, spontaneous replay of the color sequence during location recall supports their shared underlying cognitive map. Moreover, the spontaneous serial replay is correlated with the reactivation of compressive trajectories during encoding (Supplementary Figure 3). This further indicates that serial replay during recalling is associated with memory reorganization formed during encoding.

      Taken together, we have added updated results about the maintenance period (Page 16, Supplementary Figure 4) and included clarifications and interpretations about why the findings during the encoding and retrieval periods support the WM reorganization view (Page 15-16).

      There was also a lack of evidence to rule out that the current observation can be addressed by schematic abstraction instead of the utilization of a cognitive map.

      The likely impact of the initial submission of the study would be in the utility of the methods that would be helpful for studying a sequence of stimuli at recall. The paper was discussed in a narrow and focused context, referring to limited studies on cognitive maps and replay. The bigger picture and long history of studying encoding and retrieval of schema-congruent and schema-incongruent events is not discussed.

      We agree with the reviewer that the cognitive map referred to here could be understood as schematic abstraction. A cognitive map refers to the internal representation of spatial relations in a specific environment (Tolman 1948). Schematic abstraction denotes a broader range of circumstances, whereby the gist or structure of multiple environments or episodes can be integrated (Bartlett, 1932; Farzanfar et al., Nat. Rev. Neurosci, 2023).

      In other words, a schema refers to a highly abstract framework of prior knowledge that captures common patterns across related experiences, which does not necessarily occur in a spatial framework as cognitive maps do. Meanwhile, in the current design, we specifically manipulate the consistency of spatial trajectory distance between color and location sequences. Therefore, we would argue that cognitive map is a more conservative and appropriate term to frame our findings.

      Relevant discussions have been added (Page 3&19).

      We apologize for the lack of a more generalized discussion and have added schema-related literature. Thanks for the suggestion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Do time-frequency-domain data (e.g., alpha-band power) in the delay provide evidence for delay-period decoding of trajectory lengths? This might strengthen the case for compression.

      Thanks for the suggestion. We have now performed a decoding analysis of the delay period based on alpha-band power. As shown in Supplementary Figure 4, both the 1st-2nd and 2nd-3rd trajectories could be decoded for the aligned condition.

      Added in supplementary figure 4 and Page 16.  

      (2) Do participants erroneously apply the compression strategy in the misaligned condition? This would not show up in the trajectory error correlation analysis, but might be visible when examining correlations between raw trajectory lengths.

      Thanks for raising this interesting suggestion. To test the hypothesis, we chose a typical misaligned condition where the 1st-2nd trajectory distances are the same between location and color sequences, while the 2nd-3rd trajectory distances differ between the two features.

      In this case, participants might exploit the compression strategy for the first two items and erroneously apply the strategy to the 3rd item. If so, we would expect better memory performance for the first two items but worse memory for the 3rd item, compared to the rest of the misaligned trials. As shown below, the 1st-2nd aligned trials showed marginally significantly higher performance than misaligned trials for the first two items (t(32) = 1.907, p = 0.066, Cohen’s d = 0.332). However, we did not find significantly worse performance for the 3rd item between the two conditions (t(32) = -0.4847, p = 0.631, Cohen’s d = -0.084). We observed a significant interaction between the last two items and the alignment effect (t(32) = 2.082, p = 0.045, Cohen’s d = 0.362), indicating a trend of applying the wrong compression strategy to the 3rd item.

      Author response image 1.

      (3a) Some more detail on some of the methods might help readers. For instance, did trajectories always move in a clockwise direction? Could the direction reverse on the third item? If not, did this induce a response bias? Could such a bias possibly account for the trajectory error correlations?

      Sorry for the unclear statement. For each individual trial, both the color and location features of the three items are randomly selected from nine possible values without any constraint on direction. That is to say, the trajectories can move in a clockwise or anticlockwise direction, and the direction can also reverse on the third item in some trials. Thus, we think the current design actually helps to reduce the influence of response bias. Taking a step back, if trajectory error correlations were due to response bias, we should expect consistently significant correlations for all conditions, instead of observing significant correlations only for the 1st-2nd and 2nd-3rd trajectories but not for the 1st-3rd trajectory, and only in the aligned trajectory condition but not in the misaligned condition. Therefore, we think the trajectory error correlations cannot be simply explained by response bias.

      Details have been added (Page 23).

      (3b) Is the colour wheel always oriented the same way for a participant? If so, given there are only nine colors, it seems possible that colors are mapped to locations and remembered in a location code instead. This does not seem to be a problem in principle for the behavioural findings, but might change the interpretation of what is being decoded from the EEG. If this is a possibility then this might be acknowledged.

      The color wheel is always oriented the same way for each participant. We agree with the reviewer that participants may map colors to locations and remember them in a location code. We don’t have sufficient evidence to rule out this possibility. One possible way to test it would be to run another experiment in which the color wheel orientation varies during the response period. Meanwhile, we would like to point out that the underlying logic of the current design is based on the facts that thinking spatially is intuitive and that spatial metaphors like “location” and “distance” are commonly used to describe the world, e.g., the well-known mental number line (Dehaene et al., JEP: General, 1993). Therefore, we expected participants to associate or integrate location and color maps based on trajectory distance.

      The reviewer is correct that the color decoding would reflect spatial location rather than the genuine color feature. This is actually the point of the experimental design, whereby two irrelevant features could be possibly combined within a common cognitive map. Without the realignment of the two feature maps defined in space, subjects could not at all form the strategy to compress the two sequences. In other words, decoding of color sequences could be understood as neural representation of a series of corresponding locations along the ring that are independent of the physical locations of the items.

      Interpretations and clarifications have been added (Page 23&26).

      (4) Does the discretisation of the stimulus distribution (to only 9 possible locations) make the compression strategy easier to use? If the features had been continuously distributed across the location/colour circle, would participants still pick up on and use the shared trajectory structure?

      Thanks for the question. Without further data, it’s hard to say whether the discretization of the stimulus distribution would make the compression strategy easier to use or not, compared to a continuous distribution. Both outcomes seem possible. On the one hand, a discrete stimulus distribution would result in a discrete trajectory distribution, which helps participants to realize the common trajectory strategy. On the other hand, a discrete stimulus distribution would result in category or label representations, which may weaken the effectiveness of the structure compression strategy. We postulate that our findings could be generalized to continuous trajectories in a cognitive map within a certain resolution.

      (5a) Minor point: I disagree that avoiding the same points for location and colour for a given item allows them to be independently decoded. I would argue the contrary - this kind of constraint should create a small anti-correlation that in principle could lead to spurious decoding of one variable (although this seems unlikely here).

      We appreciate the concern. As mentioned above, with a discrete stimulus distribution (9 possible values for both color and location domains), it is quite possible that a fraction of trials would share the same values in location and color. Therefore, the neural decoding for one domain might be confounded by the other domain. To dissociate their neural representations, we imposed the constraint that color and location could not occupy the same value for a given item.

      We agree that this kind of constraint might create a small anti-correlation, even though it is not observed here. Future studies using continuous stimulus distribution would reduce the correlation or anti-correlation between stimuli.

      (5b) Very minor point: 1,000 permutations for significance testing seems on the low side. Since some of the p-values are close to 0.05 it may be worth running more permutations.

      Thanks for this suggestion. We obtained similar results using 1,000 or 10,000 permutations.
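      The reviewer's point about permutation count can be illustrated with a short sketch. It assumes a two-sided sign-flip permutation test on a one-sample mean, which may differ from the test actually used in the manuscript; the function name `sign_flip_p_value` is hypothetical. With the common (1 + hits) / (1 + n_perm) estimator, the smallest attainable p-value is 1 / (n_perm + 1), so more permutations give finer resolution near 0.05.

```python
import numpy as np


def sign_flip_p_value(values, n_perm, seed=0):
    """Two-sided sign-flip permutation p-value for the mean of `values`.

    Assumption: under the null hypothesis the values are symmetric around zero,
    so the sign of each value can be flipped at random.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = abs(values.mean())
    # One row of random signs per permutation.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, values.size))
    null = np.abs((signs * values).mean(axis=1))
    # Counting the observed statistic itself keeps the estimate above zero.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

      Running the same data with `n_perm=1000` and `n_perm=10000` and comparing the two estimates is the kind of check described in the response above.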

      (6) Missing reference: H. H. Li et al., 2021 (line 213) seems not to be on the list of references.

      Sorry for the mistake. Added.

      Reviewer #2 (Recommendations For The Authors):

      The study aimed to discuss the working memory mechanism, instead, it seems to be focused on the encoding and recall strategies after a short while, I recommend updating the manuscript to refer to the relevant cognitive mechanism.

      There was a strong voice on the effect of using the cognitive map in working memory, without any tests on if indeed a cognitive map was used (for example the novel link between stimuli and how a cognitive map can be used to infer shortcuts). Was the participant required to have any mental map beyond the schema of the shown color ring?

      In the current experiment, to discuss if the effect is driven by utilizing a cognitive map or schematic abstraction of color-relatedness, further analysis is required to possibly assess the effects of schema on neural activity and behavior. Namely,

      (1) Was there any reinstatement of schematically congruent (expected) colors that were probed by location 1, at locations 2 and 3 in the MAT condition?

      Thanks for pointing out this possibility. However, we don’t think there will be stable color expectations given location information under the MAT condition. First, as the trajectory distance varied on a trial-by-trial basis, no prior common trajectory knowledge could be used to make inferences about the current stimuli in an individual trial. Second, the starting points for color and location (1st item) were randomly and independently selected, such that the color sequence could not be predicted from the location sequence in either the aligned or the misaligned condition.

      (2) Given that response time can be a behavioral marker of schematic conflict, was the response time faster for congruent than incongruent conditions?

      Thanks for this question. Unfortunately, due to the experimental design, the response time could not be used as a behavioral marker to infer mental conflicts, since participants were not required to respond as fast as possible. Instead, they took their own pace to reproduce sequences without time limit. They could even take a short break before submitting their response to initiate the next trial.

      (3) In case you cannot rule out that utilizing schema is the cognitive mechanism that supports working memory performance (the behavior), please add the classical literature (on the memory of schematically congruent and incongruent events) to the discussion.

      Thanks for this suggestion. We have now added the relevant literature (Page 3&19).

      (4) On page 6, 'common structure in the cognitive map' is the schema, isn't it?

      Correct. Based on our understanding, ‘common structure in the cognitive map’ is a spatial schema.

      (5) In Figure 2 EFG, would you please use a mixed effect model or show evidence that all participants demonstrated a correlation between the location trajectory error and color trajectory error?

      Thanks for the suggestion. We have added the mixed effect model results, which are consistent with Figure 2EFG (AT: 1st-2nd trajectory, β = 0.071, t = 4.215, p < 0.001; 2nd-3rd trajectory, β = 0.077, t = 3.570, p < 0.001; 1st-3rd trajectory, β = 0.019, t = 1.118, p = 0.264; MAT: 1st-2nd trajectory, β = 0.031, t = 1.572, p = 0.116; 2nd-3rd trajectory, β = 0.002, t = 0.128 , p = 0.898; 1st-3rd trajectory, β = -0.017, t = -1.024, p = 0.306).

      In general, doesn't such correlation just show that good participants/trials were good (some did well in the study and some did poorly throughout?)

      We don’t think the trajectory error correlation results just reveal that some participants did well and some participants did poorly. If that were the case, we would not observe a significant correlation in Figure 2D, where we first run the correlation for each participant and then test correlation significance at the group level. Indeed, trajectory error correlation between color and location domains characterizes the consistent changes between the two domains.

      It is worth noting that the correlation was estimated with signed trajectory errors in the color and location domains, which means that we indeed cared about whether the errors in the two domains varied consistently in the same direction, i.e., whether a longer remembered trajectory compared to the actual trajectory in the location domain would predict a longer remembered trajectory in the color domain.
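      The two-stage logic described here (a correlation within each participant, then a test across participants) can be sketched as below. This is an illustration on simulated data, not the authors' analysis code: the Fisher z-transform before the group-level t statistic is an assumption, and `participant_r`, `group_t`, and `simulate` are hypothetical names. Each simulated participant gets a different overall noise level, so differences in overall accuracy alone produce no within-participant correlation; only a trial-wise component shared by location and colour does.

```python
import numpy as np


def participant_r(loc_err, col_err):
    """Pearson correlation between signed location and colour trajectory errors."""
    return float(np.corrcoef(loc_err, col_err)[0, 1])


def group_t(rs):
    """One-sample t statistic (against 0) on Fisher z-transformed correlations."""
    z = np.arctanh(np.asarray(rs, dtype=float))
    return float(z.mean() / (z.std(ddof=1) / np.sqrt(z.size)))


def simulate(shared_sd, n_participants=20, n_trials=200, seed=0):
    """Hypothetical data: per-participant correlations of signed errors."""
    rng = np.random.default_rng(seed)
    rs = []
    for _ in range(n_participants):
        noise_sd = rng.uniform(0.5, 2.0)  # overall accuracy differs by participant
        shared = rng.normal(0.0, shared_sd, n_trials)  # error common to both features
        loc_err = shared + rng.normal(0.0, noise_sd, n_trials)
        col_err = shared + rng.normal(0.0, noise_sd, n_trials)
        rs.append(participant_r(loc_err, col_err))
    return np.array(rs)
```

      Calling `simulate(shared_sd=0.0)` gives correlations scattered around zero despite large differences in accuracy between participants, whereas `simulate(shared_sd=1.5)` gives reliably positive correlations and a large group-level t statistic.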

      Moreover, as shown in Figure 2EFG, by dividing trials into 4 bins according to the location trajectory error for each participant and pooling the data across participants, we observed 4 clusters along x-axis (location trajectory error). This suggests that participants’ memory performance is rather consistent instead of being extremely good or bad. Besides, if trajectory error correlation is due to different overall memory performance between participants, we should observe significant trajectory error correlations both in AT and MAT conditions, instead of only under AT condition and for 1st-2nd and 2nd-3rd trajectories but not for 1st-3rd trajectory.
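      The binning step mentioned above (four bins per participant, ordered by location trajectory error) might look like the following sketch. Equal-count bins are an assumption, since the response does not state how bin edges were chosen, and `bin_means` is a hypothetical helper.

```python
import numpy as np


def bin_means(loc_err, col_err, n_bins=4):
    """Sort one participant's trials by location trajectory error, split them
    into n_bins equal-count bins, and return the mean (location, colour)
    error of each bin as an array of shape (n_bins, 2)."""
    loc_err = np.asarray(loc_err, dtype=float)
    col_err = np.asarray(col_err, dtype=float)
    order = np.argsort(loc_err, kind="stable")
    groups = np.array_split(order, n_bins)
    return np.array([[loc_err[g].mean(), col_err[g].mean()] for g in groups])
```

      Stacking the output across participants would give four clusters along the location-error axis, in the spirit of the pooled plot described for Figure 2EFG.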

      In Figure 2 G, is the marginal error just too big to be sensitive? I am not sure what we are learning here, please clarify.

      Sorry for the confusion. To examine this possibility, we excluded errors beyond 2.5 * σ, and still observed a non-significant 1st-3rd trajectory error correlation between color and location domains (r = 0.119, p = 0.167).

      The 1st-3rd trajectory showed non-significant behavioral correlation and neural representation, which suggests that the current sequential memory task encourages participants to organize information by relying more on adjacent items and their distance. Thus, we think the 1st-3rd trajectory serves as a control trajectory, which helps us not only to exclude other possible explanations (e.g., systematic response bias), but also to validate the current findings at both the behavioral and neural levels.

      Results and statements (Page 10-11) added now.

      Author response image 2.

      (6) Regarding the first lines on page 11, did you do qualitative research to know if less information was encoded in congruent conditions?

      The current experimental design is inspired by the studies of mental compression of spatial sequences from Dehaene’s lab (Amalric et al., 2017; Roumi et al., 2021), in which they propose that the human brain compresses spatial sequences using an abstract language and formalize the minimal description length of a sequence as the “language-of-thought complexity.” Based on this evidence, we think less information is required to describe the congruent condition compared to the incongruent condition. This idea is supported by better memory performance for the congruent condition. Unfortunately, we could not quantify how much less information was encoded in the congruent condition.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors intended to prove that gut GLP-1 expression and secretion can be regulated by Piezo1, and hence by mechanistic/stretching regulation. For this purpose, they have assessed Piezo1 expression in STC-1 cell line (a mouse GLP-1 producing cell line) and mouse gut, showing the correlation between Piezo1 level and Gcg levels (Figure S1). They then aimed to generate gut L cell-specific Piezo1 KO mice, and claimed the mice show impaired glucose tolerance and GLP-1 production, which can be mitigated by Ex-4 treatment (Figures 1-2). Pharmacological agents (Yoda1 and GsMTx4) and mechanic activation (intestinal bead implantation) were then utilized to prove the existence of ileal Piezo1-regulated GLP-1 synthesis (Figure 3). This was followed by testing such mechanism in a limited amount of primary L cells and mainly in the STC-1 cell line (Figures 4-7).

      While the novelty of the study is somewhat appreciable, the bio-medical significance is not well demonstrated in the manuscript. The authors stated (in lines 78-83) a number of potential side effects of GLP-1 analogs; how can the mechanistic study of GLP-1 production on its own be essential for the development of new drug targets for the treatment of diabetes? Furthermore, the study does not provide a clear mechanistic insight on how the claimed CaMKKbeta/CaMKIV-mTORC1 signaling pathway upregulated both GLP-1 production and secretion. This reviewer also has concerns about the experimental design and data presented in the current manuscript, including the issue of how proglucagon expression can be assessed by Western blotting.

      Strengths:

      The novelty of the concept.

      Weaknesses:

      Experimental design and key experiment information.

      Current GLP-1-based therapies for diabetes use GLP-1 agonists/analogs. Although generally safe, GLP-1 agonists/analogs carry some side effects and risks. We agree with the reviewer that a mechanistic study on the regulation of GLP-1 production will not directly lead to the development of new drug targets for the treatment of diabetes. However, understanding the mechanism of GLP-1 production may shed light on alternative treatment strategies for diabetes that target the production of GLP-1. In our previous studies, we elucidated the role of the mTOR/S6K pathway in regulating GLP-1 production in L cells. Using the STC-1 cell line and different mouse models, including Neurog3-Tsc1−/− mice and rapamycin or L-leucine treatment to modulate mTOR activity, we demonstrated that mTOR stimulates proglucagon gene expression and thus GLP-1 production (Diabetologia 2015;58(8):1887-97; Mol Cell Endocrinol. 2015 Nov 15:416:9-18.). Building on these studies, we found in the present study that Piezo1 regulates the mTOR/S6K pathway, and thus proglucagon expression and GLP-1 production, through Ca2+/CaMKKbeta/CaMKIV. Although we could not exclude the involvement of other signaling pathways downstream of Piezo1 in regulating the cleavage of proglucagon, granule maturation and the final release of GLP-1, our present study provides evidence to support the involvement of the Ca2+/CaMKKbeta/CaMKIV/mTOR pathway in mediating the role of Piezo1 in proglucagon expression and GLP-1 production. The reviewer also expressed concerns about the use of western blot to detect proglucagon expression. In fact, western blot is often used to detect proglucagon. Here are some examples from other researchers: Diabetes. 2013 Mar;62(3):789-800. Gastroenterology. 2011 May;140(5):1564-74. 2004 Jul 23;279(30):31068-75. The proglucagon antibody we used in our study was purchased from Abcam (Cat#ab23468), which can detect proglucagon of 21 kDa.

      Reviewer #2 (Public Review):

      Summary:

      The study by Huang and colleagues focuses on GLP-1 producing entero-endocrine (EEC) L-cells and their regulation of GLP-1 production by a mechano-gated ion channel Piezo1. The study describes Piezo1 expression by L-cells and uses an exciting intersectional mouse model (villin to target epithelium and Gcg to target GLP-1-producing cells and others like glucagon-producing pancreatic endocrine cells), which allows L-cell specific Piezo1 knockout. Using this model, they find an impairment of glucose tolerance, increased body weight, reduced GLP-1 content, and changes to the CaMKKbeta-CaMKIV-mTORC1 signaling pathway using a normal diet and then high-fat diet. Piezo1 chemical agonist and intestinal bead implantation reversed these changes and improved the disrupted phenotype. Using primary sorted L-cells and cell model STC-1, they found that stretch and Piezo1 activation increased GLP-1 and altered the molecular changes described above.

      Strengths:

      This is an interesting study testing a novel hypothesis that may have important mechanistic and translational implications. The authors generated an important intersectional genetics mouse model that allowed them to target Piezo1 L-cells specifically, and the surprising result of impaired metabolism is intriguing.

      Weaknesses:

      However, there are several critical limitations that require resolution before making the conclusions that the authors make.

      (1) A potential explanation for the data, and one that is consistent with existing literature [see for example, PMC5334365, PMC4593481], is that epithelial Piezo1, which is broadly expressed by the GI epithelium, impacts epithelial cell density and survival, and as such, if Piezo1 is involved in L-cell physiology, it may be through regulation of cell density. Thus, it is critical to determine L-cell densities and epithelial integrity in controls and Piezo1 knockouts systematically across the length of the gut, since the authors do not make it clear which gut region contributes to the phenotype they see. Current immunohistochemistry data are not convincing.

      We appreciate the reviewer’s comment. We agree that Piezo1 may affect L-cell density and epithelial integrity. We will quantify L-cell density and test epithelial integrity by examining the expression of tight junction proteins (ZO-1 and Occludin) and determining the transepithelial resistance in different regions of the gut.

      (2) Calcium signaling in L-cells is implicated in their typical role of being gut chemo-sensors, and Piezo1 is a calcium channel, so it is not clear whether any calcium-related signaling mechanism would phenocopy these results.

      We will examine whether other calcium-related signaling mechanisms also contribute to the phenotype seen in the IntL-Piezo1-/- mice.

      (3) Intestinal bead implantation, while intriguing, does not have clear mechanisms - and is likely to provide a point of intestinal obstruction and dysmotility.

      To ascertain if intestinal bead implantation led to intestinal obstruction and dysmotility, we conducted a bowel transit time test. The results revealed no difference in bowel transit time between the sham-operated mice and those implanted with beads.

      (4) Previous studies, some that are very important, but not cited, contradict the presented results (e.g., epithelial Piezo1 role in insulin secretion) and require reconciliation.

      Overall, this study makes an interesting observation but the data are not currently strong enough to support the conclusions.

      We will cite more previous studies on GLP-1 production and discuss the discrepancy between our study and others’ studies. The lack of changes in blood glucose seen in Villin-Piezo1-/- mice reported by Sugisawa et al. is not surprising (Cell. 2020 Aug 6;182(3):609-624.e21.). In fact, in another recent study from our group, we found similar results when the Villin-Piezo1-/- mice and Piezo1fl/fl control mice were fed a normal chow diet. Since Villin-1 is expressed in all the epithelial cells of the gut, including enterocytes and various types of endocrine cells, the effect of L-cell Piezo1 loss may be masked by other cell types under normal conditions. However, impaired glucose tolerance was seen in Villin-Piezo1-/- mice compared to the Piezo1fl/fl control mice after 8 weeks of high-fat diet. We further found that Piezo1 in enterocytes exerted a negative effect on glucose and lipid absorption. Loss of Piezo1 in enterocytes led to over-absorption of nutrients under a high-fat diet (Tian Tao, Qing Shu, Yawen Zhao, Wenying Guo, Jinting Wang, Yuhao Shi, Shiqi Jia, Hening Zhai, Hui Chen, Cunchuan Wang*, Geyang Xu*, Mechanical regulation of lipid and sugar absorption by Piezo1 in enterocytes, Acta Pharmaceutica Sinica B, Accepted, 2024, https://doi.org/10.1016/j.apsb.2024.04.016).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Your editorial guidance, reviews, and suggestions have led us to make substantial changes to our manuscript. While we detail point-by-point responses in typical fashion below, I wanted to outline, at a high level, what we’ve done.

      (1) Methods. Your suggestions led us to rethink our presentation of our methods, which are now described more cohesively in a new methods section in the main text.

      (2) Model Validation & Robustness. Reviewers suggested various validations and checks to ensure that our findings were not, for instance, the consequence of a particular choice of parameter. These can be found in the supplementary materials.

      (3) Data Cleaning & Inclusion/Exclusion. Finally, based on feedback, our new methods section fully describes the process by which we cleaned our original data, and on what grounds we included/excluded individual faculty records from analysis.

      eLife assessment

      Efforts to increase the representation of women in academia have focussed on efforts to recruit more women and to reduce the attrition of women. This study - which is based on analyses of data on more than 250,000 tenured and tenure-track faculty from the period 2011-2020, and the predictions of counterfactual models - shows that hiring more women has a bigger impact than reducing attrition. The study is an important contribution to work on gender representation in academia, and while the evidence in support of the findings is solid, the description of the methods used is in need of improvement.

      Reviewer #1 (Public Review):

      Summary and strengths

      This is an interesting paper that concludes that hiring more women will do more to improve the gender balance of (US) academia than improving the attrition rates of women (which are usually higher than men's). Other groups have reported similar findings but this study uses a larger than usual dataset that spans many fields and institutions, so it is a good contribution to the field.

      We thank the reviewer for their positive assessment of the contributions of our work.

      Weaknesses

      The paper uses a mixture of mathematical models (basically Leslie matrices, though that term isn't mentioned here) parameterised using statistical models fitted to data. However, the description of the methods needs to be improved significantly. The author should consider citing Matrix Population Models by Caswell (Second Edition; 2006; OUP) as a general introduction to these methods, and consider citing some or all of the following as examples of similar studies performed with these models:

      Shaw and Stanton. 2012. Proc Roy Soc B 279:3736-3741

      Brower and James. 2020. PLOS One 15:e0226392

      James and Brower. 2022. Royal Society Open Science 9:220785

      Lawrence and Chen. 2015. [http://128.97.186.17/index.php/pwp/article/view/PWP-CCPR-2015-008]

      Danell and Hjerm. 2013. Scientometrics 94:999-1006

      We have expanded the description of methods in a new methods section of the paper which we hope will address the reviewer’s concerns.

      We agree that our model of faculty hiring and attrition resembles Leslie matrices. In results section B, we now mention Leslie matrices and cite Matrix Population Models by Caswell, noting a few key differences between Leslie matrices and the model of hiring and attrition presented in this work. Most notably, in the hiring and attrition model presented, the number of new hires is not based on per-capita fertility constants. Instead, population sizes are predetermined fixed values for each year, precluding exponential population growth or decay towards 0 that is commonly observed in the asymptotic behavior of linear Leslie Matrix models.
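
      The contrast with a classical Leslie matrix can be made concrete with a minimal sketch. This is not the authors' code: the function names, the flat attrition rate, the single entry age for hires, and all numeric values are illustrative assumptions. The feature it reproduces is that each year's hires are set to whatever keeps total headcount fixed, rather than being generated by per-capita fertility constants.

```python
def step(women, men, attrition_w, attrition_m, share_women_hires):
    """Advance expected headcounts by one year with the total held fixed.

    women, men: lists of expected counts indexed by career age.
    attrition_w, attrition_m: functions mapping career age -> annual
        attrition probability.
    share_women_hires: fraction of this year's hires who are women.
    """
    total = sum(women) + sum(men)
    max_age = len(women) - 1
    # Survivors age by one year; anyone at the maximum career age leaves.
    new_w = [0.0] + [women[a] * (1 - attrition_w(a)) for a in range(max_age)]
    new_m = [0.0] + [men[a] * (1 - attrition_m(a)) for a in range(max_age)]
    # Hires fill exactly the vacancies (no fertility term, unlike a Leslie matrix).
    hires = total - sum(new_w) - sum(new_m)
    # Simplifying assumption: all hires enter at career age 0.
    new_w[0] += hires * share_women_hires
    new_m[0] += hires * (1 - share_women_hires)
    return new_w, new_m


def project(women, men, years, attrition_w, attrition_m, share_women_hires):
    """Apply `step` repeatedly and return the final headcounts."""
    for _ in range(years):
        women, men = step(women, men, attrition_w, attrition_m, share_women_hires)
    return women, men


def share_of_women(women, men):
    return sum(women) / (sum(women) + sum(men))


# Hypothetical field: 100 faculty, 30% women, spread evenly over career ages 0-40.
women0 = [30 / 41] * 41
men0 = [70 / 41] * 41
flat_attrition = lambda age: 0.05  # same illustrative rate for everyone
women40, men40 = project(women0, men0, 40, flat_attrition, flat_attrition, 0.5)
print(round(share_of_women(women40, men40), 3))
```

      Because hires are computed from vacancies, the total never grows or decays, which is the difference from the asymptotic behavior of a linear Leslie model noted above.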

      We have additionally revised the main text to cite the listed examples of similar studies (we had already cited James and Brower, 2022). We thank the reviewer for bringing these relevant works to our attention.

      The analysis also runs the risk of conflating the fraction of women in a field with gender diversity! In female-dominated fields (e.g. Nursing, Education) increasing the proportion of women in the field will lead to reduced gender diversity. This does not seem to be accounted for in the analysis. It would also be helpful to state the number of men and women in each of the 111 fields in the study.

      We have carefully examined the manuscript and revised the text to correctly differentiate between gender diversity and women’s representation.

      We have additionally added a table to the supplemental materials (Tab. S3) that reports the estimated number of men and women in each of the 111 fields.

      Reviewer #2 (Public Review):

      Summary:

      This important study by LaBerge and co-authors seeks to understand the causal drivers of faculty gender demographics by quantifying the relative importance of faculty hiring and attrition across fields. They leverage historical data to describe past trends and develop models that project future scenarios that test the efficacy of targeted interventions. Overall, I found this study to be a compelling and important analysis of gendered hiring and attrition in US institutions, and one that has wide-reaching policy implications for the academy. The authors have also suggested a number of fruitful future avenues for research that will allow for additional clarity in understanding the gendered, racial, and socioeconomic disparities present in US hiring and attrition, and potential strategies for mitigating or eliminating these disparities.

      We thank the reviewer for their positive assessment of the contributions of our work.

      Strengths:

      In this study, LaBerge et al use data from over 268,000 tenured and tenure-track faculty from over 100 fields at more than 12,000 PhD-granting institutions in the US. The period they examine covers 2011-2020. Their analysis provides a large-scale overview of demographics across fields, a unique strength that allows the authors to find statistically significant effects for gendered attrition and hiring across broad areas (STEM, non-STEM, and topical domains).

      LaBerge et al. find gendered disparities in attrition - using both empirical data and their counterfactual model - that account for the loss of 1378 women faculty across all fields between 2011 and 2020. It is true that "this number is both a small portion of academia... and a staggering number of individual careers," as this loss of women faculty is comparable to losing more than 70 entire departments. I appreciate the authors' discussion about these losses - they note that each of these is likely unnecessary, as women often report feeling that they were pushed out of academic jobs.

      LaBerge et al. also find - by developing a number of model scenarios testing the impacts of hiring, attrition, or both - that hiring has a greater impact on women's representation in the majority of academic fields in spite of higher attrition rates for women faculty relative to men at every career stage. Unlike many other studies of historical trends in gender diversity, which have often been limited to institution-specific analyses, they provide an analysis that spans over 100 fields and includes nearly all US PhD-granting institutions. They are able to project the impacts of strategies focusing on hiring or retention using models that project the impact of altering attrition risk or hiring success for women. With this approach, they show that even relatively modest annual changes in hiring accumulate over time to help improve the diversity of a given field. They also demonstrate that, across the model scenarios they employ, changes to hiring drive the largest improvement in the long-term gender diversity of a field.

      Future work will hopefully - as the authors point out - include intersectional analyses to determine whether a disproportionate share of lost gender diversity is due to the loss of women of color from the professoriate. I appreciate the authors' discussion of the racial demographics of women in the professoriate, and their note that "the majority of women faculty in the US are white" and thus that the patterns observed in this study are predominantly driven by this demographic. I also highly appreciate their final note that "equal representation is not equivalent to equal or fair treatment," and that diversifying hiring without mitigating the underlying cause of inequity will continue to contribute to higher losses of women faculty.

      Weaknesses

      First, and perhaps most importantly, it would be beneficial to include a distinct methods section. While the authors have woven the methods into the results section, I found that I needed to dig to find the answers to my questions about methods. I would also have appreciated additional information within the main text on the source of the data, specifics about its collection, inclusion and exclusion criteria for the present study, and other information on how the final dataset was produced. This - and additional information as the authors and editor see fit - would be helpful to readers hoping to understand some of the nuance behind the collection, curation, and analysis of this important dataset.

      We have expanded upon the description of methods in a new methods section of the paper.

      We have also added a detailed description of the data cleaning steps taken to produce the dataset used in these analyses, including the inclusion/exclusion criteria applied. This detailed description is at the beginning of the methods section. This addition has substantially enhanced the transparency of our data cleaning methods, so we thank the reviewer for this suggestion.

      I would also encourage the authors to include a note about binary gender classifications in the discussion section. In particular, I encourage them to include an explicit acknowledgement that the trends assessed in the present study are focused solely on two binary genders - and do not include an analysis of nonbinary, genderqueer, or other "third gender" individuals. While this is likely because of the limitations of the dataset utilized, the focus of this study on binary genders means that it does not reflect the true diversity of gender identities represented within the professoriate.

      In a similar vein, additional context on how gender was assigned on the basis of names should be added to the methods section.

      We use a free, open-source, and open-data python package called nomquamgender (Van Buskirk et al, 2023) to estimate the strengths of (culturally constructed) name-gender associations. For sufficiently strong associations with a binary gender, we apply those labels to the names in our data. We have updated the main text to make this approach more apparent.

      We have also added language to the main text which explicitly acknowledges that our approach only assigns binary (woman/man) labels to faculty. We point out that this is a compromise due to the technical limitations of name-based gender methodologies and is not intended to reinforce a gender binary.
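
      The labeling rule described here, assigning a binary label only when the name-gender association is strong enough, can be sketched generically. The sketch below does not call nomquamgender and does not reproduce its API; the cutoff value, the function name, and the treatment of ambiguous names are hypothetical choices made for illustration.

```python
def label_from_association(p_woman, max_uncertainty=0.1):
    """Return 'woman', 'man', or None when the association is too weak.

    p_woman: estimated probability that a name is associated with women,
        assumed to come from some upstream name-gender model.
    max_uncertainty: hypothetical cutoff on min(p, 1 - p).
    """
    if not 0.0 <= p_woman <= 1.0:
        raise ValueError("p_woman must be in [0, 1]")
    if min(p_woman, 1.0 - p_woman) > max_uncertainty:
        return None  # ambiguous name: no label assigned in this sketch
    return "woman" if p_woman >= 0.5 else "man"
```

      A stricter cutoff leaves more names unlabeled but reduces mislabeling, which is the trade-off any name-based approach has to make.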

      I do think that some care might be warranted regarding the statement that "eliminating gendered attrition leads to only modest changes in field-level diversity" (Page 6). While I do not think that this is untrue, I do think that the model scenarios where hiring is "radical" and attrition is unchanged from present (equal representation of women and men among hires (ER) + observed attrition (OA)) show that a sole focus on hiring dampens the gains that can otherwise be addressed via even modest interventions (see, e.g., gender-neutral attrition (GNA) + increasing representation of women among hires (IR)). I am curious as to why the authors did not include an additional scenario where hiring rates are equal and attrition is equalized (i.e., GNA + ER). The importance of including this additional model is highlighted in the discussion, where, on Page 7, the authors write: "In our forecasting analysis, we find that eliminating the gendered attrition gap, in isolation, would not substantially increase representation of women faculty in academia. Rather, progress towards gender parity depends far more heavily on increasing women's representation among new faculty hires, with the greatest change occurring if hiring is close to gender parity." I believe that this statement would be greatly strengthened if the authors can also include a comparison to a scenario where both hiring and attrition are addressed with "radical" interventions.

      Our rationale for omitting the GNA + ER scenario in the presented analysis is that we can reason about the outcomes of this scenario without the need for computation; if a field has equal inputs of women and men faculty (on average) and equal retention rates between women and men (on average), then, no matter the field’s initial age and gender distribution of faculty, the expected value for the percentage of women faculty after all of the prior faculty have retired (which may take 40+ years) is exactly 50%. We have updated the main text to discuss this point.
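
      The argument can be written out in a few lines. The notation is introduced here only for illustration and is not taken from the manuscript: cohort $c$ is hired in year $t_c$ with $H_c$ hires, $h$ is the share of women among hires, and $S_c(\tau)$ is the probability that a member of cohort $c$ is still on the faculty $\tau$ years after hiring, which under gender-neutral attrition is the same for women and men. Once all pre-existing faculty have left, the sums run over simulated cohorts only.

```latex
\begin{align*}
\mathbb{E}[W_t] &= \sum_{c} h \, H_c \, S_c(t - t_c), \\
\mathbb{E}[M_t] &= \sum_{c} (1 - h) \, H_c \, S_c(t - t_c), \\
\frac{\mathbb{E}[W_t]}{\mathbb{E}[W_t] + \mathbb{E}[M_t]} &= h = \tfrac{1}{2}
\quad \text{when hires are split equally.}
\end{align*}
```

      The common factor $\sum_c H_c S_c(t - t_c)$ cancels, so the result does not depend on the field's initial age or gender distribution, only on the hiring share.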

      Reviewer #3 (Public Review):

      This manuscript investigates the roles of faculty hiring and attrition in influencing gender representation in US academia. It uses a comprehensive dataset covering tenured and tenure-track faculty across various fields from 2011 to 2020. The study employs a counterfactual model to assess the impact of hypothetical gender-neutral attrition and projects future gender representation under different policy scenarios. The analysis reveals that hiring has a more significant impact on women's representation than attrition in most fields and highlights the need for sustained changes in hiring practices to achieve gender parity.

      Strengths:

      Overall, the manuscript offers significant contributions to understanding gender diversity in academia through its rigorous data analysis and innovative methodology.

      The methodology is robust, employing extensive data covering a wide range of academic fields and institutions.

      Weaknesses:

      The primary weakness of the study lies in its focus on US academia, which may limit the generalizability of its findings to other cultural and academic contexts.

      We agree that the U.S. focus of this study limits the generalizability of our findings. The findings that we present in this work will only generalize to other populations–whether it be to an alternate industry, e.g., tech workers, or to faculty in different countries–to the extent that these other populations share similar hiring patterns, retention patterns, and current demographic representation. We have added a discussion of this limitation to the manuscript.

      Additionally, the counterfactual model's reliance on specific assumptions about gender-neutral attrition could affect the accuracy of its projections.

      Our projection analysis is intended to illustrate the potential gender representation outcomes of several possible counterfactual scenarios, with each projection being conditioned on transparent and simple assumptions. In this way, the projection analysis is not intended to predict or forecast the future.

      To resolve this point for our readers, we now introduce our projections in the context of the related terms of prediction and forecast, noting that they have distinct meanings as terms of art: On one hand, prediction and forecasting involve anticipating a specific outcome based on available information and analysis, and typically rely on patterns, trends, or historical data to make educated guesses about what will happen. Projections are based on assumptions and are often presented in a panel of possible future scenarios. While predictions and forecasts aim for precision, projections (which we make in our analysis) are more generalized and may involve a range of potential outcomes.

      Additionally, the study assumes that anyone who disappears from the dataset represents attrition from academia. In reality, those cases could be researchers who moved to another country or to an institution that is not included in the AARC (Academic Analytics Research Centre) dataset.

      In our revision, we have elevated this important point, and clarified it in the context of the various ways in which we count hires and attritions. We now explicitly state that “We define faculty hiring and faculty attrition to include all cases in which faculty join or leave a field or domain within our dataset.” Then, we enumerate the number of situations that could be counted as hires and attritions, including the reviewer’s example of faculty who move to another country.

      Reviewer #1 (Recommendations For The Authors):

      Section B: The authors use an age structured Leslie matrix model (see Caswell for a good reference to these) to test the effect of making the attrition rates or hiring rates equal for men and women. My main concern here is the fitting techniques for the parameters. These are described (a little too!) briefly in section S1B. Some specific questions that are left hanging include:

      A 5th order polynomial is an interesting choice. Some statistical evidence as to why it was the best fit would be useful. What other candidate models were compared? What was the "best fit" judgement made with: AIC, r^2? What are the estimates for how good this fit is? How many data points were fitted to? Was it the best fit choice for all of the 111 fields for men and women?

      We use a logistic regression model for each field to infer faculty attrition probabilities across career ages and time, and we include the career age predictor up to its fifth power to capture the career-age correlations observed in Spoon et al., Science Advances, 2023. For ease of reference, we reproduce the attrition risk curves in Fig S4.

      We note that faculty attrition rates start low, reach a peak around 5-7 years after earning a PhD, and then decline until around 15-20 years post-PhD, after which attrition rates increase as faculty approach retirement.

      This function shape starts low and ends high, and includes at least one local minimum, which indicates that career age should be odd-ordered in the model and at least order-3, but only including career age up to its 3rd order term tended to miss some of the observed career-age/attrition correlations. We evaluated the fit using 5-fold cross validation with a Brier score loss metric, and among options of polynomials of degree 1, 3, 5, or 7, we found that 5th order performed well overall on average over all fields (even if it was not the best for every field), without overfitting in fields with fewer data. Example fits, reminiscent of the figure from Spoon et al., are now provided in Figs S4 and S5.

      While the model fit with fifth order terms may not be the best fit for all 111 fields (e.g., 7th order fits better in some cases), we wanted to avoid field-specific curves that might be overfitted to the field-specific data, especially due to low sample size (and thus larger fluctuations) on the high career age side of the function. Our main text and supplement now include justifications for our choice to include career age up to its fifth order terms.
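
      The model-selection procedure described above (5-fold cross validation, Brier score loss, candidate degrees 1, 3, 5, and 7) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code: the gradient-descent fitter, the rescaling of career age to [-1, 1] using the maximum observed career age of 80, and all function names are assumptions made to keep the example self-contained.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def poly_features(age, degree, max_age=80.0):
    # Rescale career age to [-1, 1] so that high powers stay bounded.
    x = 2.0 * age / max_age - 1.0
    return [x ** k for k in range(degree + 1)]  # k = 0 is the intercept


def fit_logistic(X, y, steps=200, lr=0.5):
    # Plain batch gradient descent on the mean log-loss.
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def brier_score(probs, outcomes):
    # Mean squared difference between predicted probability and 0/1 outcome.
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def cv_brier(ages, left, degree, k=5, seed=0):
    """Mean held-out Brier score over k folds for one polynomial degree."""
    idx = list(range(len(ages)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = fit_logistic([poly_features(ages[i], degree) for i in train],
                         [left[i] for i in train])
        probs = [sigmoid(sum(wj * xj
                             for wj, xj in zip(w, poly_features(ages[i], degree))))
                 for i in fold]
        scores.append(brier_score(probs, [left[i] for i in fold]))
    return sum(scores) / k


def compare_degrees(ages, left, degrees=(1, 3, 5, 7)):
    """Map each candidate degree to its cross-validated Brier score."""
    return {d: cv_brier(ages, left, d) for d in degrees}
```

      Here `ages` is a list of career ages and `left` a list of 0/1 attrition indicators for one field. Lower scores are better; averaging the per-field scores for each degree corresponds to the "on average over all fields" comparison described in the response.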

      You used the 5th order logistic regression (bottom of page 11) to model attrition at different ages. The data in [24] shows that attrition increases sharply, then drops then increases again with career age. A fifth order polynomial on its own could plausibly do this but I associate logistic regression models like this as being monotonically increasing (or decreasing!), again more details as to how this worked would be useful.

      Our first submission did not explain this point well, but we hope that Supplementary Figures S4 and S5 provide clarity. In short, we agree of course that typical logistic regression assumes a linear relationship between the predictor variables and the log odds of the outcome variable. This means that the relationship between the predictor variables and the probability of the outcome variable follows a sigmoidal (S-shaped) curve. However, the relationship between the predictor variables and the outcome variable may not be linear.

      To capture more complex relationships, like the increasing, decreasing and then increasing attrition rates as a function of career age, higher-order terms can be added to the logistic regression model. These higher-order terms allow the model to capture nonlinear relationships between the predictor variables and the outcome variable — namely the non-monotonic relationship between rates of attrition and career age — while staying within a logistic regression framework.
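
      A small numerical illustration may help here. The coefficients below are invented purely to produce a rise, fall, and rise shape and are not estimates from the paper. The point is only that a logistic model whose log-odds are a polynomial in career age is no longer constrained to be monotonic in career age.

```python
import math


def attrition_probability(career_age, coefs):
    """Logistic model whose log-odds are a polynomial in career age.

    coefs[k] multiplies career_age ** k (coefs[0] is the intercept).
    """
    log_odds = sum(c * career_age ** k for k, c in enumerate(coefs))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical cubic log-odds with a local peak near career age 6
# and a local minimum near career age 18.
COEFS = [-3.5, 0.0324, -0.0036, 0.0001]

curve = [attrition_probability(age, COEFS) for age in range(0, 41)]
```

      The resulting curve rises over the first few years, dips through mid-career, and climbs again toward retirement, even though every value is still produced by a sigmoid applied to a linear combination of predictors.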

      "The career age of new hires follows the average career age distribution of hires" did you use the empirical distribution here or did you fit a standard statistical distribution e.g. Gamma?

      We used the empirical distribution. This information has been added to the updated methods section in the main text.
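
      Drawing from the empirical distribution, as opposed to a fitted parametric family such as a Gamma, amounts to resampling observed career ages in proportion to how often they occurred. A minimal sketch, with a made-up list of observed ages and a hypothetical function name:

```python
import random
from collections import Counter


def sample_hire_career_ages(observed_ages, n_hires, seed=None):
    """Draw career ages for simulated hires from the empirical distribution."""
    counts = Counter(observed_ages)
    ages = list(counts)
    weights = [counts[a] for a in ages]
    rng = random.Random(seed)
    return rng.choices(ages, weights=weights, k=n_hires)


# Illustrative values only, not data from the study.
observed = [0, 0, 0, 1, 1, 2, 3, 5, 8, 12]
simulated = sample_hire_career_ages(observed, n_hires=1000, seed=1)
```

      No career age outside the observed set can be drawn, which avoids imposing a distributional shape the data may not have.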

      How did you account for institution (presumably available)? Your own work has shown that institution type plays a role which could be contributing to these results.

      See below.

      What other confounding variables could be at play here, what is available as part of the data and what happens if you do/don't account for them?

      A number of variables included in our data have been shown to correlate with faculty attrition, including PhD prestige, current institution prestige, PhD country, and whether or not an individual is a “self-hire,” i.e., trained and hired at the same institution (Wapman et al., Nature, 2022). Additional factors that faculty self-report as reasons for leaving academia include issues of work-life balance, workplace climate, and professional reasons, and in some cases to varying degrees between men and women faculty (Spoon et al., Sci. Adv., 2023).

      Our counterfactual analysis aims to address a specific question: how would women’s representation among faculty be different today if men and women were subjected to the same attrition patterns over the past decade? To answer this question, it is important to account for faculty career age, which we accept as a variable that will always correlate strongly with faculty attrition rates, as long as the tenure filter remains in place and faculty continue to naturally progress towards retirement age. On the other hand, it is less clear why PhD country, self-hire status, or any of the other mentioned variables should necessarily correlate with attrition rates and with gendered differences in attrition rates more specifically. While some or all of these variables may underlie the causal roots of gendered attrition rates, our analysis does not seek to answer causal questions about why faculty leave their jobs (e.g., by testing the impact of accounting for these variables in simulations per the reviewer’s suggestion). This is because we do not believe the data used in this analysis is sufficient to answer such questions, lacking comprehensive data on faculty stress (Spoon et al., Sci. Adv., 2023), parenthood status, etc.

      What career age range did the model use?

      The career age range observed in model outcomes is a function of the empirically derived attrition rates for faculty across academic fields. The highest career age observed in the AARC data was 80, and the faculty career ages that result from our model simulations and projections do not exceed 80.

      We have also added the distribution of faculty across career ages for the projection scenario model outputs in the supplemental materials Fig. S3 (see response to your later comment regarding career age for further details). Looking at these distributions, it is observed that very few faculty have career age > 60, both in observation and in our simulations.

      What was the initial condition for the model?

      Empirical 2011 faculty rosters are used as the initial conditions for the counterfactual analysis, and 2020 faculty rosters are used as the initial conditions for the projections analysis. This information has been added to the descriptions of methods in the main text.

      Starting the model in 2011 how well does it fit the available data up to 2020?

      Thank you for this suggestion. We ran this analysis for each field starting in 2011, and found that model outcomes were statistically indistinguishable from the observed 2020 faculty gender compositions for all 111 academic fields. This finding is not surprising, because the model is fit to the observed data, but it serves to validate the methods that we used to extract the model's parameters. We have added these results to the supplement (Fig. S2).

      What are the sensitivity analysis results for the model? If you had made different fitting decisions, how much would the results change? All this applies to both the hiring and attrition parameter estimates.

      We model attrition and hiring using logistic regression, with career age included as an exogenous variable up to its fifth power. A natural question follows: what if we used a model with career age only to its first or third power? Or to higher powers? We performed this sensitivity analysis, and added three new figures to the supplement to present these findings:

      First, we show the observed attrition probabilities at each career age, and four model fits to attrition data (Supplementary Figs S4 and S5). The first model includes career age only to its first power, and this model clearly does not capture the full career age / attrition correlation structure. The second model includes career age to its third power, which does a better job of fitting to the observed patterns. The third model includes career age up to its fifth power, which appears to very modestly improve upon the former model. The fourth model includes career age up to its seventh power, and the patterns captured by this model are largely the same as the 5th-power model up to career age 50, beyond which there are some notable differences in the inferred attrition probabilities. These differences would have relatively little impact on model outcomes because the vast majority of faculty have a career age below 50.

      Second, we show the observed probability that hires are women, conditional on the career age of the hire. Once again, we fit four models to the data, and find that career age should be included at least up to its fifth order in order to capture the correlation structures between career age and the gender of new hires. However, limited differences result from including career age up to the 7th degree in the model (relative to the 5th degree).

      As a final sensitivity analysis, we reproduce Fig. 2, but rather than including career age as an exogenous variable up to its fifth power in our models for hiring and attrition, we include career age up to its third power. Findings under this parameterization are qualitatively very similar to those presented in Fig. 2, indicating that the results are robust to modest changes to model parameterization (shown in supplement Fig. S6).

      Far more detail in this and some interim results from each stage of the analysis would make the paper far more convincing. Too much of the analysis currently has a "black box" air, which would easily allow an unconvinced reader to discard the results.

      We have added more detailed descriptions of the methods to the main text. We hope that the changes made will address these concerns.

      Section C: You use the Leslie model to predict the future population. As the model is linear the population will either grow exponentially (most likely) or dwindle to zero. You mention you dealt with this by scaling the average value of H to keep the population at 2020 levels? This would change the ratio of hiring to attrition. How did this affect the timescale of the results. If a field had very minimal attrition (and hence grew massively over the time period of the dataset) the hiring rate would have to be very small too so there would be very little change in the gender balance. Did you consider running the model to steady state instead?

      We chose the 40 year window (2020-2060) for this projection analysis because 40 years is roughly the timespan of a full-length faculty career. In other words, it will take around 40 years for most of the pre-existing faculty from 2020 to retire, such that the new, simulated faculty will have almost entirely replaced all former faculty by 2060.

      For three out of five of our projection scenarios (OA, GNA, OA+ER), the point at which observed faculty are replaced by simulated faculty represents steady state. One way to check this intuition is to observe the asymptotic behavior of the trajectories in Fig. 3B; the slopes for these 3 scenarios nearly level out within 40 years.

      The other two scenarios (OA + IR, GNA+IR) represent situations where women’s representation among new hires is increasing each year. These scenarios will not reach steady state until women represent 100% of faculty. Accordingly, the steady state outcomes for these scenarios would yield uninteresting results; instead, we argue that it is the relative timescales that are interesting.
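
      The contrast between the two kinds of scenario can be sketched with a simplified cohort projection. This is an illustration under stated assumptions, not the fitted model: attrition rates are hypothetical and gender-neutral, all hires enter at career age 0, careers last at most 40 years, and each year's hires exactly replace that year's departures so the total population stays at its starting size.

```python
def attrition_rate(career_age):
    # Hypothetical, gender-neutral annual attrition risk that rises late in the career.
    return 0.03 if career_age < 30 else 0.10

def project(women, men, years, hire_share):
    """Advance an age-structured faculty population while holding its size fixed.

    women, men: head counts indexed by career age.
    hire_share: function mapping the simulation year (1, 2, ...) to the
                fraction of that year's hires who are women.
    Returns the women's share of all faculty after each simulated year.
    """
    women, men = list(women), list(men)
    target = sum(women) + sum(men)
    shares = []
    for year in range(1, years + 1):
        # Survivors move up one career age; everyone in the oldest class leaves.
        stay = [1.0 - attrition_rate(a) for a in range(len(women) - 1)]
        women = [0.0] + [w * s for w, s in zip(women, stay)]
        men = [0.0] + [m * s for m, s in zip(men, stay)]
        # Hire exactly enough people to replace this year's departures.
        hires = target - (sum(women) + sum(men))
        women[0] = hires * hire_share(year)
        men[0] = hires * (1.0 - hire_share(year))
        shares.append(sum(women) / target)
    return shares

max_career = 40
women0 = [30.0] * max_career  # hypothetical start: 30% women at every career age
men0 = [70.0] * max_career

# Constant share of women among hires versus a share that rises 0.5 pp per year.
constant = project(women0, men0, 40, lambda year: 0.45)
increasing = project(women0, men0, 40, lambda year: min(1.0, 0.45 + 0.005 * year))
print(f"constant hiring share, year 40:   {constant[-1]:.3f}")
print(f"increasing hiring share, year 40: {increasing[-1]:.3f}")
```

      In this toy setting, a constant hiring share settles once the initial faculty have all been replaced, whereas a rising hiring share keeps the overall share moving, which is why the timescale rather than the end state is the informative quantity for the latter.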

      What did you do to check that your predictions at least felt realistic under the fitted parameters? (see above for presenting the goodness of fit over the 10 years of the data).

      We ran the analysis suggested in a prior comment (Starting the model in 2011 how well does it fit the available data up to 2020?) and found that model outcomes were statistically indistinguishable from the observed 2020 faculty gender compositions for all 111 academic fields, plus the “All STEM” and “All non-STEM” aggregations.

      You only present the final proportion of women for each scenario. As mentioned earlier, models of this type have a tendency to lead to strange population distributions with wild age predictions and huge (or zero populations). Presenting more results here would assuage any worries the reader had about these problems. What is the predicted age distribution of men and women in the long term scenarios? Would a different method of keeping the total population in check have yielded different results? Interim results, especially from a model as complex as this one, rather than just presenting a final single number answer are a convincing validation that your model is a good one! Again, presenting this result will go a long way to convincing readers that your results are sound and rigorous.

      Thank you for this suggestion. We now include a figure that presents faculty age distributions for each projection scenario at 2060 against the observed faculty age distribution in 2020 (pictured below, and as Fig. S3 in the supplementary materials). We find that the projected age distributions are very similar to the observed distributions for natural sciences (shown) and for the additional academic domains. We hope this additional validation will inspire confidence in our model of faculty hiring and attrition for the reviewer, and for future readers.

      In Fig S3, line widths for the simulated scenarios span the central 95% of simulations.

      Other people have reached almost identical conclusions (albeit with smaller data sets) that hiring is more important than attrition. It would be good to compare your conclusions with their work in the Discussion.

      We have revised the main text to cite the listed examples of similar studies. We thank the reviewer for bringing these relevant works to our attention.

      General comments:

      What thoughts have you given to non-binary individuals?

      Be careful how you use the term "gender diversity"! In many countries "Gender diverse" is a term used in data collection for non-binary individuals, i.e. Male, female, gender diverse. The phrase "hiring more gender diverse faculty" can be read in different ways! If you are only considering men and women then gender balance may be a better framework to use.

      We have added language to the main text which explicitly acknowledges that our analysis focuses on men and women due to limitations in our name-based gender tool, which only assigns binary (woman/man) labels to faculty. We point out that this is a compromise due to the technical limitations of name-based gender methodologies and is not intended to reinforce a gender binary.

      We have also taken additional care with referring to “gender diversity,” per reviewer 1’s point in their public review.

      Reviewer #2 (Recommendations For The Authors):

      Data availability: I did not see an indication that the dataset used here is publicly available, either in its raw format or as a summary dataset. Perhaps this is due to the sensitive nature of the data, but regardless of the underlying reason, the authors should include a note on data availability in the paper.

      The dataset used for these analyses was obtained under a data use agreement with the Academic Analytics Research Center (AARC). While these data are not publicly available, researchers may apply for data access here: https://aarcresearch.com/access-our-data.

      We also added a table to the supplemental materials (Tab. S3) that reports the estimated number of men and women in each of the 111 fields.

      Additionally, a variety of summary statistics based on this dataset are available online, here: https://github.com/LarremoreLab/us-faculty-hiring-networks/tree/main

      Gender classification: Was an existing package used to classify gender from names in the dataset, or did the authors develop custom code to do so? Either way, this code should be cited. I would also be curious to know what the error rate of these classifications are, and suggest that additional information on potential biases that might result from automated classifications be included in the discussion, under the section describing data limitations. The reliability of name-based gender classification is particularly of interest, as external gender classifications such as those applied on the basis of an individual's name - may not reflect the gender with which an individual self-identifies. In other words, while for many people their names may reflect their true genders, for others those names may only reflect their gender assigned at birth and not their self-perceived or lived gender identity. Nonbinary faculty are in particular invisibilized here (and through any analysis that assigns binary gender on the basis of name). While these considerations do not detract from the main focus of the study - which was to utilize an existing dataset classified only on the basis of binary gender to assess trends for women faculty-these limitations should be addressed as they provide additional context for the interpretation of the results and suggest avenues for future research.

      We use a free, open-source, and open-data python package called nomquamgender (Van Buskirk et al, 2023) to estimate the strengths of (culturally constructed) name-gender associations. For sufficiently strong associations with a binary gender, we apply those labels to the names in our data. We have updated the main text to make this approach more apparent.

      We have also added language to the main text which explicitly acknowledges that our approach only assigns binary (woman/man) labels to faculty. We point out that this is a compromise due to the technical limitations of name-based gender methodologies and is not intended to reinforce a gender binary.

      As we mentioned in response to the public review, we use a free and open-source Python package called nomquamgender to estimate the strengths of name-gender associations, and we apply gender labels to the names with sufficiently strong associations with a binary gender. This package is based on a paper by Van Buskirk et al. (2023), “An open-source cultural consensus approach to name-based gender classification,” which documents error rates and potential biases.

      Page 1: The sentence beginning "A trend towards greater women's representation could be caused..." is missing a conjunction. It should likely read: "A trend towards greater women's representation could be caused entirely by attrition, e.g., if relatively more men than women leave a field, OR entirely by hiring..."

      We have edited the paragraph to remove the sentence in question.

      Pages 1-2: The sentence beginning "Although both types of strategy..." and ending with "may ultimately achieve gender parity" is a bit of a run-on; perhaps it would be best to split this into multiple sentences for ease of reading.

      We have revised this run-on sentence.

      Page 2: See comments in the public review about a methods section, the addition of which may help to improve clarity for the readers. Within the existing descriptions of what I consider to be methods (i.e., the first three paragraphs currently under "results"), some minor corrections could be added here. First, consider citing the source of the dataset in the line where it is first described (in the sentence "For these analyses, we exploit a census-level dataset of employment and education records for tenured and tenure-track faculty in 12,112 PhD-granting departments in the United States from 2011-2020.") It also may be helpful to include context here (or above, in the discussion about institutional analyses) about how "departments" can be interpreted. For example, how many institutions are represented across these departments? More information on how the authors eliminated the gendered aspect of patterns in their counterfactual model would be helpful as well; this is currently hinted at on page 4, but could instead be included in the methods section with a call-out to the relevant supplemental information section (S2B).

      We have added a citation to Academic Analytics Research Center’s (AARC) list of available data elements to the data’s introduction sentence. We hope this will allow readers to familiarize themselves with the data used in our analysis.

      Faculty department membership was determined by AARC based on online faculty rosters. 392 institutions are represented across the 12,112 departments present in our dataset. We have updated the main text to include this information.

      Finally, we have added a methods section to the main text, which includes information on how the gendered aspect of attrition patterns was eliminated in the counterfactual model.

      Page 2: Perhaps some indication of how many transitions from an out-of-sample institution might be helpful to readers hoping to understand "edge cases."

      In our analysis, we consider all transitions from out-of-sample institutions to in-sample institutions as hires, and all transitions away from in-sample institutions, whether to an out-of-sample institution or out of academia entirely, as attritions. We restrict our analysis of hiring and attrition to PhD-granting institutions in the U.S. in this way because our data do not support an analysis of other, out-of-sample institutions.

      I also would have liked additional information on how many faculty switched institutions but remained "in-sample and in the same field" - and the gender breakdowns of these institutional changes, as this might be an interesting future direction for studies of gender parity. (For example, readers may be spurred to ask: if the majority of those who move institutions are women, what are the implications for tenure and promotion for these individuals?)

      While these mid-career moves are not counted as attritions in the present analysis, a study of faculty who switch institutions but remain (in-sample) as faculty could shed light on issues of gendered faculty retention at the level of institutions. We share the reviewer’s interest in a more in-depth study of mid-career moves and how these moves impact faculty careers, and we now discuss the potential value of such a study towards the end of the paper. In fact, this subject is the topic of a current investigation by the authors!

      Page 3: I was confused by the statement that "of the three types of stable points, only the first point represents an equitable steady-state, in which men and women faculty have equal average career lengths and are hired in unchanging proportions." Here, for example, computer science appears to be close to the origin on Figure 1, suggesting that hiring has occurred in "unchanging proportions" over the study interval. However, upon analysis of Table S2, it appears that changes in hiring in Computer Science (+2.26 pp) are relatively large over the study interval compared to other fields. Perhaps I am reading too literally into the phrase that "men and women faculty are hired in unchanging proportions" - but I (and likely others) would benefit from additional clarity here.

      We had created an arrow along with the computer science label in Fig. 1, but it was difficult to see, which is likely the source of this confusion. This was our fault, and we have moved the “Comp. Sci.” label and its corresponding arrow to be more visible in Figure 1.

      The change in women’s representation in Computer Science due to hiring over 2011-2020 was +2.26 pp, as the reviewer points out, but, consulting Fig. 1 and the corresponding table in the supplement, we observe that this is a relatively small amount of change compared to most fields.

      Page 3: If possible it may be helpful to cite a study (or multiple) that shows that "changes in women's representation across academic fields have been mostly positive." What does "positive" mean here, particularly when the changes the authors observe are modest? Perhaps by "positive" you mean "perceived as positive"?

      We used the term positive in the mathematical sense, to mean greater than zero. We have reworded the sentence to read “women's representation across academic fields has been mostly increasing…” We hope this change clarifies our meaning to future readers.

      Page 3: The sentence that ends with "even though men are more likely to be at or near retirement age than women faculty due to historical demographic trends" may benefit from a citation (of either Figure S3 or another source).

      We now cite the corresponding figure in this sentence.

      Page 4: The two sentences that begin with "The empirical probability that a person leaves their academic career" would benefit from an added citation.

      We have added a citation to the sentences.

      Figure 3: Which 10 academic domains are represented in Panel 3B? The colors appear to correspond to the legend in Panel 3A, but no indication of which fields are represented is provided. If possible, please do so - it would be interesting and informative to be able to make these comparisons.

      This was not clear in the initial version of Fig. 3B, so we now label each domain. For reference, the domains represented in 3B are (from top to bottom):

      ● Health

      ● Education

      ● Journalism, Media, Communication

      ● Humanities

      ● Social Sciences

      ● Public Administration and Policy

      ● Medicine

      ● Business

      ● Natural Sciences

      ● Mathematics and Computing

      ● Engineering

      Page 6: Consider citing relevant figure(s) earlier up in paragraph 2 of the discussion. For example, the first sentence could refer to Figure 1 (rather than waiting until the bottom of the paragraph to cite it).

      Thank you for this suggestion; we now cite Fig. 1 earlier in this discussion paragraph.

      Page 10: A minor comment on the fraction of women faculty in any given year-the authors assume that the proportion of women in a field can be calculated from knowing the number of women in a field and the number of men. This is, again, true if assuming binary genders but not true if additional gender diversity is included. It is likely that the number of nonbinary faculty is quite low, and as such would not cause a large change in the overall proportions calculated here, but additional context within the first paragraph of S1 might be helpful for readers.

      We have added additional context in the first paragraph of S1, explaining that an additional term could be added to the equation to account for nonbinary faculty representation if our data included nonbinary gender annotations. Thank you for making this point.
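
      As a sketch of this point, the fraction of women in year t can be written with and without a nonbinary term. The symbols are illustrative assumptions and may differ from the notation used in S1: W_t, M_t, and N_t denote the numbers of women, men, and nonbinary faculty in a field in year t.

```latex
% With binary labels only, as in the present data:
f_t = \frac{W_t}{W_t + M_t}

% With an additional term for nonbinary faculty, if such annotations were available:
f_t = \frac{W_t}{W_t + M_t + N_t}
```

      When N_t is small relative to W_t + M_t, the two expressions give nearly the same value, which matches the reviewer's expectation that the calculated proportions would change little.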

      Page 10: Please include a range of values for the residual terms of the decomposition of hiring and attrition in the sentence that reads "In Figure S1 we show that the residual terms are small, and thus the decomposition is a good approximation of the total change in women's representation."

      These residual terms range from -0.51pp to 1.14pp (median = 0.2pp). We have added this information to the sentence in question.

      Page 12: It may be helpful to readers to include a description of the information contained in Table S2 in the supplemental text under section S3.

      We refer to Table S2 twice in the main text (once in the observational findings, and once for the counterfactual analysis), and the contents of Table S2 are described thoroughly in the table caption.

      Reviewer #3 (Recommendations For The Authors):

      (1) There is a potential limitation in the generalizability of the findings, as the study focuses exclusively on US academia. Including international perspectives could have provided a more global understanding of the issues at hand.

      The U.S. focus of this study limits the generalizability of our findings, as faculty outside the U.S. may exhibit differences in hiring patterns, retention patterns, and current demographic representations. We have added a discussion of this limitation to the manuscript. Unfortunately, our data do not support international analyses of hiring and attrition.

      (2) I am not sure that everyone who disappeared from the AARC dataset could be counted as "attrition" from academia. Indeed, some who disappeared might have completely left academia once they disappeared from the AARC dataset. Yet, there's also the possibility that some professors left for academic positions in countries outside of the US, or US institutions that are not included in the AARC dataset. These individuals didn't leave academia. Furthermore, it is also possible that moves to an institution outside of the US or not indexed by AARC are gender specific. Therefore, the analyses that this study conducts should find a way to test whether the assumption that anyone who disappeared from AARC left academia is indeed valid. If not, how will this potentially challenge the current conclusions?

      The reviewer makes an important point: faculty who move to faculty positions in other countries and faculty who move to non-PhD granting institutions, or to institutions that are otherwise not included in the AARC data are all counted as attritions in our analysis. We intentionally define hiring and attrition broadly to include all cases in which faculty join or leave a field or domain within our dataset.

      The types of transitions that faculty make out of the tenure track system at PhD granting institutions in the U.S. may correlate with faculty attributes, like gender. For example, women or men may be more likely to transition to tenure track positions at non-U.S. institutions. Nevertheless, these types of career transition represent an attrition for the system of study, and a hire for another system. Following this same logic, faculty who transition from one field to another field in our analysis are treated as an attrition from the first field and a hire into the new field.
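
      The accounting rule described above can be sketched as a small function. The function name and the record format are hypothetical and chosen for illustration; they do not reflect the structure of the AARC data. A value of None stands for any year in which a person is not observed in the sample, whether because they moved to an out-of-sample institution, moved abroad, or left academia.

```python
def classify(prev_field, curr_field):
    """Return (event, field) pairs for one person between two consecutive years.

    prev_field, curr_field: the in-sample field in each year, or None when the
    person is not observed in the sample that year.
    """
    if prev_field == curr_field:
        # Same field in both years (or absent in both): no hire and no attrition,
        # even if the person changed in-sample institutions.
        return []
    events = []
    if prev_field is not None:
        events.append(("attrition", prev_field))
    if curr_field is not None:
        events.append(("hire", curr_field))
    return events

print(classify(None, "Physics"))          # joins the sample
print(classify("Physics", None))          # leaves the sample for any reason
print(classify("Physics", "Chemistry"))   # switches fields within the sample
```

      Treating every departure from the sample the same way avoids having to annotate why each person left, at the cost of not distinguishing exits from academia from moves to unobserved institutions.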

      By focusing on “all-cause” attrition in this way, we are able to make robust insights for the specific systems we consider (e.g., STEM and non-STEM faculty at U.S. PhD-granting institutions), without being roadblocked by the task of annotating faculty departures and arbitrating which should constitute “valid” attritions.

      (3) It would be very interesting to know how much of the attrition was due to tenure failure. Previous studies have suggested that women are less likely to be granted tenure, which makes me wonder about the role that tenure plays in the gendered patterns of attrition in academia.

      We note that faculty attrition rates start low, reach a peak around 5-7 years after the PhD is earned, and then decline until around 15-20 years post-PhD, after which attrition rates increase as faculty approach retirement. The first local maximum appears to coincide roughly with the tenure clock timing, but we can only speculate that these attritions are tenure related. Our dataset is unfortunately not equipped to determine the causal mechanisms driving attrition.

      We reproduce the attrition risk curve in the supplementary materials, Fig. S4:

      (4) The dataset used doesn't fully capture the complexities of academic environments, particularly smaller or less research-intensive institutions (regional universities, historically black colleges and universities, and minority-serving institutions). This could be potentially added to the manuscript for discussions.

      We have added this point to the description of this study’s limitations in the discussion.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      By identifying a loss-of-function mutation of IQCH in an infertile patient and generating a knockout mouse model of IQCH, Ruan et al. show that IQCH is essential for spermiogenesis. Similar to the infertile patient with the IQCH mutation, Iqch knockout mice are characterized by a cracked flagellar axoneme and abnormal mitochondrial structure. Mechanistically, IQCH regulates the expression of RNA-binding proteins (especially HNRPAB), which are indispensable for spermatogenesis.

      Although this manuscript contains a potentially interesting piece of work that delineates a mechanism of IQCH that associates with spermatogenesis, this reviewer feels that a number of issues require clarification and re-evaluation for a better understanding of the role of IQCH in spermatogenesis.

      Line 251 - 253, "To elucidate the molecular mechanism by which IQCH regulates male fertility, we performed liquid chromatography tandem mass spectrometry (LC‒MS/MS) analysis using mouse sperm lysates and detected 288 interactors of IQCH (Figure 5-source data 1)."

      The reviewer had already raised significant concerns regarding the text above, noting that "LC‒MS/MS analysis using mouse sperm lysates" would not identify interactors of IQCH. However, this issue was not addressed in the revised manuscript. In the Methods section detailing LC-MS/MS, the authors stated that it was conducted on "eluates obtained from IP". However, there was no explanation provided on how IP for LC-MS/MS was performed. Additionally, it was unclear whether LC-MS or LC-MS/MS was utilized. The primary concern is that if LC‒MS/MS was conducted for the IP of IQCH, IQCH itself should have been detected in the results; however, as indicated by Figure 5-source data 1, IQCH was not listed.

      We thank the reviewer for these comments. Additional details regarding the IP protocol for LC-MS/MS analysis have been included in the Methods section of the revised manuscript. Furthermore, we apologize for the previous inconsistencies in the terminology used for LC-MS/MS and have now ensured its consistent usage throughout the document. Regarding the primary concern about the absence of IQCH in Figure 5-source data 1, we reported there only the proteins identified as interacting with IQCH, not IQCH itself. IQCH itself was in fact identified by the LC-MS/MS analysis (Author response table 1). Additionally, we conducted co-IP experiments to validate the interactions identified by LC-MS/MS analysis.

      Author response table 1.

      Results of the LC-MS/MS analysis.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors should know what experiments have been done for the studies.

      We apologize for our oversights. The method for RNA-binding protein immunoprecipitation (RIP) has been detailed in the revised manuscript.

      Typos still remain in the text, e.g., line 253, "Fiugre".

      We are sorry for the spelling errors. We have engaged professional editing services to refine our manuscript.

    1. Author response:

      We thank the reviewers for their thoughtful consideration of our study and are delighted they found the findings to be important. In this initial response to the overall positive reviews, we want to address common themes raised, clarify points relevant to a few specific reviewer concerns, and frame plans for the revised manuscript.

      (1) Analysis of data from human tissue: Reviewer 1 notes “In their analyses of enteric glia from existing single-cell transcriptomic data sets, it is stated that these come from 'non-diseased' humans. However, the data on the small intestine is obtained from children with functional gastrointestinal disorders (Zheng 2023). Data on colonic enteric glia was obtained from colorectal cancer patients (Lee 2020). Although here the cells were isolated from non-malignant regions, saying that the large intestines of these patients are non-diseased is probably an overstatement.

      In the Zheng et al. dataset, “functional GI disorders” refers to biopsies from children that do not have any histopathologic evidence of digestive disease. The children do, however, have at least one GI symptom that prompted a diagnostic endoscopy with biopsies, leading to the designation of “functional” disorder. Given that diagnostic endoscopies are invasive procedures that necessitate anesthesia, obtaining biopsies from completely healthy, asymptomatic children without any clinical indication would not be allowable per most institutional review boards, leading the authors of that study to use these samples as a control group. We thus used the “non-diseased” label to encompass these samples as well as those from the unaffected regions of large intestine from colorectal cancer patients. We recognize, however, that this label might be misleading and will revise the manuscript to more accurately reflect the information on control tissue origin.

      Another existing dataset including human mucosal enteric glia of healthy subjects is presented in Smillie et al (2019). It would be interesting to see how the current findings relate to the data from Smillie et al.” 

      We thank the reviewer for directing us to the Smillie et al. 2019 dataset. This dataset derives from colonic mucosal biopsies from 12 healthy adults (8480 stromal cells) and 18 adults with ulcerative colitis (10,245 stromal cells from inflamed bowel segments and 13,146 from uninflamed), all between the ages of 20-77 years. Our preliminary analysis shows that the putative glial cluster in this dataset does not separate by inflammation or disease state based on the common glial genes: S100B, PLP1, and SOX10. PLP1 and S100B are broadly expressed across this cluster while GFAP is not detected in this dataset, consistent with our observations from the two other human datasets included in our manuscript. In the revised manuscript, we will include the Smillie et al. 2019 data in a supplemental figure as additional supportive evidence.

      (2) Validation and further details of the Plp1CreER-DTA model for genetic depletion of enteric glia: Reviewer 1 notes “The time between enteric glia depletion and analyses (mouse sacrifice) must be a crucial determinant of the type of effects, and the timing thereof. In the current study 11 days after tamoxifen treatment was chosen as the time point for analyses, which is consistent with earlier work by the lab using the same model (Rao et al 2017). What would happen when they wait longer than 11 days after tamoxifen treatment?”  Reviewer 3 asks whether “the Plp1CreER Rosa26DTA/+ mice system established correctly” and raises concern about quantitative characterization.

      In previous work, we discovered that the gene Plp1 is broadly expressed by enteric glia and, within the mouse intestine, is quite specific to glial cells (PMID: 26119414). We characterized the Plp1CreER mouse line as a genetic tool in detail in this initial study. Then in a subsequent study, we used Plp1CreER-DTA mice to genetically deplete enteric glia and study the consequences on epithelial barrier integrity, crypt cell proliferation, enteric neuronal health and gastrointestinal motility (PMID: 28711628). In this second study, we performed extensive validation of the Plp1CreER-DTA mouse model including detailed quantification of glial depletion in the small and large intestines across the myenteric, intramuscular and mucosa compartments by immunohistochemical (IHC) staining of whole tissue segments to sample thousands of cells. We found that the majority of S100B+ enteric glia were depleted within 5 days in both sexes, including more than 88% loss of mucosal glia, and that this loss was stable at 3 subsequent timepoints (7, 9 and 14 days post-tamoxifen induction of Cre activity). Glial loss was further confirmed by IHC for GFAP in the myenteric plexus, and by ultrastructural analysis of the small intestine to ensure cell depletion rather than simply loss of marker expression. Our group was the first to use this model to study enteric glia, and since then similar models and our key observations have been replicated by other groups (PMID: 33282743, 34550727). Thus, we consider this model to be well established.

      Reviewer 1 raises an excellent question about examining epithelial health beyond 11 days post-tamoxifen (11dpt) in this model. Particularly given the longer-lived nature of Paneth cells relative to other epithelial cell types, this would be very interesting to explore. Through 11dpt, Cre+ mice are well-appearing and indistinguishable from their Cre-negative control littermates. Unfortunately, a limitation of the Plp1CreER-DTA model is that beyond 11dpt, Cre+ mice become anorexic, lose body weight, and have signs of neurologic debility such as hindlimb weakness and uncoordinated gait that are prominent by 14dpt. These phenotypes are likely the consequence of targeting Plp1+ glia outside the gut, such as Schwann cells and oligodendrocytes (as described in another study which used a similar model to study demyelination in the central nervous system, PMID: 20851998). Given these CNS effects and that starvation is well known to affect Paneth cell phenotypes (PMIDs: 1167179, 21986443), we elected not to examine timepoints beyond 11dpt. Technological advances that enable more selective cell depletion would allow study of more chronic effects of enteric glial loss.

      (3) Sex differences in the microbiome data: All 3 reviewers queried whether there were sex differences in the microbiome data with Reviewer 1 explaining “Previously the authors showed that enteric glia regulation of intestinal motility is sex-dependent (Rao et al 2017). While enteric glia depletion caused dysmotility in female mice, it did not affect motility in males. For this reason, most experiments in the current study were conducted in male mice only. However, for the experiments focusing on the effect of enteric glia depletion on host-microbiome interactions and intestinal microbiota composition both male and female mice were used. In Figure 8A male and female mice are distinctly depicted but this was not done for Figure 8C. Separate characterization of the microbiome of male and female mice would have helped to figure out how much intestinal dysmotility (in females) contributes to the effect on gut microbial composition. This is an important exercise to confirm that the effect on the microbiome is indeed a consequence of altered Paneth cell function…”

      In our microbiome analysis, we initially analyzed males and females separately but did not observe significant differences between the two sexes. Thus, we merged the data to increase the statistical power of the genotype comparisons. It was an oversight on our part to not label the female and male datapoints in Figure 8C as we did for the other data in the manuscript. We will update this graph and related supplemental figures in the revised version. Per Reviewer 2’s suggestion, we will also address this further in the Results and Discussion.

      (4) Reconciling RNA-Seq identification of transcriptional changes in the colon, but not the small intestine, while the GSEA and downstream tissue level morphological and functional analyses detected phenotypes in the small intestine. Reviewers 1 and 3 raised this question with Reviewer 1 noting “…enteric glia depletion was found to affect Paneth cells structurally and functionally in the small intestine, where transcriptional changes were initially not identified. Only when performing GSEA with the in silico help of cell type-specific gene profiles, differences in Paneth cell transcriptional programs in the small intestine were uncovered. A comment on this discrepancy would be helpful, especially for the non-bioinformatician readers among us.” 

      Standard differential gene expression analysis (DEG) of the effects of glial loss revealed significant differences only in the colon, and even there only a handful of genes were changed. These changes were not accompanied by corresponding changes at the protein level, at least as detectable by IHC. In the small intestine, there were no significant differences by standard DEG thresholds. Unlike DEG, gene set enrichment analysis (GSEA) provides a significance value based on whether a higher-than-chance number of genes in a set change in a uniform direction, without requiring that the magnitude of change of any individual gene be significant. Therefore, the GSEA detected that a significant number of genes in the curated Paneth cell gene list exhibited a positive fold change difference in the bulk RNA sequencing data. This prompted us to examine Paneth cells and other epithelial cell types in more detail by IHC, functional and ultrastructural analyses, which all converged on the observation that Paneth cells were relatively selectively disrupted in the epithelium of glial depleted mice.
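
      For non-bioinformatician readers, the contrast between per-gene thresholds and set-level direction can be illustrated with a minimal sketch. This is not the GSEA algorithm itself (which uses a ranked running-sum statistic); it is a simplified one-sided sign test, and the fold-change values and the |log2FC| >= 1 cutoff are hypothetical, chosen only for illustration.

```python
from math import comb

def sign_test_p(log2_fold_changes):
    """One-sided binomial p-value for an excess of positive fold changes."""
    n = len(log2_fold_changes)
    k = sum(1 for x in log2_fold_changes if x > 0)
    # P(X >= k) for X ~ Binomial(n, 0.5): chance of seeing this many "up" genes
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Hypothetical gene set: every change is small, but almost all point the same way
gene_set = [0.21, 0.35, 0.18, 0.40, -0.05, 0.27, 0.31, 0.12,
            0.25, 0.33, 0.22, -0.10, 0.29, 0.15, 0.38, 0.19]

# Per-gene view: no gene passes an (assumed) |log2FC| >= 1 cutoff
passing_deg = [x for x in gene_set if abs(x) >= 1.0]

# Set-level view: 14 of 16 genes are up, which is unlikely by chance
p_value = sign_test_p(gene_set)

print(len(passing_deg), round(p_value, 4))
```

      In this toy case no individual gene would be called differentially expressed, yet the set as a whole shows a coordinated shift, which is the kind of signal a set-level method is designed to detect.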

      (5) Other: We will address all remaining comments in our detailed author response that will accompany our revised manuscript. We thank Reviewer 2 for the very positive feedback overall and highlighting opportunities to better label findings in some of the figures. We will make these suggested changes in our revised manuscript.

    1. Author response:

      We thank the reviewers for their highly valuable comments and recommendations on our manuscript. We particularly appreciate receiving reviews from three distinct points of view, all highly relevant to our study (i.e. from an ecological, biomechanics, and evolutionary biology perspective).

      We will now carefully address all reviewer comments and questions, and resubmit a revised version in due time. Again, we thank the reviewers for their rigorous assessment of our study, which will greatly help us improve our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank you and the two Reviewers for the thoughtful evaluation of the manuscript and the support for publication. We have addressed all points raised by the two Reviewers.

      - We have extensively streamlined the manuscript. Repetitive passages regarding the respective kinase cascades have been removed.

      - We improved the presentation of the main Figures (mainly labeling and font size):

      - Figure 1: C, D, E, F

      - Figure 2: C, E, F, G, I

      - Figure 3: D

      - Figure 4: F

      - Figure 5: A, B, C, D, E

      - We integrated new SI-data related to kinase functions, expression and the ‘cell-type comparisons’ of the KinCon reporter system (Figure Supplement 4, 5).

      Below you will find a detailed point-by-point response.

      Reviewer #1 (Recommendations For The Authors):

      Regarding the issue of the use of the word "dynamics," as described in the public review, here are a few examples of ambiguous use in different sentences:

      - Line 27: dynamics of full-length protein kinases. Is this referring to the dynamics of conformational interconversion between inactive and active states?

      - Line 138: dynamic functioning of kinases. It is not clear what this means.

      - Line 276: ... alters KinCon dynamics. Not clear if they are measuring time-dependent process or a single point.

      - Figure legend 4F: dynamics of CDK4/6 reporters. Again, not clear how the assay is measuring dynamics.

      In my opinion, the authors use proper terminology that describes their assay in which the term dynamics is not used: Title: "... impact of protein and small molecule interactions on kinase conformations" and Line 89 "... reporter can be used to track conformational changes of kinases...".

      We have replaced the “dynamics” sections. 

      - Line 27: The understanding of the structural dynamics of…

      - Line 91: This reporter can be used to track dynamic changes of kinases conformations…

      - Line 139: Conventional methods often fall short in capturing the dynamics of kinases within their native cellular environments…

      - Line 146: Such insights into the molecular structure dynamics of kinases in intact cells…

      - Line 199: In order to enhance our understanding of kinase structure dynamics…

      - Line 276: These findings underline that indeed the trimeric complex formation alters….

      - Figure Legend 4F: Quantification of alterations of CDK4/6 KinCon reporter bioluminescence signals…

      The authors state that KinCon has predictive capabilities (abstract and line 142). What do  the authors mean by this?

      Previously we have benchmarked the suitability of the KinCon reporter for target engagement assays of wt and mutated kinase activities. With this we determined specificities of melanoma drugs for mutated BRAF variants (Mayrhofer 2020, PNAS). 

      The authors indicate that KinCon is a highly sensitive assay. Can the authors elaborate on what high sensitivity means?  

      By sensitivity, we mean that we can detect conformation dynamics of the reporter at low expression levels of the hybrid protein expressed in the cell line of choice.

      - Line 209: Immunoblotting of cell lysates following luminescence measurements showed expression levels of the reporters in the range and below the endogenous expressed kinases (Figure 1E).  …

      - Line 219:   Using this readout, we showed that at expression levels of the BRAF KinCon reporter below the immunoblotting detection limit, one hour of drug exposure exclusively converted BRAF-V600E to the more closed conformation (Figure 1F, G, Figure Supplement 1B). 

      - Line 221: These data underline that at expression levels far below the endogenous kinase, protein activity conformations can be tracked in intact cells. …

      For example, can they discuss how other fluorescence-based approaches that are less sensitive would not be able to accomplish the same type of results or derive similar conclusions? Can they provide a resolution metric both in space and time? Given that the authors state that this is a technical report, this information is of relevance.

      We highlight the key pros & cons of the KinCon reporter technology in following sections:

      - Line 529: The KinCon technology, introduced here, seeks to address the previously mentioned challenges. It has the potential to become a valuable asset for tracking kinase functions in living cells which are hard to measure solely via phosphotransferase activities. Overall, it offers an innovative solution for understanding kinase activity conformations, which could pave the way for more novel intervention strategies for kinase entities with limited pharmaceutical targeting potential. So far, this relates to the tracking of kinase-scaffold and pseudo-kinase functions.

      - Line 535: A key advantage of the KinCon reporter technology is the robustness of the system in tracking kinase conformations at varying expression levels. However, in contrast to fluorescence-based reporter read-outs, subcellular analysis and cell sorting are still challenging due to comparably low levels of light emission.

      The authors nicely describe how KinCon works in Figure 1B and part of 1C. I do think that the bottom of panel 1C needs to be revised, as well as the text describing the potential scenarios of potency, efficacy, and synergism.

      One issue with this part of Figure 1C is that it is not clear what the x-axis in the 3 plots refers to. Is this time? Is this concentration of a small molecule, inhibitor, or binding partner? This was confusing also in the context of the term dynamics used throughout the text. The terms potency, efficacy, and synergism should be subtitles, or the panels and the x-axis should be better defined, especially for a non-specialized reader.

      Related to this part of Figure 1C is the text. The authors mention potency, effectiveness, and synergy (Line 195). Can the authors use more fundamental terminology related to these three scenarios, for example, changes in activation constant, and percent of protein activates? Also, why synergy is only related to effectiveness? Can synergy also be associated with potency?

      Thank you for bringing this up. We have revised Figure 1C to better reflect the mentioned effects of potency. To avoid confusion, we removed the illustration for drug synergism. Accordingly, we have integrated the axis descriptions for the presented dose-response curves.

      Thus, we have further streamlined the text in the introduction – examples are shown below:

      - Line 195: Light recordings and subsequent calculations of time-dependent dosage variations of bioluminescence signatures of parallel implemented KinCon configurations aid in establishing dose-response curves. These curves are used for discerning pharmacological characteristics such as drug potency, effectiveness of drug candidates, and potential drug synergies (Figure 1C)

      - Figure 1C:  Shown is the workflow for the KinCon reporter construct engineering and analyses using KinCon technology. The kinase gene of interest is inserted into the multiple cloning site of a mammalian expression vector which is flanked by respective PCA fragments (-F[1], -F[2]) and separated with interjacent flexible linkers. Expression of the genetically encoded reporter in indicated multi-well formats allows to vary expression levels and define a coherent drug treatment plan. Moreover, it is possible to alter the kinase sequence (mutations) or to co-express or knock-down the respective endogenous kinase, interlinked kinases or proteinogenic regulators of the respective pathway. After systematic administration of pathway modulating drugs or drug candidates, analyses of KinCon structure dynamics may reveal alterations in potency, efficacy, and potential synergistic effects of the tested bioactive small molecules (schematic dose response curves are depicted)
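
      The distinction between potency and efficacy in a schematic dose-response curve can be sketched as follows. This is a minimal illustration assuming a four-parameter logistic (Hill) model, which is a common choice for dose-response fitting but is not stated in the manuscript; the function name `hill`, the parameter values, and the dose units are hypothetical.

```python
def hill(dose, bottom, top, ec50, slope=1.0):
    """Four-parameter logistic (Hill) dose-response curve; dose must be > 0."""
    return bottom + (top - bottom) / (1.0 + (ec50 / dose) ** slope)

doses = [0.01, 0.1, 1.0, 10.0, 100.0]  # arbitrary units, hypothetical

# Reference compound: half-maximal response at dose 1.0, full response of 1.0
reference = [hill(d, 0.0, 1.0, ec50=1.0) for d in doses]

# Change in potency: same maximum, but half-maximal response at a 10x lower dose
more_potent = [hill(d, 0.0, 1.0, ec50=0.1) for d in doses]

# Change in efficacy: same EC50, but the plateau is only half as high
less_efficacious = [hill(d, 0.0, 0.5, ec50=1.0) for d in doses]

for d, r, p, e in zip(doses, reference, more_potent, less_efficacious):
    print(f"{d:>6}: {r:.3f} {p:.3f} {e:.3f}")
```

      Under this model, a shift along the dose axis (EC50) corresponds to potency, whereas a change in the height of the plateau corresponds to efficacy.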

      Lastly, the use of these three cartoons gives the impression that the experimental results to come will follow a similar representation. Instead, the results are presented in bar plots for many different conditions. I think this will lead to confusion for a broad audience.

      The bottom panel of Figure 1C is not the depiction of real experiments but rather an illustration of fitted dose-response curves. We would like to point to previous demonstrations of dose-response curves using BRAF KinCon data and ERK phosphorylation (Röck 2019, Sci. Advances).

      We further agree with the reviewer and have therefore added a new part in the methods section addressing the evaluation of data extensively. 

      - Line 668: In Figure 1 E and F, a representative experiment of n=4 independent experiments is shown. In these cases, absolute bioluminescence values without any normalization are shown. Otherwise, data was indicated as RLU (relative light unit) fold change. This means the data was normalized on the indicated control condition (either with normalization of the western blot or without; as indicated).

      For a non-expert reader, can the authors clarify the use of tracking basal conformations vs. transient over-expression of the various KinCon constructs? Moreover, the authors use the term transient over-expression for 10, 16, 24, and 48 h (Line 203). This, to a non-expert reader, does not seem transient.

      We have revised the manuscript to clarify it:

      - Line 207: We showed that transient over-expression of these KinCon reporters for a time frame of 10h, 16h, 24h or 48h in HEK293T cells delivers consistently increasing signals for all KinCon reporters (Figure 1E, Figure Supplement 1A). 

      - Figure 1E) Representative KinCon experiments of time-dependent expressions of indicated KinCon reporter constructs in HEK293T cells are shown (mean ±SEM). Indicated KinCon reporters were transiently over-expressed in 24-well format in HEK293T cells for 10h, 16h, 24h and 48h each.

      Regarding Figure 1E and similar graphical representations: Why is the signal (RLU) nonlinear with time? If the fluorescence of the KinCon construct is linearly related to its expression or concentration inside the cell, one would expect a linear increase. Have the authors plotted RLU/Expression band intensity to account for changes in protein concentration? For instance, some of the results within Figure 3 are normalized to concentration on reporter expression level.

      Our intention was to show that varying expression levels can be used for the illustrated target engagement assays. Indeed, the represented elevations of RLU might be due to factors such as:

      - Doubling times of cells

      - Cell density

      - Media composition (which changes over time)

      - Reporter protein stabilities

      - Abundance of interactors of kinases

      For the results with LKB1, the authors claim that intermediate fold change in fluorescence (Figure 2E) is due to a partially closed intermediate state (Line 262). Can the authors discard the possibility by which there is a change in populations of active and inactive that on average give intermediate values?

      Based on our experience with KinCon reporter conformation states of kinases we tested so far, we assume that the presented data reflects an intermediate state. We agree that it needs further validation. We have changed the text accordingly:

      - Line 264: Upon interaction with LKB1 this conformation shifts to a partially closed intermediate state.

      The authors claim in Line 274 that mutations located at the interface of the LKB1/STRADalpha complex affect interactions and hypothesize that allosteric communication between LKB1 and STRADalpha is essential for function. Given that these mutations are at the interaction interface, why would the authors postulate an allosteric mechanism that evokes an effect distant from the interaction/active site? Could it be that function requires surface contacts alone that are disrupted by the mutations?

      We agree with the reviewer and changed our argumentation for this point:

      - Line 276: These findings underline that indeed the trimeric complex formation alters the opening and closing of the tested full-length kinase structures using the applied KinCon reporter read out

      I was unable to find text to explain the following: Figure 2I shows the mutation R74A as n.s., but in the text, only W308C is mentioned to not change fluorescence. Could the authors clarify why R74A is not discussed in the text?  Maybe this reviewer missed the text in which it was discussed.

      We adapted the manuscript and included the R74A mutation as follows:

      - Line 296: Among these mutations, only the W308C and R74A mutation prevented significant closing of the LKB1 conformation when co-expressed with STRAD𝛼 and MO25 (Figure 2I).

      In Figure 2I, the individual measurements of the LKB1-R74A KinCon are now highlighted in red to better emphasize the deviations. In the case of the R74A mutation, the lack of significance might be due to the high deviation between the experiments. These deviations are much higher than for either the wt or the W308C mutant, and can also be seen in the LKB1-R74A-KinCon only condition (white). Even though no significant closing of the LKB1 conformation could be observed in the case of R74A, the trend toward conformation closing upon complex formation is still visible, so we believe the effect is still present. Further replicates would be necessary to validate this interpretation.

      Similarly, the authors state in line 326 that the study included an analysis of RIPK2. However, I was unable to find results, graphs, or additional text discussing RIPK2.

      The RIPK2 conformation was analyzed in Figure 3C (page 12).

      Some figures of RLU use absolute values, percentages, and fold change. Is there a reason why the authors use different Y-axis values? These should be explained and justified in Methods. Similarly, bars for wt in Figures 3D, G, or 4D, E, F show no errors. How are the authors normalizing the data and repeats so that there is no error, and are they treating the rest of the data (i.e., mutants and/or treated with small molecules) in the same way?

      We have changed the Y-axis values. Throughout the manuscript, we now show RLU fold change. Exceptions are selected experiments in which only absolute RLU values are shown (such as Figure 1E, F). We have also integrated a paragraph into the methods section (Line 655). Figure 3D was changed as well.

      - Line 668: In Figure 1 E and F, a representative experiment of n=4 independent experiments is shown.  In these cases absolute bioluminescence values without any normalisation are shown.  Otherwise, data was indicated as RLU fold change. This means the data was normalized on the indicated control condition (either with normalization of the western blot or without; as indicated).

      The data is generally normalized on wt or untreated conditions, when the cells were treated with small molecules for target engagement assays. 
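
      The normalization described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the raw RLU values are hypothetical, the helper names are invented for this example, and it assumes normalization to the mean of the control condition across replicates.

```python
from math import sqrt
from statistics import mean, stdev

def to_fold_change(raw, control):
    """Divide every raw RLU value by the mean of the control condition."""
    control_mean = mean(raw[control])
    return {name: [v / control_mean for v in values] for name, values in raw.items()}

def mean_and_sem(values):
    """Mean and standard error of the mean for a list of replicates."""
    return mean(values), stdev(values) / sqrt(len(values))

# Hypothetical raw bioluminescence readings from three independent experiments
raw_rlu = {
    "wt": [1000.0, 1200.0, 800.0],
    "mutant": [500.0, 600.0, 400.0],
}

fold = to_fold_change(raw_rlu, control="wt")
wt_mean, wt_sem = mean_and_sem(fold["wt"])
mut_mean, mut_sem = mean_and_sem(fold["mutant"])

print(f"wt: {wt_mean:.2f} +/- {wt_sem:.2f}, mutant: {mut_mean:.2f} +/- {mut_sem:.2f}")
```

      With this pooled normalization the control averages exactly 1 but still carries an error bar. If each experiment were instead divided by its own control value, every control replicate would equal 1 and show no error, which is one way a control bar without error can arise.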

      Lastly, the section starting in Line 472 reads more like a discussion of results from different types of inhibitors used in this study that results on its own. The authors should consider a new subtitle such as results or make this section a discussion.

      We agree with the reviewer and have split this part off into a new subsection of the Results:

      - Line 455: “Effect of different kinase inhibitor types on the KinCon reporter system”.

      Reviewer #2 (Recommendations For The Authors):

      I have a few suggestions, since the paper is a distillation of a vast amount of work and tells a useful story.

      (1) The work is very solid, uses examples from the literature, and also extends into new experimental space. An obvious weakness is mentioned by the authors for the CDK data, in that measurements with Cyclin D (the activating subunit) are not characterized, although Cyclin D might be assumed to be present. 

      We performed experiments with the CDK4/6 KinCon reporters and co-expressed CyclinD with a ratio of 1:3 (HEK293T cells, expression for 48h). However, in the context of inhibitor treatments we could not track conformation changes in these initial experiments. The cells were treated with the indicated CDK4/6i [1µM] for 3h. This seems to not impact the conformation of CDK4/6 wt or mutated KinCon reporters. There is a tendency that CyclinD co-expression promotes CDK4/6 conformation opening (data not shown).

      Author response image 1.

      Bioluminescence signal of CDK4/6 KinCon reporters with co-expressed CyclinD3 (HEK293T, expression for 48h) upon exposure to indicated CDK4/6i [1µM] or DMSO for 3h (mean ±SEM, n=3 ind. experiments). No significant changes using the current setting.

      (2) The work with the trimeric LKB1 complex involves pseudokinase, STRADalpha, whose conformation is also examined as a function of LKB1 status; since STRAD is an activator of LKB1. A future goal should be the evaluation of the complex in the presence of STRAD inhibitory/activating small molecules.

      Thank you for this great idea, we are currently compiling a FWF grant application to get support for such a R&D project.

      Minor points

      • Have any of the data been repeated in a different cell background? This came to mind because HeLa cells lack LKB1, which might be a useful place to test the LKB1 data in a different context.

      This experiment was performed and we show it in Figure Supplement 5. Further, we followed the advice of the reviewer and performed suggested experiments. We integrated the colon cancer cell line SW480 into the experimental setup. Overall, three cell settings showed the same pattern of KinCon reporter analyses for LKB1-STRADα-MO25 complex formation utilizing the LKB1- and STRADα-KinCon reporters.  

      • The study picks up the PKA Cushings Syndrome field, which makes sense, and data are presented for L206R. PMID 35830806 explains how different patient mutations drive different signaling outcomes through distinct complex formations, and it would be interesting to discuss how mutations in KinCon complexes, especially those with mutations, could affect sub-cellular localization. Could the authors explain if this was done for any of the proteins, whose low experimental expression is a clear advantage, but is presumably hard to maintain across experiments?

      The feedback of the reviewer motivated us to perform subcellular fractionation experiments. They were performed with PKAc wt and L206R KinCon reporters as well as BRAF wt and V600E reporters. We were not able to see major differences between the wt and mutated reporter constructs in respect to their nucleus: cytoplasm localizations (Figure Supplement 4). For your information, in a R+D project with the mitochondrial kinase PINK1 we see localization of the reporter as expected almost exclusively at the mitochondria fraction. 

      - Line 495: In this context of activating kinase mutations we showed that, using PKAc (wt and L206R) and BRAF (wt and V600E) reporters as examples, we could not track alterations of cytoplasmic and nuclear localization (Figure Supplement 4). Furthermore, subcellular localization of PKAc KinCon reporters did not change when the L206R mutant was introduced (Figure Supplement 4). As a control, BRAF wt and V600E KinCon reporters were used, and again no changes in localization were observed.

      • I suggest changing PMs (Figure 2 and others) simply to mutation, I read this as plasma membrane constantly.

      We agree and we have changed it to “patient mutation” in Figure 2C, Figure 3E, Figure 4B.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The findings in this study are useful and may have practical implications for predicting DLBCL risk subject to further validating the bioinformatics outcomes. We found the approach and data analysis solid. However, some concerns regarding the drug sensitivity prediction and the links between the selected genes for the risk scores have been raised that need to be addressed by further functional works.

      Thank you for your positive assessment of our study. We searched the treatment information of DLBCL patients in our own cohort; unfortunately, all patients were treated strictly according to the guidelines issued by the authorities of China, which suit Chinese patients well but do not include the drugs explored in the present study. Therefore, further investigations should be designed and conducted to validate our conclusion. Here, we provide a possible direction for future studies based on large cohorts, which could not only provide more reliable conclusions but also draw more attention to the role of the tumor microenvironment in influencing outcome and drug sensitivity.

      Public Reviews:

      Sincere thanks for all reviewers’ positive comments on our study and their helpful recommendations for improving our manuscript. For this part, we have sorted out the comments and recommendations from all reviewers, and made corresponding revisions. And here are our responses.

      (1) How did we determine the three genes (VCAN, C1QB and CD3G) in the prognostic model?

      As mentioned under “Prognostic model” in the Materials and Methods section, the genes were selected using the “survival” package in R. After we obtained the nine genes, we used their expression values as input and fitted the model with the “survival” package. The R function “step” was then applied to optimize the model, that is, to construct a model with as few factors as possible, so that the factors finally included were representative and showed the least collinearity. In this way, the resulting prognostic model could be more practical in clinical practice.
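
      The idea behind stepwise reduction of a model can be sketched as follows. R's `step` selects models by AIC; the sketch below mirrors only the backward-elimination mode in generic Python, and it is not the authors' code. The candidate names (`g1` to `g5`), the weights, and the `toy_aic` criterion are hypothetical placeholders standing in for a real model fit.

```python
def backward_eliminate(features, criterion):
    """Drop one feature at a time for as long as doing so lowers the criterion."""
    current = list(features)
    best = criterion(current)
    while len(current) > 1:
        # Score every model that omits exactly one of the remaining features
        candidates = [
            (criterion([f for f in current if f != drop]), drop) for drop in current
        ]
        score, drop = min(candidates)
        if score >= best:
            break  # no single removal improves the model
        best = score
        current.remove(drop)
    return current, best

# Hypothetical contribution of each candidate to the model fit
TOY_GAIN = {"g1": 12.0, "g2": 9.0, "g3": 7.0, "g4": 0.1, "g5": 0.1}

def toy_aic(subset):
    """Toy AIC-like score: better fit lowers it, each extra term costs 2."""
    fit_gain = sum(TOY_GAIN[g] for g in subset)
    return 100.0 - fit_gain + 2 * len(subset)

selected, final_score = backward_eliminate(["g1", "g2", "g3", "g4", "g5"], toy_aic)
print(selected, final_score)
```

      In this toy setting the two weakly informative candidates are removed because the penalty for keeping them outweighs their contribution, while the three informative ones remain.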

      (2) Different centers have different protocols of IHC, so how could we put this model into clinical practice under this circumstance?

      Not only do different centers have different protocols, but materials such as antibodies also vary. Therefore, there is still a long way to go in putting our study into clinical practice. In our view, there are at least three problems to solve. First, diagnostic antibodies, which usually show better specificity and sensitivity, should be used in clinical practice. This may be the reason why the staining of VCAN and C1QB was strong and difficult to differentiate. Second, a standardized protocol should be established. Last but not least, more precise analyses and studies should be conducted to clarify which types of cells specifically express these genes (as mentioned by Reviewer #2). We are now endeavoring to solve these problems by utilizing as many techniques as possible, such as multi-omics and mIHC. From revealing the true expression pattern to developing high-quality antibodies and even a standardized test kit, we are looking forward to a clinical translation.

      (3) The analyses about immune infiltration and the key genes in DLBCL were superficial, limited within the correlation analyses.

      Because the model was constructed based on the tumor purity of DLBCL, the risk score could be associated with the enrichment of cell functions. We conducted GSEA based on the differentially expressed genes between the high-risk and low-risk groups in the two datasets (Figure 5H-I). It showed that extracellular organization and cellular adhesion differed between the two groups, through which immune infiltration and activity might be regulated owing to effects on the motility of immune cells. In addition, we validated the infiltration of M1 and M2 macrophages in our own cohort (Supplementary Figure 3P).

      (4) The drug sensitivity was analyzed only on the basis of the model and should be validated in real-world research or laboratory studies. In addition, the sensitivity scores did not seem to differ much in most cases, even though the differences were statistically significant.

      We searched the treatment information of DLBCL patients in our own cohort; unfortunately, all patients were treated strictly according to the guidelines issued by the authorities of China, which suit Chinese patients well but do not include the drugs explored in the present study. Therefore, further investigations should be designed and conducted to validate our conclusion. Here, we provide a possible direction for future studies based on large cohorts, which could not only provide more reliable conclusions but also draw more attention to the role of the tumor microenvironment in influencing outcome and drug sensitivity. As for the differences between the high- and low-risk groups, a small change in drug dose can sometimes have a large effect, because the dose-effect curve is usually nonlinear. Therefore, by reducing the dose, even by just 1%, adverse effects might be avoided. To sum up, the drug sensitivity analyses in our study could open up more possibilities for clinical trials and practice, and we are taking them into consideration in designing reasonable clinical research.

      (5) C1QB was associated with decreased tumor purity and worse prognosis, but decreased tumor purity was related to better prognosis. How to elucidate the contradiction?

      As discussed in the Discussion section, previous studies have revealed the role of C1QB in promoting an immunosuppressive microenvironment in cancer (see references 22-26). C1QB might recruit pro-tumor immune cells, thereby reducing tumor purity. However, the immune microenvironment is regulated by multiple factors that form a network and antagonize or synergize with each other. Statistical analysis often reveals a possible phenomenon but cannot provide a mechanistic explanation. Therefore, more mechanistic studies are needed to reveal the connections and key nodes. This is exactly what we will explore next.

      (6) Others:

      (1) Line 51 has been rewritten.

      (2) References for the ESTIMATE algorithm (reference 16) and CD3G+ T cells (reference 17) have been added.

      (3) The illegible figure labels might be caused by the incompatibility between the PDF file we submitted and the submission system. We have provided the TIFF images in this revision, and the EPS file could be submitted to editors upon their requests.

      (4) A supplement description has been added to the Figure legend of Figure 6 to make it clear.

      (5) In order to explore the expression of key genes among different locations of DLBCL, we performed the analyses in Figure 5 and Supplementary Figure 3. These results might be thought-provoking in that the tumor microenvironment differs among DLBCLs even though they share similar histological characteristics.

    1. Author response:

      We thank the editors and reviewers for their thorough engagement with the manuscript and their well-informed comments on the Poseidon framework. We are pleased to note that they consider Poseidon a promising and timely attempt to resolve important issues in the archaeogenetics community. We also agree with the main challenges they raise, specifically the lack of long-term, independent infrastructure funding at the time of writing, and various aspects of Poseidon that bear the potential to further consolidate a de-facto alienation of the aDNA community from the wider field of genomics.

      Poseidon is indeed dependent on the Department of Archaeogenetics at MPI-EVA. For the short to middle-term future (3-5 years) we consider this dependency beneficial, providing a reliable anchor point and direct integration with one of the most proficient data-producing institutions in archaeogenetics. For the long term, as stated in the discussion section of the manuscript, we hope for a snowball effect in the dissemination and adoption of Poseidon to establish it as a valuable community resource that automatically attracts working time and infrastructure donations. To kickstart this process we have already intensified our active community outreach and teach Poseidon explicitly to (early career) practitioners in the field. We are aware of options to apply for independent infrastructure funding, for example through the German National Research Data Infrastructure (NFDI) initiative, and we plan to explore them further.

      As the reviewers have noted, key decisions in Poseidon’s data storage mechanism have been influenced by the special path archaeogenetics has taken compared to other areas of genomics. The founding goal of the framework was to integrate immediately with established workflows in the field. Nevertheless, we appreciate the concrete suggestions on how to connect Poseidon better with the good practices that emerged elsewhere. We will explicitly address the European Variation Archive in a revised version of the manuscript, consider embedding the BioSamples ID of the INSDC databases more prominently in the .janno file, prioritise support for VCF next to EIGENSTRAT and PLINK, and add an option to clearly document the relevant human reference genome on a per-sample level. In the revised version of the text we will also explain the treatment of non-overlapping SNPs between studies by trident’s forge algorithm and how we imagine the interplay of different call sets in the Poseidon framework in general.

      Beyond these larger concerns, we will also consider and answer the more detailed recommendations kindly shared by the reviewers, not least the question of how we imagine Poseidon being used by archaeologists and for archaeological data.

    1. Author response:

      We wish to express our sincere acknowledgement to the reviewers and the editors for the time and the effort spent in reviewing our manuscript. We highly appreciate the positive feedback and the thorough and constructive comments.

      We plan to conduct additional experiments to address the reviewers’ concerns.

      (1) We plan to utilize the RIPK1 kinase dead mice to investigate the role of RIPK1 kinase activity in these metabolic stress responses.

      (2) We plan to conduct flow cytometry analysis to detect the percentage or number of different cell types in fasted liver tissue, to provide more accurate and quantitative assessments of monocyte recruitment.

      (3) We plan to conduct more western blotting to detect the expression of related molecules in the signal transduction pathway, to further clarify the underlying mechanisms.

      (4) Regarding the single-cell RNA sequencing analysis, we plan to conduct CellChat analysis to provide information about the interactions between different cell populations.

      (5) We will fix the issues regarding the data graphs and image resolutions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      This study is very well framed and the writing is very clear. The manuscript is well organized and easy to follow, and overall the previous state of the art of the field is taken into account. I only have a couple of minor comments.

      (1) There is a preprint that uses single nuclei RNA-Seq and ST on human MS subcortical white matter lesions doi: https://doi.org/10.1101/2022.11.03.514906. This work needs to be included in the discussion of the results. 

      (1.1) We appreciate the reviewer bringing up this important preprint, and we have referenced it in the Discussion section of our updated manuscript. 

      (2) The discussion should include the overall limitations of the study and how much it can be translated to human MS. Specifically, the current work uses EAE and therefore different disease stages are not captured in this study. This point is also raised by other reviewers. 

      (1.2) We thank the reviewer for raising this important point, and we have included additional discussion about the limitations of EAE and its disease relevance to MS.

      Reviewer #2 (Recommendations For The Authors):

      The authors state that this EAE model is better for studying cortical gradients because previous models "such as directly injecting inflammatory cytokines into the meninges/cortex" cause a traumatic injury. It needs to be discussed that these models have now been superseded by more refined models involving long-term overexpression of pro-inflammatory cytokines in the sub-arachnoid space, thereby avoiding traumatic injury. The current results should be discussed in light of these newer models (James et al, 2020; 2022), which are more similar to MS cortical pathology and do exhibit lymphoid-like structures. 

      (2.1) We thank the reviewer for pointing out these relevant studies, and we agree they describe non-traumatic and more MS-relevant models of leptomeningeal inflammation. We have included discussion of these works in the updated manuscript.  

      • The study will be substantially improved if some of the ST data is validated at least partially with some RNAscope or other in situ hybridization using a subset of probes that capture the take-home message of the paper. 

      (2.2) We agree with the reviewer that validation of transcriptomics results is important to support our conclusions. In the updated manuscript Figure 5 and Supplemental Figure 6 we have added RNAscope results for relevant genes. In agreement with the trends noted in the manuscript, expression of genes related to antigen processing and presentation such as B2m decreases gradually with distance from LMI. We also have included a reference to a newly published manuscript from our group (Gupta et al., 2023, J. Neuroinflammation) that characterizes meningeal inflammation and sub-pial changes in the SJL EAE model. In that manuscript, IHC is used to show accumulation of B cells and T cells in the leptomeningeal space, increased microglial and astrocyte reactivity adjacent to leptomeningeal inflammation, and reduction of neuronal markers adjacent to leptomeningeal inflammation.  

      • The lack of change in signaling pathways involved in B-cell/T-cell interaction and cytokine/chemokine signaling, which would be expected in areas of immune cell aggregation in the meninges, needs discussion. 

      (2.3) While we detected significant upregulation in antigen presentation, complement activation, and humoral immune signaling, areas of meningeal inflammation identified as cluster 11 showed upregulation of numerous other GO gene sets associated with immune cell interaction and cytokine signaling, as described in Supplementary Table 3. These include T-cell receptor binding, CCR chemokine receptor binding, interleukin 8 production, response to interleukin 1, positive regulation of interleukin-6 production, tumor necrosis factor production, and leukocyte cell-cell adhesion. Overall, we believe that the collection of enriched gene sets is consistent with peripheral myeloid and lymphoid infiltration and cytokine production, with the most prominent cytokines / pathways being interferon γ/antigen processing and presentation, complement, and humoral inflammation.

      • Fig 4 subclusters includes T-cell activation, pos regulation of neuronal death, cellular response to IFNg, neg regulation of neuronal projections, Ig mediated immune response, cell killing, pos regulation of programmed cell death, pos regulation of apoptotic process, but none of these are discussed despite their obvious importance. 

      (2.4) We agree with the reviewer that these upregulated genesets warrant additional discussion and have added additional reference to these genesets in the results section. Also, the genesets ‘positive regulation of programmed cell death’, ‘positive regulation of apoptotic process’, and ‘positive regulation of cell death’ were erroneously included in Figure 4F in the initial manuscript, as they are actually downregulated in cluster 1_4. This has been clarified in the text.

      • Subcluster 11 appears spatially to represent the meninges, but what pathways are expressed there? 330 genes/pathways altered independent of other clusters - immune cell regulation? 

      (2.5) We refer the reviewer to Supplementary Table 3, which contains a complete list of GO genesets enriched within cluster 11 spots.

      • The surprising lack of immunoglobulin genes upregulated in the meninges of the mice, considering these are the genes most upregulated in the MS meninges. Should be pointed out and discussed. 

      (2.6) We appreciate the reviewer bringing up immunoglobulin genes, which previous publications have shown are elevated in MS meninges and cortical grey matter lesions. Consistent with this, several immunoglobulin genes are elevated in cluster 11, including genes encoding IgG2b, IgA, and IgM. While these results were available within the original submission in Supplementary Table 2, we have included the graph in the updated Supplementary Figure 3.

      • Meningeal signature may be poorly represented given the individual slices shown in suppl 3A, which suggests that only 3 of the EAE slices had significant meningeal infiltrates, indicated by cluster 11 genes.  

      (2.7) There was heterogeneity in the location and extent of meningeal infiltrate / cluster 11 in the EAE slices, as the reviewer points out. Two slices had severe inflammation, two had moderate inflammation, and two had relatively mild inflammation, but all EAE slices were enriched in inflammation relative to naïve, as demonstrated not only through clustering but also through enriched marker analysis between EAE and naïve and through Progeny analysis.

      • The ST is not resolving the meningeal tissue and the immediate underlying grey matter, as demonstrated by a high signal for both CXCL13 and GFAP in cluster 11. 

      (2.8) We agree that the spatial transcriptomics strategy applied here is inadequate to precisely delineate between meningeal inflammation and the underlying brain parenchyma, and that the elevation of markers such as GFAP in cluster 11 indicates some ‘contamination’ of parenchymal cells into cluster 11. We have clarified this in the text and discussed the limitation of the spatial transcriptomics method used.  

      • More information is required concerning how many animals were used in this study, to meet the requirements for complying with the 3Rs. 

      (2.9) A total of 4 mice were used per group. In the naïve group one mouse contributed two slices, for a total of 5 naïve slices. In the EAE group two mice contributed two slices, for a total of 6 EAE slices. We have clarified this in the methods section of the updated manuscript.

      Reviewer #3 (Recommendations For The Authors):

      The authors should provide a more thorough description of the methodology, and there are a few minor concerns about experimental details, data presentation, and description that need to be addressed. In the next few lines, I will highlight a few important aspects that need to be addressed, propose some changes to the main manuscript, and suggest some additional experiments that, if successful, could confirm/support/further strengthen the conclusions that are at this point purely based on transcriptomic data. 

      Major comments/suggestions: 

      • The main gene expression changes between the control and EAE groups obtained via spatial transcriptomics need to be validated with another technique, at least partially. I suggest performing RNAscope or immunofluorescence imaging using brain sections from a new and independent cohort of animals, where cell-specific markers can also be tested. This type of assessment would work as a validation method and could also inform about the cell-specific contribution to the observed transcriptomic changes. 

      (3.1) Please refer to response 2.2 

      • The representative qualitative spatial expression heatmaps for each gene in Fig. 1F should be accompanied by corresponding graphs with quantitative measurements. Similar to what is done regarding the data in Fig. 2B and D. 

      (3.2) We agree with the reviewer that quantitative graphs were missing, and we have included them in the updated Supplementary Figure 1. 

      • A supplementary table discriminating all the DEGs (132 up and 70 downregulated) between cluster 11 and the other clusters has to be provided. What is the contribution of recruited encephalitogenic adaptive immune cells to this cluster 11 gene signature? 

      (3.3) These unfiltered results are provided in Supplementary Table 2, and to view the up and down regulated genes the reader can sort the table based on fold change and adjusted P value. We believe providing the complete table is more useful to the reader, since the fold change and P value thresholds used to determine “significance” are arbitrary. Since the spatial transcriptomics method used in this work does not have single cell resolution, we cannot accurately estimate the contribution of encephalitogenic adaptive immune cells in cluster 11. However, given previously published work of lymphocyte infiltration into the subarachnoid space in SJL EAE (Gupta et al., 2023, J. Neuroinflammation) and the enrichment of Cd3e in cluster 11 (Log2FC 0.31, adjusted P-val 0.005) we assume some contribution of peripheral lymphocytes.

      • The authors mention that there is grey matter pathology in this relapse model, and this has been shown in a previous publication (Bhargava et al., 2021). However, the regions analyzed in the present study are different from the ones shown in the referenced paper. Is there an overexpression of genes involved in, or gene modules indicative of, neuronal stress and/or death that spatially overlap with clusters 1 and 2? If so, it would be important to provide information about those gene modules in the main figures. It would also be quite relevant to show the levels of cell stress/death proteins and of axonal stress/damage, by APP and/or nonphosphorylated SMI-32 staining, in the deep brain regions (like the thalamus), to corroborate the link between these phenomena and the gene signatures of subclusters 1_3, 1_4, and 2_6. 

      (3.4) We thank the reviewer for this insightful comment. We have recently published a manuscript that histologically analyzes leptomeningeal inflammation in the SJL EAE model, specifically assessing the areas examined in our submitted manuscript (Gupta et al., 2023, J. Neuroinflammation). In that manuscript, IHC is used to show accumulation of B cells and T cells in the leptomeningeal space, increased microglial and astrocyte reactivity adjacent to leptomeningeal inflammation, and reduction of neuronal markers adjacent to leptomeningeal inflammation. To further describe the gene modules in the inflammatory subclusters 1_3/1_4/2_6, we have now provided heatmaps of the selected genesets and their constituent genes (Supplementary Figure 5). 

      • It would be important to provide heatmaps discriminating the DEGs that make the gene modules that are significantly altered in subclusters 1_3, 1_4, and 2_6. The gene ontology terms are sometimes ambiguous. For instance, it would be very informative to the reader (and to the field) to know which altered genes compose the "lysosome", "immune response", "response to stress", or "B cell meditated immunity" pathways that are altered in the EAE subcluster 1_3 (Fig. 4E). The same applies to the gene modules altered in the other subclusters of interest. Authors should also consider generating a Venn diagram with the DEGs from subclusters 1_3, 1_4, and 2_6, to complement the GO term Venn presented in Fig. 4H. Having these pieces of information readily available, either as main or supplementary figures, would be a great addition. 

      (3.5) We agree with the reviewer on this point and have included these heatmaps in Supplementary Figure 5. 

      • The role of IFN-gamma as well as B cells (and Igs) in myelination/remyelination is mentioned in the discussion. However, there is very little evidence that these cells or their cytokines/Igs are mediating the described transcriptomic signatures at the level of the brain parenchyma of EAE mice undergoing relapse. Do the "antigen processing and presentation, cell killing, interleukin 6 production, and interferon gamma response" go terms, which better fitted the trajectory analysis, in fact include genes expressed almost exclusively by T and/or B cells? Are there genes that are downstream of IFN type I or II signaling? 

      (3.6) Pathways including antigen processing / presentation, humoral inflammation, complement, among others were enriched in areas of meningeal inflammation and adjacent areas of parenchyma. These signaling pathways are mediated by effector molecules, many of which are produced by lymphocytes, but that can act on cells within the CNS parenchyma. The heatmaps in Supplementary Figure 5 demonstrate the significant role of MHC and complement genes, which could be expressed by leukocytes as well as glia, on many of the pathways.

      • Is the transcriptomic overlap between meningeal and brain parenchymal regions, or the appearance of signatures similar to the parenchymal subclusters 1_3, 1_4, and 2_6, prevented if the mice are treated with the murine versions of natalizumab or rituximab prior relapse? 

      (3.6) We appreciate the reviewer's suggestion. Our future directions for this work include testing the effects of disease modifying therapies on spatial and single-cell transcriptomic readouts of disease in SJL EAE.

      • Please clarify what control group was used in this study. Naïve mice are mentioned in the Results section, does this mean that control animals were not injected with CFA? Authors should also elaborate on the descriptive methodology employed for the analysis of the spatial transcriptomics data - especially regarding the trajectory analysis. As is, overall, the methodology description might not favor reproducibility. 

      (3.7) We appreciate the need for clarification here. Our control group in this study was naïve, not having received any CFA or pertussis toxin. While often used as the control in EAE studies focused on mechanisms of autoimmunity, CFA and pertussis toxin independently induce systemic inflammation. Since in this study we were interested in neuroinflammation broadly, we chose to use a naïve comparison group to maximize our ability to find genes enriched in neuroinflammation. We have elaborated our methods section, including methods related to trajectory analysis. 

      Minor comments/suggestions: 

      In Fig. 1D the indication of the rostral to ventral axis needs to be inverted. 

      Addressed.

      In Fig. 1E the authors should also include a representative H&E staining of the same region in a control animal. 

      Addressed.

      There is inconsistency in the number of clusters obtained after UMAP unbiased clustering of the spatial transcriptomic data: 

      • Fig. 3A-E - twelve clusters are shown (cluster 0 to 11). 

      • In the Results section eleven clusters are mentioned - "we performed unbiased UMAP clustering on the spatial transcriptomic dataset and identified 11 distinct clusters".

      The text was incorrect; there were 12 distinct clusters. This has been corrected.

      Considering the mice strain used was SJL/J mice, the peptide used to induce EAE should be PLP139-151, as mentioned in the Methods section "Induction of SJL EAE". However, the legend of Fig. 1 mentions "post immunization with MOG 35-55". Please correct this. 

      Corrected.

      In the Methods section it is mentioned "At 12 weeks post-immunization, animals were euthanized", however the Results section mentions that tissues were harvested at 11 weeks post-immunization - "Brain slices were collected from four naïve mice and four EAE mice 11 weeks postimmunization". Please correct this. 

      The Methods were incorrect; this has now been fixed. 

      Please clarify the number of animals used for spatial transcriptomic analysis: 

      • Legend of Fig. 1 mentions "Red arrows indicate MRI time points, black arrow indicates time of tissue harvesting (N = 6)." Whilst in the Results section it states "Brain slices were collected from four naïve mice and four EAE mice". 

      The Figure 1 legend has now been corrected (N = 4). Additionally, we have added clarification about the number of animals / slices used in the Methods section (see response 2.9).

      Please be consistent in the way of representing DEGs in the MA plots: 

      • Fig. 3F shows the upregulated genes (in red) on the right and the downregulated genes (in blue) on the left. 

      • Supplemental Fig. 2K shows the upregulated genes (in red) on the left and the downregulated genes (in blue) on the right. 

      • Supplemental Fig. 4 shows the upregulated genes on the right in blue, while the downregulated genes are in red. 

      This has been fixed.

      The letters attributed to each subcluster in panels E-G of Fig. 4 are different from the respective figure legend. 

      This has been fixed.

      Correct the legend of supplemental figure 2: "(G-H) Representative spatial feature plots of read count (F) and UMI (G) demonstrate expected anatomic variability in transcript amount and diversity.". 

      This has been fixed.

      In Supplemental Fig. 4G there is probably an error with the XX axis, since the significantly up and down-regulated genes are not visible. 

      This has been fixed.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      In the manuscript by Su et al., the authors present a massively parallel reporter assay (MPRA) measuring the stability of in vitro transcribed mRNAs carrying wild-type or mutant 5' or 3' UTRs transfected into two different human cell lines. The goal presented at the beginning of the manuscript was to screen for effects of disease-associated point mutations on the stability of the reporter RNAs carrying partial human 5' or 3' UTRs. However, the majority of the manuscript is dedicated to identifying sequence components underlying the differential stability of reporter constructs. This shows that TA dinucleotides are the most predictive feature of RNA stability in both cell lines and both UTRs.

      The effect of AU rich elements (AREs) on RNA stability is well established in multiple systems, and the present study confirms this general trend but points out variability in the consequence of seemingly similar motifs on RNA stability. For example, the authors report that a long stretch of Us has extreme opposite effects on RNA stability depending on whether it is preceded by an A (strongly destabilizing) or followed by an A (strongly stabilizing). While the authors' interpretation of a context-dependence of the effect is certainly well-founded, it seems counterintuitive that the preceding or following A would be the (only) determining factor. This points to a generally reductionist approach taken by the authors in the analysis of the data and in their attempt to dissect the contribution of "AU rich sequences" to RNA stability, with a general tendency to reduce the size and complexity of the features (e.g. to dinucleotides). While this certainly increases the statistical power of the analysis due to the number of occurrences of these motifs, it limits the interpretability of the results. How do TA dinucleotides per se contribute to destabilizing the RNA, both in 5' and 3' UTRs, but (according to limited data presented) not in coding sequences? What is the mechanism? RBPs binding to TA dinucleotide containing sequences are suggested to "mask" the destabilizing effect, thereby leading to a more stable RNA. Gain of TA dinucleotides is reported to have a destabilizing effect, but again no hypothesis is provided as to the underlying molecular mechanism.
      In addition to reducing the motif length to dinucleotides, the notion of "context dependence" is used in a very narrow sense; especially when focusing on simple and short motifs, a more extensive analysis of the interdependence of these features (beyond the existing analysis of the relationship between TA-diNTs and GC content) could potentially reveal more of the context dependence underlying the seemingly opposite behavior of very similar motifs.

      The contribution of coding region sequence to RNA stability has been extensively discussed (For example: doi.org/10.1016/j.molcel.2022.03.032; doi.org/10.1186/s13059-020-02251-5; doi.org/10.15252/embr.201948220; doi.org/10.1371/journal.pone.0228730; doi.org/10.7554/eLife.45396). While TA content at the third codon position (wobble position) has been implicated as a pro-degradation signal, codon optimality has emerged as the most prominent determinant for RNA stability. This indicates that the role of coding regions in RNA stability differs from that of UTRs due to the involvement of translation elongation. We did not intend to suggest that TA-dinucleotides in UTRs and coding regions have the same effect.

      We hypothesize that TA dinucleotides may recruit endonucleases of the RNase A family, whose catalytic pockets exhibit a strong bias for TA dinucleotides (doi.org/10.1016/j.febslet.2010.04.018). Structures or bound proteins that block this recognition might stabilize RNAs. To gain further insight into the motif interactions, we plan to investigate the interactions between TA and the other 15 dinucleotides through more detailed analyses.

      The present MPRA measures the effect of UTR sequences in one specific reporter context and using one experimental approach (following the decay of in vitro transcribed and transfected RNAs). While this approach certainly has its merits compared to other approaches, it also comes with some caveats: RNA is delivered naked, without bound RBPs and no nuclear history, e.g. of splicing (no EJCs), editing and modifications. One way to assess the generalizability of the results as well as the context dependence of the effects is to perform the same analysis on existing datasets of RNA stability measurements obtained through other methods (e.g. transcription inhibition). Are TA dinucleotides universally the most predictive feature of RNA half-lives?

      Our system studies the stability control of RNA synthesized in vitro and delivered into human cells. While we did not intend to generalize our conclusions to endogenous RNAs, our approach contributes to the understanding of in vitro synthesized RNA used for cellular expression, such as in vaccines. It is known that endogenous RNAs undergo very different regulation. The most prominent factors controlling endogenous RNA stability are the density of splice junctions and the length of UTRs (doi.org/10.1186/s13059-022-02811-x; doi.org/10.1186/s12915-021-00949-x). To decipher the sequence regulation, these factors are controlled in our experiments. Therefore we do not expect the dinucleotide features found by our approach to be generalized as the most predictive feature of RNA half-life in vivo.

      The authors conclude their study with a meta-analysis of genes with increased TA dinucleotides in 5' and 3'UTRs, showing that specific functional groups are overrepresented among these genes. In addition, they provide evidence for an effect of disease-associated UTR mutations on endogenous RNA stability. While these elements link back to the original motivation of the study (screening for effects of point mutations in 5' and 3' UTRs), they provide only a limited amount of additional insights.

      We utilized the Taiwan Biobank to investigate whether mutations significantly affecting RNA stability also impact human biochemical measurements. Our findings indicate that these mutations indeed have a significant effect on various biochemical indices. This highlights the importance of our study, as it bridges basic science with potential applications in precision medicine. By linking specific UTR mutations with measurable changes in biochemical indices, our research underscores the potential for these findings to inform targeted medical interventions in the future.

      In summary, this manuscript presents an interesting addition to the long-standing attempts at dissecting the sequence basis of RNA stability in human cells. The analysis is in general very comprehensive and sound; however, at times the goal of the authors to find novelty and specificity in the data overshadows some analyses. One example is the case where the authors try to show that TA-dinucleotides and GC content are decoupled and not merely two sides of the same coin. They claim that the effect of TA dinucleotides is different between high- and low-GC content contexts but do not control for the fact that low GC-content regions naturally will contain more TA dinucleotides and therefore the effect sizes and the resulting correlation between TA-diNT rate and stability will be stronger (Fig. 5A). A more thorough analysis and greater caution in some of the claims could further improve the credibility of the conclusions.

      Low GC content implies a higher TA content but does not directly equate to a high TA-diNT rate. For instance, the sequence ATTGAACCTT has a lower GC content (0.3) compared to TATAGGCCGC (0.6), yet it also has a lower TA-diNT rate (0 vs. 0.22). To address this concern more rigorously, we performed a stratified analysis based on TA-diNT rate. As shown in our Fig. S7C, even after stratifying by TA-diNT rate (upper panel high TA-diNT rate / lower panel low TA-diNT rate), we still observe that the destabilizing effect of TA is stronger in the low GC content group.
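
      The arithmetic in this example can be checked with a short sketch. The function names are illustrative, and the TA-diNT rate is assumed here to be the number of TA dinucleotides divided by the number of overlapping dinucleotide positions (sequence length minus one); that definition reproduces the 0.22 quoted above, but the text does not confirm it as the one used in the manuscript.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def ta_dint_rate(seq: str) -> float:
    """Fraction of overlapping dinucleotide positions that read 'TA'.

    Assumed definition: TA count / (sequence length - 1).
    """
    seq = seq.upper()
    n_positions = len(seq) - 1
    hits = sum(seq[i:i + 2] == "TA" for i in range(n_positions))
    return hits / n_positions


# The two sequences from the response above.
for seq in ("ATTGAACCTT", "TATAGGCCGC"):
    print(seq, round(gc_content(seq), 2), round(ta_dint_rate(seq), 2))
```

      Under this assumed definition, the sequence with the lower GC content is also the one with no TA dinucleotides, which is the decoupling the response describes.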

      Reviewer #2 (Public Review):

      Summary of goals:

      Untranslated regions are key cis-regulatory elements that control mRNA stability, translation, and translocation. Through interactions with small RNAs and RNA binding proteins, UTRs form complex transcriptional circuitry that allows cells to fine-tune gene expression. Functional annotation of UTR variants has been very limited, and improvements could offer insights into disease relevant regulatory mechanisms. The goals were to advance our understanding of the determinants of UTR regulatory elements and characterize the effects of a set of "disease-relevant" UTR variants.

      Strengths:

      The use of a massively parallel reporter assay allowed for analysis of a substantial set (6,555 pairs) of 5' and 3' UTR fragments compiled from known disease associated variants. Two cell types were used.

      The findings confirm previous work about the importance of AREs, which helps show validity and adds some detailed comparisons of specific AU-rich motif effects in these two cell types.

      Using a Lasso regression, TA-dinucleotide content is identified as a strong regulator of RNA stability in a context dependent manner based on GC content and presence of RNA binding protein binding motifs. The findings have potential importance, drawing attention to a UTR feature that is not well characterized.

      The use of complementary datasets, including from half-life analyses of RNAs and from random sequence library MPRAs, is a useful addition and supports several important findings. The finding that TA dinucleotides have explanatory power separate from (and in some cases interacting with) GC content is valuable.

      The functional enrichment analysis suggests some new ideas about how UTRs may contribute to regulation of certain classes of genes.

      Weaknesses:

      It is difficult to understand how the calculations for half-life were performed. The sequencing approach measures the relative frequency of each sequence at each time point (less stable sequences become relatively less frequent after time 0, whereas more stable sequences become relatively more frequent after time 0). Since there is no discussion of whether the abundance of the transfected RNA population is referenced to some external standard (e.g., housekeeping RNAs), it is not clear how absolute (rather than relative) half-lives were determined.

      We estimated the decay constant λ and half-life ($t_{1/2}$) by the following equations:

      $$C_i(t) = C_i(t=0)\, e^{-\lambda t}, \qquad t_{1/2} = \frac{\ln 2}{\lambda}$$

      where $C_i(t)$ and $C_i(t=0)$ are read count values of the ith replicate at time points $t$ and $0$ (see also Methods). The absolute abundance was not required for the half-life calculation.

      Fig. S1A and B are used to assess reproducibility. They show that read counts at a given time point correlate well across replicate experiments. However, this is not a good way to assess the reproducibility or accuracy of the t1/2 measurements. (The major source of variability in read counts in these plots - especially at early time points - is likely the starting abundance of each RNA sequence, not stability.) This creates concerns about how well the method is measuring t1/2. Also creating concern is the observation that many RNAs are associated with half-lives that are much longer than the time points analyzed in the study. For example, if Figure S1 and Table S1 are read correctly, the median t1/2 for the 5' UTR library in HEK cells appears to be >700 minutes. Given that RNA was collected at 30, 75, and 120 minutes, accurate measurements of RNAs with such long half-lives would seem to be very difficult.

      We estimated the half-life ($t_{1/2}$) based on the following equations:

      $$C_i(t) = C_i(t=0)\, e^{-\lambda t}, \qquad t_{1/2} = \frac{\ln 2}{\lambda}$$

      where $C_i(t)$ and $C_i(t=0)$ are read count values of the ith replicate at time points $t$ and $0$ (see also Methods). The calculation of the half-life involves first determining the decay constant λ, which represents a constant rate of decay. Since λ is a constant, it is possible to accurately calculate it without needing data over the entire decay range. Our experimental design considers this by selecting appropriate time points to ensure a reliable estimation of λ, and thus, the half-life. To determine the most suitable time points, we conducted preliminary experiments using RT-PCR. These experiments indicated that 30, 75, and 120 minutes provided an effective range for capturing the decay dynamics of the transcripts.

      There is no direct comparison of t1/2 between the two cell types studied for the full set of sequences studied. This would be helpful in understanding whether the regulatory effects of UTRs are generally similar across cell lines (as has been shown in some previous studies) or whether there are fundamental differences. The distribution of t1/2's is clearly quite different in the two cell lines, but it is important to know if this reflects generally slow RNA turnover in HEK cells or whether there are a large number of sequence-specific effects on stability between cell lines. A related issue is that it is not clear whether the relatively small number of significant variant effects detected in HEK cells versus SH-SY5Y cells is attributable to real biological differences between cell types or to technical issues (many fewer read counts and much longer half lives in HEK cells).

      For both cell lines, we selected oligonucleotides with R² > 0.5 and mean squared error (MSE) < 1 for analysis when estimating the decay constant (λ), and from it the half-life, by linear regression. This selection criterion was implemented to minimize the effect of experimental noise. Additionally, we will further analyze the MSE distribution to determine if the two cell lines exhibit significantly different levels of experimental noise. We will also provide a direct comparison of half-lives between the two cell lines to assess the similarity in stability regulation.
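
      A minimal sketch of how such a log-linear fit and quality filter might look is given below. It assumes first-order decay, a time-zero sample, an ordinary least-squares fit with a free intercept, and made-up read counts; the function names `fit_decay` and `passes_filter` are hypothetical and are not taken from the manuscript, whose Methods define the actual procedure.

```python
import numpy as np


def fit_decay(times_min, counts):
    """Fit ln(C(t)/C(0)) = -lambda * t by least squares for one replicate.

    Assumes times_min[0] == 0 so that counts[0] is C(t=0).
    Returns (lambda, half_life, r_squared, mse).
    """
    t = np.asarray(times_min, dtype=float)
    c = np.asarray(counts, dtype=float)
    y = np.log(c / c[0])
    slope, intercept = np.polyfit(t, y, 1)
    residuals = y - (slope * t + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    mse = ss_res / len(t)
    decay_constant = -slope
    # A non-positive decay constant means no measurable decay.
    half_life = np.log(2) / decay_constant if decay_constant > 0 else np.inf
    return decay_constant, half_life, r_squared, mse


def passes_filter(r_squared, mse, min_r2=0.5, max_mse=1.0):
    """Thresholds quoted in the response: R^2 > 0.5 and MSE < 1."""
    return r_squared > min_r2 and mse < max_mse


# Synthetic counts for a transcript with a 60-minute half-life (illustrative only).
times = [0, 30, 75, 120]
counts = [1000 * 0.5 ** (t / 60) for t in times]
lam, t_half, r2, mse = fit_decay(times, counts)
print(f"lambda={lam:.5f} per min, half-life={t_half:.1f} min, keep={passes_filter(r2, mse)}")
```

      Because only the slope of the log-ratio enters the estimate, the fit depends on relative rather than absolute abundance. The slope becomes very shallow when the half-life is much longer than the sampling window, which is where the R² and MSE filters matter most.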

      The general assertion is made in many places that TA dinucleotides are the most prominent destabilizing element in UTRs (e.g., in the title, the abstract, Fig. 4 legend, and on p. 12). This appears to be true for only one of the two cell lines tested based on Fig. 3.

      TA-dinucleotides and other TA-rich sequences exhibit similar effects on RNA stability, as illustrated in Fig. S5A-C. In two cell lines, TA-dinucleotide and WWWWWW sequences were representatives of the same stability-affecting cluster. While the impact of TA-dinucleotides can be generalized, we will rephrase some statements for clarification to avoid any potential misunderstanding.

      Appraisal and impact:

      The work adds to existing studies that previously identified sequence features, including AREs and other RNA binding protein motifs, that regulate stability and puts a new emphasis on the role of "TA" (better "UA") dinucleotides. It is not clear how potential problems with the RNA stability measurements discussed above might influence the overall conclusions, which may limit the impact unless these can be addressed.

      It is difficult to understand whether the importance of TA dinucleotides is best explained by their occurrence in a related set of longer RBP binding motifs (see Fig 5J, these motifs may be encompassed by the "WWWWWW cluster") or whether some other explanation applies. Further discussion of this would be helpful. Does the LASSO method tend to collapse a more diverse set of longer motifs that are each relatively rare compared to the dinucleotide? It remains unclear whether TA dinucleotides are associated with less stability independent of the presence of the known larger WWWWWWW motif. As noted above, the importance of TA dinucleotides in the HEK experiments appears to be less than is implied in the text.

      To ensure the representativeness of the features entered into the LASSO model, we pre-selected those with an occurrence greater than 10% among all UTRs. There is no evidence to support a preference for dinucleotides by LASSO. To address whether the destabilizing effect of TA dinucleotides is part of the broader WWWWWW motif, we will divide TA dinucleotides into two groups: those within the WWWWWW motif and those outside of it. We will then examine whether TA dinucleotides in these two groups exhibit the same destabilizing effect.
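
      The proposed split could be sketched as follows. The sketch assumes that a "WWWWWW motif" means any run of at least six consecutive W bases (A or T/U), and the function name `split_ta_counts` is hypothetical; the manuscript may define motif membership differently.

```python
import re


def split_ta_counts(seq: str, min_run: int = 6):
    """Count TA dinucleotides inside vs. outside W-runs (A/T) of length >= min_run.

    Returns (inside, outside).
    """
    seq = seq.upper().replace("U", "T")
    inside_starts = set()
    for match in re.finditer(r"[AT]{%d,}" % min_run, seq):
        # A dinucleotide starting at i is inside the run if i and i + 1 both are.
        inside_starts.update(range(match.start(), match.end() - 1))
    inside = outside = 0
    for i in range(len(seq) - 1):
        if seq[i:i + 2] == "TA":
            if i in inside_starts:
                inside += 1
            else:
                outside += 1
    return inside, outside


# One isolated TA (positions 2-3) and two TAs within the 8-nt W-run ATTTATAA.
print(split_ta_counts("GCTAGCATTTATAAGC"))  # (2, 1)
```

      Because T and A are both W bases, a TA dinucleotide can never straddle the edge of a W-run, so each one falls cleanly into one of the two groups.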

      The inclusion of more than a single cell type is an acknowledgement of the importance of evaluating cell type-specific effects. The work suggests a number of cell type-specific differences, but due to technical issues (especially with the HEK data, as outlined above) and the use of only two cell lines, it is difficult to understand cell type effects from the work.

      The inclusion of both 3' and 5' UTR sequences distinguishes this work from most prior studies in the field. Contrasting the effects of these regions on stability is of interest, although the role of these UTRs (especially the 5' UTR) in translational regulation is not assessed here.

      We examined the role of UTR and UTR variants in translation regulation using polysome profiling. By both univariate analysis and an elastic regression model, we identified motifs of short repeated sequences, including SRSF2 binding sites, as mutation hotspots that lead to aberrant translation. Furthermore, these polysome-shifting mutations had a considerable impact on RNA secondary structures, particularly in upstream AUG-containing 5’ UTRs. Integrating these features, our model achieved high accuracy (AUROC > 0.8) in predicting polysome-shifting mutations in the test dataset. Additionally, metagene analysis indicated that pathogenic variants were enriched at the upstream open reading frame (uORF) translation start site, suggesting changes in uORF usage underlie the translation deficiencies caused by these mutations. Illustrating this, we demonstrated that a pathogenic mutation in the IRF6 5’ UTR suppresses translation of the primary open reading frame by creating a uORF. Remarkably, site-directed ADAR editing of the mutant mRNA rescued this translation deficiency. Because the regulation of translation and stability does not converge, we illustrate these two mechanisms in two separate manuscripts (this one and doi.org/10.1101/2024.04.11.589132).

      Reviewer #3 (Public Review):

      Summary:

      In their manuscript titled "Multiplexed Assays of Human Disease‐relevant Mutations Reveal UTR Dinucleotide Composition as a Major Determinant of RNA Stability" the authors aim to investigate the effect of sequence variations in 3'UTR and 5'UTRs on the stability of mRNAs in two different human cell lines.

      To do so, the authors use a massively parallel reporter assay (MPRA). They transfect cells with a set of mRNA reporters that contain sequence variants in their 3' or 5' UTRs, which were previously reported in human diseases. They follow their clearance from cells over time relative to the matching non-variant sequence. To analyze their results, they define a set of factors (RBP and miRNA binding sites, sequence features, secondary structure etc.) and test their association with differences in mRNA stability. For features with a significant association, they use clustering to select a subset of factors for LASSO regression and identify factors that affect mRNA stability.

      They conclude that the TA dinucleotide content of UTRs is the strongest destabilizing sequence feature. Within that context, elevated GC content and protein binding can protect susceptible mRNAs from degradation. They also show that TA dinucleotide content of UTRs affects native mRNA stability, and that it is associated with specific functional groups. Finally, they link disease associated sequence variants with differences in mRNA stability of reporters.

      Strengths:

      (1) This work introduces a different MPRA approach to analyze the effect of genetic variants. While previous works in tissue culture use DNA transfections that require normalization for transcription efficiency, here the mRNA is directly introduced into cells at fixed amounts, allowing a more direct view of the mRNA regulation.

      (2) The authors also introduce a unique analysis approach, which takes into account multiple factors that might affect mRNA stability. This approach allows them to identify general sequence features that affect mRNA stability beyond specific genetic variants, and reach important insights on mRNA stability regulation. Indeed, while the conclusions about the genetic variants identified in this work are interesting, the main strength of the work lies in the general effects of sequence features rather than in specific variants.

      (3) The authors provide adequate support for their claims, and validate their analysis using both their reporter data and native genes. For the main feature identified, TA di-nucleotides, they perform follow-up experiments with modified reporters that further strengthen their claims, and also validate the effect on native cellular transcripts (beyond reporters), demonstrating its validity also within native scenarios.

      (4) The work provides a broad analysis of mRNA stability, across two mRNA regulatory segments (3'UTR and 5'UTR) and is performed in two separate cell-types. Comparison between two different cell-types is adequate, and the results demonstrate, as expected, the dependence of mRNA stability on the cellular context. Analysis of 3'UTR and 5'UTR regulatory effects also shows interesting differences and similarities between these two regulatory regions.

      Weaknesses:

      (1) The authors fail to acknowledge several possible confounding factors of their MPRA approach in the discussion.

      First, while transfecting mRNA directly into cells avoids the need to normalize for differences in transcription, naked mRNA molecules differ from native cellular mRNAs, and their introduction could create biases due to differences in mRNA modifications, protein associations, etc. that may occur co-transcriptionally.

      Second, along those lines, the authors also use in-vitro polyadenylation. The length of the polyA tail of the transfected transcripts could potentially be very different than that of native mRNAs and also affect stability.

      The transcripts used in our study were polyadenylated in vitro with approximately 100 nucleotides (Fig. S1C), similar to the polyA tail lengths typically observed in vivo (dx.doi.org/10.1016/j.molcel.2014.02.007). Additionally, these transcripts were capped to emulate essential mRNA characteristics and to minimize immune responses in recipient cells. This design allows us to study RNA decay for in vitro-synthesized RNA delivered into human cells, akin to RNA vaccines, but it does not necessarily extend to endogenous RNAs. As mentioned, endogenous RNAs undergo nuclear processing and are decorated by numerous trans factors, resulting in distinct regulatory mechanisms. We will provide a more in-depth discussion on these differences and their implications in the revised manuscript.

      (2) The analysis approach used in this work for identifying regulatory features in UTRs was not previously used. As such, the lack of in-depth details of the methodology, and possibly also of a more general validation of the approach, is a drawback in convincing the reader of the validity of this approach and its results.

      In particular, a main point that is not addressed is how the authors decided on the set of "factors" used in their analysis, as choosing different sets of factors might affect the results of the analysis.

      In our study, we employed the calculation of the Variance Inflation Factor (VIF) as a basis for selecting variables. This well-established method is widely used to detect variables with high collinearity, thus ensuring the robustness and reliability of our analysis. By identifying and excluding highly collinear variables, we aimed to minimize multicollinearity and improve the accuracy of our regression models. For more detailed information on the use of VIF in regression analysis, please refer to Akinwande, M., Dikko, H., and Samson, A. (2015). Variance Inflation Factor: As a Condition for the Inclusion of Suppressor Variable(s) in Regression Analysis. Open Journal of Statistics, 5, 754-767. doi: 10.4236/ojs.2015.57075. We will include the method details in the revised manuscript.
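
      For readers unfamiliar with the method, the VIF of feature j is 1 / (1 - R_j²), where R_j² is obtained by regressing feature j on all other features. The NumPy sketch below illustrates the calculation on simulated data; the function name and the simulated features are hypothetical, and no exclusion threshold is implied, since the response does not state the one used (cutoffs of 5 or 10 are common conventions).

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n_obs, n_features = X.shape
    vifs = np.empty(n_features)
    for j in range(n_features):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_obs), others])  # add intercept
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(np.sum((y - y.mean()) ** 2))
        r_squared = 1.0 - ss_res / ss_tot
        vifs[j] = np.inf if r_squared >= 1.0 else 1.0 / (1.0 - r_squared)
    return vifs


# Simulated features: the third column is nearly a copy of the first.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
print(variance_inflation_factors(np.column_stack([a, b, c])))
```

      In this simulation the first and third columns receive large VIFs because each is almost fully predictable from the other, while the independent second column stays close to 1.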

      For example, the choice to use 7-mer sequences within the factors set is not explained, particularly when almost all motifs that are eventually identified (Figure 3B-E) are shorter.

      The known RBP motifs are primarily 6-mer. To explore the possibility of discovering novel motifs that could significantly impact our model, we started with 7-mer sequences. However, our analysis revealed that including these additional variables did not improve the explanatory power of the model; instead, it reduced it. Consequently, our final model focuses on motifs shorter than 7-mer. We will explain the motif selections in the revised manuscript.

      In addition, the authors do not perform validations to demonstrate the validity of their approach on simulated data or well-established control datasets. Such analysis would help to further convince the reader of the usefulness and robustness of the analysis.

      We acknowledge the importance of validating our approach on simulated data or well-established control datasets to demonstrate its robustness and reliability. However, to the best of our knowledge, there are currently no well-established control datasets available that perfectly correspond to our specific study context. Despite this, we will continue to search for any relevant datasets that could be utilized for this purpose in future work. This effort will help to further reinforce the confidence in our methodology and its findings.

      (3) The analysis and regression models built in this work are not thoroughly investigated relative to native genes within cells. The effect of sequence "factors" on native cellular transcripts' stability is not investigated beyond TA di-nucleotides, and it is unclear to what degree other predicted factors also affect native transcripts.

      Our system studies the stability control of RNA synthesized in vitro and delivered into human cells. While we validated the UTR TA-dinucleotide effect in vivo, we did not intend to conclude that this is the most influential regulation for endogenous RNAs. It is known that endogenous RNAs undergo very different regulation. The most prominent factors controlling endogenous RNA stability are the density of splice junctions and the length of UTRs (doi.org/10.1186/s13059-022-02811-x; doi.org/10.1186/s12915-021-00949-x). To decipher the sequence regulation, we controlled for these factors in our experiments. Therefore, we acknowledge that several endogenous features, which were excluded by our approach, may serve as predictive features of RNA half-life in vivo.

    1. Author response:

      Reviewer 1:

      Summary:

      In this manuscript by Bimbard et al., a new method to perform stable recordings over long periods of time with Neuropixels, as well as the technical details on how the electrodes can be explanted for follow-up reuse, is provided. I think the description of all parts of the method is very clear, and the validation analyses (n of units per day over time, RMS over recording days...) are very convincing. I however missed a stronger emphasis on why this could provide a big impact on the ephys community, by enabling new analyses, new behavior correlation studies, or neurophysiological mechanisms across temporal scales.

      Strengths:

      Open source method. Validation across laboratories. Across species (mice and rats) demonstration of its use and in different behavioral conditions (head-fixed and freely moving).

      Weaknesses:

      Weak emphasis on what can be enabled with this new method that didn't exist before.

      We thank the reviewer for highlighting the limited discussion around scientific impact. Our implant has several advantages which combine to make it much more accessible than previous solutions. This enables a variety of recording configurations that would not have been possible with previous designs, facilitating recordings from a wider range of brain regions, animals, and experimental setups. In short, there are three key advances:

      (1) Adaptability: The CAD files can be readily adapted to a wide range of configurations (implantation depth, angle, position of headstage, etc.). Labs have already modified the design to optimise for their needs, and re-shared with the community.

      (2) Weight: Because of the lightweight design, experimenters can i) perform complex and demanding freely moving tasks as we exemplify in the manuscript, and ii) implant female and water-restricted mice while respecting animal welfare weight limitations.

      (3) Cost: At ~$10, our implant is significantly cheaper than published alternatives, which makes it affordable to more labs and means that testing modifications is cost-effective.

      We will make these features clearer in the manuscript.

      Reviewer 2:

      Summary:

      This work by Bimbard et al., introduces a new implant for Neuropixels probes. While Neuropixels probes have critically improved and extended our ability to record the activity of a large number of neurons with high temporal resolution, the use of these expensive devices in chronic experiments has so far been hampered by the difficulty of safely implanting them and, importantly, to explant and reuse them after conclusion of the experiment. The authors present a newly designed two-part implant, consisting of a docking and a payload module, that allows for secure implantation and straightforward recovery of the probes. The implant is lightweight, making it amenable for use in mice and rats, and customizable. The authors provide schematics and files for printing of the implants, which can be easily modified and adapted to custom experiments by researchers with little to no design experience. Importantly, the authors demonstrate the successful use of this implant across multiple use cases, in head-fixed and freely moving experiments, in mice and rats, with different versions of Neuropixels probes, and across 8 different labs. Taken together, the presented implants promise to make chronic Neuropixel recordings and long-term studies of neuronal activity significantly easier and attainable for both current and future Neuropixels users.

      Strengths:

      - The implants have been successfully tested across 8 different laboratories, in mice and rats, in head-fixed and freely moving conditions, and have been adapted in multiple ways for a number of distinct experiments.

      - Implants are easily customizable and the authors provide a straightforward approach for customization across multiple design dimensions even for researchers not experienced in design.

      - The authors provide clear and straightforward descriptions of the construction, implantation, and explant of the described implants.

      - The split of the implant into a docking and payload module makes reuse even in different experiments (using different docking modules) easy.

      - The authors demonstrate that implants can be re-used multiple times and still allow for high-quality recordings.

      - The authors show that the chronic implantations allow for the tracking of individual neurons across days and weeks (using additional software tracking solutions), which is critical for a large number of experiments requiring the description of neuronal activity, e.g. throughout learning processes.

      - The authors show that implanted animals can even perform complex behavioral tasks, with no apparent reduction in their performance.

      Weaknesses:

      - While implanted animals can still perform complex behavioral tasks, the authors describe that the implants may reduce the animals' mobility, as measured by prolonged reaction times. However, the presented data does not allow us to judge whether this effect is specifically due to the presented implant or whether any implant or just tethering of the animals per se would have the same effects.

      The reviewer is correct: some of the differences in mouse reaction time could be due to the tether rather than the implant. As these experiments were also performed in water-restricted female mice with the heavier Neuropixels 1.0 implant, our data represent the maximal impact of the implant, and we will highlight this in the revision.

      - While the authors make certain comparisons to other, previously published approaches for chronic implantation and re-use of Neuropixels probes, it is hard to make conclusive comparisons and judge the advantages of the current implant. For example, while the authors emphasize that the lower weight of their implant allows them to perform recordings in mice (and is surely advantageous), the previously described, heavier implants they mention (Steinmetz et al., 2021; van Daal et al., 2021), have also been used in mice. Whether the weight difference makes a difference in practice therefore remains somewhat unclear.

      The reviewer is correct: without a direct comparison, we cannot be certain that our smaller, lighter implant improves behavioural results (although this is supported by the literature, e.g. Newman et al., 2023). However, the reduced weight of our implant is critical for several laboratories represented in this manuscript due to animal welfare requirements. Indeed, in van Daal et al. the authors “recommend a [mouse] weight of >25 g for implanting Neuropixels 1.0 probes.” This limit precludes using (the vast majority of) female mice, or water-restricted animals. Conversely, our implant can be routinely used with lighter, water-restricted male and female mice. We will emphasise this point in the revision.

      - The non-permanent integration of the headstages into the implant, while allowing for the use of the same headstage for multiple animals in parallel, requires repeated connections and does not provide strong protection for the implant. This may especially be an issue for the use in rats, requiring additional protective components as in the presented rat experiments.

      We apologise for not clarifying the various headstage options in the manuscript and we will address this in the revision. Our repository has headplate holder designs (in the XtraModifications/Mouse_FreelyMoving folder). This allows leaving the headstage on the implant, thus minimising the number of connections (albeit increasing the weight for the mouse). Indeed, mice recorded while performing the task described in our manuscript had the headstage semi-permanently integrated into the implant, and we will highlight this in the revision.

      Reviewer 3:

      Summary:

      In this manuscript, Bimbard and colleagues describe a new implant apparatus called "Apollo Implant", which should facilitate recording in freely moving rodents (mice and rats) using Neuropixels probes. The authors collected data from both mice and rats and used 3 different versions of Neuropixels; multiple labs have already adopted this method, which is impressive. They openly share their CAD designs and surgery protocol to further facilitate the adaptation of their method.

      Strengths:

      Overall, the "Apollo Implant" is easy to use and adapt, as it has been used in other laboratories successfully and custom modifications are already available. The device is reproducible using common 3D printing services and can be easily modified thanks to its CAD design (the video explaining this is extremely helpful). The weight and price are amazing compared to other systems for rigid silicon probes allowing a wide range of use of the "Apollo Implant".

      Weaknesses:

      The "Apollo Implant" can only handle Neuropixels probes. It cannot hold other widely used and commercially available silicon probes. Certain angles and distances are not possible in their current form (distance between probes 1.8 to 4mm, implantation depth 2-6.5 mm, or angle of insertion up to 20 degrees).

      We appreciate the reviewer’s points, but as we will discuss in the revised manuscript, a single implant that accommodates the full diversity of existing probes is beyond the scope of this project. However, because the design is adaptable, groups should be able to modify the current version of the implant to suit their electrodes’ size and format (and can highlight any issues in the GitHub “Discussions” area).

      With Neuropixels, the current range of depths covers practically all trajectories in the mouse brain. In rats, where deeper penetrations may be useful, the experimenter can attach the probe at a lower point in the payload module to increase the length of exposed shank. We now specify this in the GitHub repository.

      We have now extended the range of inter-probe distances from a maximum of 4 mm to 6.5 mm, and this will be reflected in the revised manuscript. Distances beyond this may be better served by 2 implants, and smaller distances could be achieved by attaching two probes on the same side of the docking module. In the next revision, we will add these points to the discussion.